  1. Last 7 days
    1. How do you think about the authenticity of the Tweets that come from Trump himself?

      I actually think that the tweets coming from Trump himself in this context are MORE authentic than the ones coming from his campaign team. As users of the platform and people living in this country, we expect a certain flavor of content out of Trump's tweets. Seeing posts from his campaign next to posts from Trump in some ways muddies the waters. If the public face of a presidential candidate were always angry and negative, many voters might turn away from that candidate. But intermixing calm, structured posts makes the candidate appear able to switch their anger and negativity on and off, which may be more appealing to voters.

    1. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis merits an independent, in-depth investigation. The current version has been substantially toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and have toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to one stage or the other are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. fewer than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
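      To illustrate how a frequency cutoff alone can make a SNP appear stage-exclusive, the following minimal Python sketch classifies SNPs after filtering. The function name `classify_snps`, the SNP identifiers and the frequencies are hypothetical and are not taken from the manuscript; treating a frequency of exactly 10% as passing is also an assumption.

```python
# Hypothetical sketch: identifiers and frequencies are illustrative only.
CUTOFF = 0.10  # 10% frequency cutoff mentioned in the response


def classify_snps(ama_freq, pro_freq, cutoff=CUTOFF):
    """Label each SNP as shared or stage-exclusive after the frequency filter."""
    labels = {}
    for snp in sorted(set(ama_freq) | set(pro_freq)):
        in_ama = ama_freq.get(snp, 0.0) >= cutoff
        in_pro = pro_freq.get(snp, 0.0) >= cutoff
        if in_ama and in_pro:
            labels[snp] = "shared"
        elif in_ama:
            labels[snp] = "ama_only"  # appears 'lost' in promastigotes
        elif in_pro:
            labels[snp] = "pro_only"  # appears 'gained' in promastigotes
        # SNPs below the cutoff in both samples are dropped entirely.
    return labels


ama = {"chr1:1000": 0.48, "chr2:2500": 0.12, "chr3:300": 0.08, "chr4:42": 0.05}
pro = {"chr1:1000": 0.51, "chr2:2500": 0.07, "chr3:300": 0.11, "chr4:42": 0.04}
labels = classify_snps(ama, pro)
```

      In this toy example, a SNP drifting from 12% to 7% is reported as amastigote-only even though it is still present at low frequency in promastigotes.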

      Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than ±1.5-fold) as significant. The reversal of the signal for chr 26 sample H1 eludes us (but again, these fluctuations are minor and not observed at mRNA level).
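      As an illustration of the ±1.5-fold criterion, the sketch below flags genomic bins whose read depth deviates from the chromosome median by at least 1.5-fold. Dividing by the chromosome median is an assumption made here to mimic a somy-aware normalization; the actual procedure is the one specified in the figure legend and may differ. The name `flag_bins` and the depth values are hypothetical.

```python
from statistics import median


def flag_bins(depths, fold=1.5):
    """Return True for bins deviating from the chromosome median by >= `fold`."""
    chrom_median = median(depths)  # assumed somy-aware baseline for this chromosome
    flags = []
    for depth in depths:
        ratio = depth / chrom_median
        flags.append(ratio >= fold or ratio <= 1 / fold)
    return flags


# Hypothetical per-bin read depths along one chromosome (median is 102).
depths = [100, 104, 96, 130, 210, 60]
flags = flag_bins(depths)
```

      With these values only the last two bins exceed the threshold; a bin at about 1.27-fold the median (130 reads) is treated as a minor fluctuation.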

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript with ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences in pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'-O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to take into account the stage-regulated snoRNAs identified in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"

      We thank the reviewer and corrected accordingly.  

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10⁻⁷ in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of shown protein kinase examples.
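      The colour rule discussed in minor issues (5) and (6) can be sketched as follows. A transcript is highlighted only if it passes both thresholds, so by De Morgan's law it is drawn black if it fails either one, which is why "OR" is the accurate wording. The function name `dot_colour` and the use of log2 fold changes as input are assumptions for illustration; the thresholds (FC of 2, adjusted p-value of 0.01) are those quoted by the reviewer.

```python
def dot_colour(log2_fc, adj_p, fc_cutoff=2.0, p_cutoff=0.01):
    """Return the volcano-plot colour class for one transcript."""
    fold_change = 2 ** abs(log2_fc)  # absolute fold change, direction ignored
    # Black if EITHER criterion fails (FC < 2 OR adjusted p-value > 0.01).
    if fold_change < fc_cutoff or adj_p > p_cutoff:
        return "black"
    return "highlighted"


examples = [
    dot_colour(3.0, 0.5),     # large change, weak statistical support
    dot_colour(0.2, 0.001),   # strong support, small change
    dot_colour(1.5, 0.001),   # passes both thresholds (up-regulated)
    dot_colour(-2.0, 0.005),  # passes both thresholds (down-regulated)
]
```

      Under the original "AND" wording, the first two examples would not have been described as black dots, even though they are not significant by the combined criterion.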

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results and this is woven together in the discussion allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we have toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and have conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the minimum necessary. We thank the reviewer for this important comment; the new data not only confirm the previous results (providing a high level of robustness to our data) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has been previously applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information to the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts of material, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals needed to obtain statistically significant results is tolerated (and this is a prerequisite for obtaining permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.
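      The trade-off between the two missingness rules discussed in this exchange can be illustrated with a minimal sketch. The function names, the intensity values and the use of `None` for a missing measurement are hypothetical; the lenient rule mirrors the criterion described in the response above (at least one observed value in a condition), and the strict rule mirrors the threshold cited by the reviewer (exclusion at two or more missing values), which was originally applied to 5 replicates rather than the 4 assumed here.

```python
def observed_count(values):
    """Number of replicates in which the phosphosite was quantified."""
    return sum(v is not None for v in values)


def passes_lenient(values):
    """Keep the site if at least one replicate has an observed value."""
    return observed_count(values) >= 1


def passes_strict(values, max_missing=1):
    """Keep the site only if fewer than two replicates are missing."""
    return len(values) - observed_count(values) <= max_missing


# Hypothetical log2 intensities for one phosphosite, 4 replicates per condition.
site_ama = [21.3, None, None, None]  # 3 of 4 values would need imputation
site_pro = [22.0, 21.8, None, 22.4]  # 1 of 4 values would need imputation

kept = {
    "ama_lenient": passes_lenient(site_ama),
    "ama_strict": passes_strict(site_ama),
    "pro_lenient": passes_lenient(site_pro),
    "pro_strict": passes_strict(site_pro),
}
```

      The lenient rule retains the sparsely sampled amastigote site at the cost of imputing three values, whereas the strict rule discards it; this is the sensitivity versus stringency trade-off referred to above.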

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We have now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see DOI: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to that described in the “Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome” section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.
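      The ratio-of-ratios normalization described in this response can be sketched as follows. Working on a log2 scale is an assumption made here (it is what makes "differs from zero" the natural null hypothesis); the function names and abundance values are hypothetical, and the statistical test itself is not reproduced.

```python
import math


def log2_fold_change(abundance_pro, abundance_ama):
    """log2 of the promastigote/amastigote abundance ratio."""
    return math.log2(abundance_pro / abundance_ama)


def normalized_phospho_change(pep_ama, pep_pro, prot_ama, prot_pro):
    """log2(phosphopeptide fold change / parent protein fold change)."""
    return log2_fold_change(pep_pro, pep_ama) - log2_fold_change(prot_pro, prot_ama)


# Phosphopeptide rises 4-fold while its parent protein rises 2-fold:
# a residual 2-fold change (log2 = 1) is attributed to phosphorylation.
regulated = normalized_phospho_change(100.0, 400.0, 50.0, 100.0)

# Phosphopeptide and protein both rise 4-fold: no phosphorylation-specific change.
unchanged = normalized_phospho_change(100.0, 400.0, 50.0, 200.0)
```

      The resulting value is unbounded and can be negative, which is why it describes a relative change normalized to protein abundance rather than an occupancy between 0 and 1.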

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.
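      For readers unfamiliar with the multiple-testing step quoted above, the sketch below implements the standard Benjamini-Hochberg adjustment in plain Python. It is not the cp4p implementation: the adaptive variant additionally scales the adjusted p-values by an estimate of the proportion of true null hypotheses, represented here by a placeholder argument `pi0` whose estimation (the Pounds et al. method in the quoted text) is not shown. The function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values; pi0 < 1 gives an adaptive variant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pi0 * pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(pvals)
```

      For these four p-values the adjusted values are approximately 0.040, 0.053, 0.053 and 0.200, in the original order.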

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node is used to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables it will be very easy to recreate these networks with the new versions for any follow up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated the text into English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the Results section (see also the response above to comment 3 of reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient from driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these very interesting suggestions, which will be considered for ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been rectified and validated. For those data that are still under quarantine, the data can be accessed as follows:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data:  http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well established techniques without any particular technical step change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships by more neutral expressions such as ‘coexpression’.  

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer, an LFQ intensity of 0 may result from various causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Concerning missing values that could be either missing not at random (MNAR) or missing completely at random (MCAR), rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
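
      To make the presence/absence strategy described above concrete, the following is a minimal sketch of how proteins could be sorted into a quantifiable set and stage-restricted sets. It is an illustration only, not the authors' pipeline: the function names, the category labels, and the `min_valid=3` threshold (borrowed from the "3 out of 4 replicates" wording of the original text) are assumptions.

```python
from typing import Optional, Sequence


def count_detected(values: Sequence[Optional[float]]) -> int:
    """Count replicates with a measurable LFQ intensity (not None and > 0)."""
    return sum(1 for v in values if v is not None and v > 0)


def classify_protein(
    ama: Sequence[Optional[float]],
    pro: Sequence[Optional[float]],
    min_valid: int = 3,  # hypothetical threshold, assumed for illustration
) -> str:
    """Assign a protein to a category based on per-stage detection.

    'quantifiable'   : measurable in enough replicates of both stages,
                       so a fold change and test statistic can be computed.
    'ama_restricted' : consistently detected in amastigotes, never in promastigotes.
    'pro_restricted' : consistently detected in promastigotes, never in amastigotes.
    'unclassified'   : sporadic detection; neither tested nor called stage-restricted.
    """
    n_ama = count_detected(ama)
    n_pro = count_detected(pro)
    if n_ama >= min_valid and n_pro >= min_valid:
        return "quantifiable"
    if n_ama >= min_valid and n_pro == 0:
        return "ama_restricted"
    if n_pro >= min_valid and n_ama == 0:
        return "pro_restricted"
    return "unclassified"
```

      Under these assumptions no value is imputed: only the 'quantifiable' proteins would enter the differential abundance test, and the stage-restricted proteins would be reported as detections rather than as fold changes.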

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. Aside from the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the meancoverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
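
      The normalization quoted above can be sketched in a few lines. This is not the giptools implementation: the function name, the input layout, and in particular the conversion of normalized bin scores to somy by multiplying by an assumed disomic baseline of 2 are illustrative assumptions.

```python
from statistics import median
from typing import Dict, List


def somy_scores(
    bin_coverage: Dict[str, List[float]],
    baseline_ploidy: float = 2.0,  # assumption: most of the genome is disomic
) -> Dict[str, float]:
    """Estimate one somy score per chromosome from per-bin mean coverage.

    Each bin's coverage is divided by the median coverage of all bins in the
    sample, then scaled by the assumed baseline ploidy. The chromosome-level
    score is the median of its bin scores.
    """
    all_bins = [cov for bins in bin_coverage.values() for cov in bins]
    genome_median = median(all_bins)
    if genome_median <= 0:
        raise ValueError("median bin coverage must be positive")
    return {
        chrom: median(baseline_ploidy * cov / genome_median for cov in bins)
        for chrom, bins in bin_coverage.items()
    }
```

      With this scaling, a chromosome whose bins sit at the genome-wide median coverage scores 2, and one at 1.5 times the median scores 3.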

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewers for these suggestions and have reformulated into: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).
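
      For readers who wish to reproduce this kind of trend line, a simple moving average over position-ordered values can be sketched as follows. The window size and the function name are assumptions; the smoothing method actually used for Figure 1C is not specified here.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Return the mean of each run of `window` consecutive values.

    The output has len(values) - window + 1 entries, so the line is only
    drawn where a full window of data points is available.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

      A wider window gives a smoother line at the cost of flattening short-range fluctuations, so the choice of window affects how pronounced any undulating pattern appears.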

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enriched terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence according to the suggestion.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al., aim to address the central question of how these parasites establish and maintain distinct life cycle stages in mostly the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed a genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analysis to reveal the changes the parasites undertook between the two life stages.

      The main conclusions stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and posttranscriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised by order of appearance in the text:

      Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      No, as published in Barja et al Nature Ecol Evol 2017 (PMID: 29109466) and Bussotti et al PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at DNA level when comparing amastigotes and promastigotes. Thus there is no need to normalize the RNAseq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed on DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing not at random (MNAR). Rather than removing those proteins containing MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of MNAR and MAR can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings.
These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
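
The presence/absence treatment described in this response can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `classify_protein`, the use of `None` to encode "not detected", and the intensity values are all hypothetical.

```python
import math

def classify_protein(ama_intensity, pro_intensity):
    """Return (label, log2 fold change ama/pro) for one protein.

    None encodes "not detected in that stage" (hypothetical encoding).
    No value is imputed: a protein seen in only one stage receives an
    infinite fold change, which places it at the edge of the plot.
    """
    if ama_intensity is None and pro_intensity is None:
        return ("not detected", None)
    if pro_intensity is None:
        return ("ama only", math.inf)
    if ama_intensity is None:
        return ("pro only", -math.inf)
    return ("both", math.log2(ama_intensity / pro_intensity))

# Made-up intensities, for illustration only.
examples = {
    "protein_A": (8.0, 2.0),   # quantified in both stages
    "protein_B": (5.0, None),  # detected in amastigotes only
    "protein_C": (None, 3.0),  # detected in promastigotes only
}
results = {name: classify_protein(*vals) for name, vals in examples.items()}
```

Keeping the stage-unique proteins as a separate category, rather than imputing a low intensity, leaves the question of low expression versus true absence explicitly open, as the modified text states.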

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘….or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While there is not too much novelty around the emphasis of gene expression at post-translational level in kinetoplastid organisms, the scale of the work presented here, looking at 5 layers of potential regulations, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro systems promastigote differentiation potentially limiting the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall out of the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work is setting the stage for future investigations on dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely nonfunctional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses are possible: ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions through intermediates that exhibit mixed or intermediate specificity).

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences to more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching is gene duplication, a frequent mechanism for WW domain diversification, beyond standard mutation-driven evolutionary processes.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that designing promiscuous domains is hard (see also related response to point 3 of Reviewer 2) and generally not selected by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes). 

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated by experiments.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view; they were chosen for this reason to test the design of full mutational paths (after having benchmarked the approach on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. This said, from a biological point of view, WW is a highly constrained domain; see the comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units, see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Notice that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to applications to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two Principal Components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper we benchmarked our path-designing approach on simple in silico lattice-protein models and showed that indirect paths led to gains in the overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, fig. 9-12].

      In general, we would expect that indirect paths could explore alternative mutations that are important for compensating transitory destabilizing mutations occurring along the path. We speculate that these stabilizing mutations happen for non-direct paths at their extremity near the class-I wild type. A slight decrease in binding response to peptide C1 is nevertheless observed for the direct path (see Suppl Table 4), but our experimental detection, focused on binding response, is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence in the conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains is different from the one of Class IV WWs, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21), without completely losing affinity to the peptide of class I. To investigate if the two binding regions are really independent or not, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that it is important to substantially lower affinity to class I peptide to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like part in the region with no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to acquire a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM samples can be biased by lowering the sampling temperature to 1/3 to obtain high-scoring sequences, which are more likely to be functional, as shown in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, while differences in scores along the path are shown before, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in Results to explain this point.
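
The effect of lowering the sampling temperature can be illustrated with a toy Boltzmann distribution. The sketch below assumes that the probability of a sequence scales as exp(score / T), so that T = 1 recovers the unbiased model distribution; the function name and the three scores are made up for illustration and are not RBM outputs.

```python
import math

def boltzmann_probs(scores, temperature):
    """Probabilities proportional to exp(score / temperature)."""
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - shift) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-probability-like scores for three candidate sequences.
toy_scores = [-20.0, -22.0, -25.0]

p_t1 = boltzmann_probs(toy_scores, 1.0)             # unbiased sampling
p_t_third = boltzmann_probs(toy_scores, 1.0 / 3.0)  # biased toward high scores
```

Under this assumption, lowering T raises the distribution to the power 1/T, so probability mass concentrates on the highest-scoring sequences at the expense of diversity.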

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR, but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths are different from the consensus sequence (mean identity of 76% and 60% respectively). The most ancestral nodes in the paths are also different from the consensus, having 81% (paths between type I and IV domains) or 54% (paths between type I and II/III domains) similarity, and an RBM score of -21 or -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild-types (extant sequences). This is shown in Fig. 6: several points have a larger RBM score than the two anchoring points at the extremities of the path, possibly because natural sequences are not always the most stable ones. As discussed in the conclusion, ASR nodes moreover generally have better scores than the sequences obtained by sampling an independent model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences, as shown by the success of using the phylogenetic distance of a mutated sequence to the wild-type for predicting the fitness effect of these mutations [Laine, Mol. Biol. Evol. 2019].
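
For readers who want to reproduce this kind of comparison, a mean identity to a consensus sequence can be computed as sketched below. This is a generic illustration, not the authors' script: the function names are hypothetical, the sequences are toy strings rather than WW domains, and gap characters are treated like any other symbol, which is an assumption.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying the same symbol."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

def mean_identity_to_reference(sequences, reference):
    """Average percent identity of several aligned sequences to one reference."""
    values = [percent_identity(seq, reference) for seq in sequences]
    return sum(values) / len(values)

# Toy aligned sequences, for illustration only.
consensus = "ACDE"
path_nodes = ["ACDE", "ACDF"]
mean_id = mean_identity_to_reference(path_nodes, consensus)
```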

      To better show this effect we will update Figure 6, reporting also the scores of the "scrambled" sequences, which do not respect potential epistasis extracted by the RBM. It appears that ASR sequences generally have better scores than the scrambled sequences, and lower scores than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).

      Mistakes in the reference list will be amended in the updated version.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank all reviewers for their careful consideration of our manuscript. We highly appreciate their thoughtful comments and suggestions, which helped us to improve the quality of our work. We address each comment point-by-point below.

      2. Description of the planned revisions

      Reviewer #1:

      Minor comments:

      Figure 5 would be more informative if it included more higher magnification images that would reveal the staining at the cellular level.

      To fulfil the suggestion, we will perform a new round of immunostaining followed by high-resolution confocal imaging. This requires additional time for laboratory work.

      Reviewer #2:

      Major comments

      1d. The authors tried to attribute the minor phenotype to the incomplete depletion of S100A4+ cells. However, it is possible that if the S100A4+ cells only represented a minor population, their function may be compensated by other populations. This might be confirmed by quantification of S100A4+ cells or S100A4-Cre; GFP+ cells in fibroblast or CD45 populations from images shown in Figure 5.

      We will address this comment by performing the required quantifications.

      Moreover, we have now included data on the presence of S100A4+ cells in S100a4-Cre;DTA mice (Figure for Reviewers 5a,b; Supplementary Figure 7a,b in the revised manuscript), which demonstrate incomplete depletion of the S100A4+ cells in the nipple and the mammary gland. This is likely due to ongoing tissue remodeling and continuous replenishment/supply of S100A4+ cells. Another study using the same S100a4-Cre;DTA mouse model reported an efficient S100A4+ cell depletion in mandibular condyle (Tuwatnawanit et al., 2025), which suggests that the presence of S100A4+ cells in the S100a4-Cre;DTA mammary gland and nipple is due to tissue-specific dynamics rather than lack of depletion efficiency.

              We have included in Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle46. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”
      

      The images in Figure 5 and Figure S4 are difficult to confirm colocalization. A higher magnification image would be required for each panel. Furthermore, a precise quantification based on the current images would be more supportive of the conclusion regarding the discrepancy of the composition of S100A4 lineage between epidermis and mammary gland (lines 163-165).

      To address this comment, we will perform a new round of immunostaining and high-resolution confocal imaging and quantifications and include the results in the fully revised manuscript.

      Line 163, the authors hypothesize that these are Langerhans cells based on morphology. The identity of those cells could be confirmed by co-staining with F4/80 in addition to the current form of Fig 5h.

      To address this comment, we will perform co-staining of GFP and F4/80 (or, eventually, AIF1, depending on antibody availability) and include the results in the fully revised manuscript.


      Reviewer #3

      Minor comments

      Figure 2c: The H&E images are not fully convincing. Immunofluorescence analysis of epithelial architecture would support the authors' interpretation and should be feasible if tissues are already available.

      We will perform immunostaining for epithelial markers, such as keratins, and include the results in the fully revised manuscript.

      Figure 4f: The proliferation data are compelling, but the authors could extend this by examining how cell differentiation and epithelial organisation are affected.

      We will perform immunostaining for epithelial markers (keratins, αSMA) and include the results in the fully revised manuscript.

      Figure 5b: To more convincingly show that GFP+ cells contact endothelial cells, co-labelling with an endothelial marker such as CD31 would be helpful.

      We will perform the requested co-labeling of GFP and CD31 and include the results in the fully revised manuscript.

      Figure 5f-h: The structures referenced in the text (lines 159-163) should be clearly indicated on the immunofluorescence images.

      We will incorporate these explanations into the new, high-resolution/detailed Figure 5 in the fully revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major comments

      1. It is rather difficult to conclude whether the observed nipple phenotype reflects an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these).

      The data raise a couple of additional questions: Is there a nipple phenotype at 3 wk of age? It would not be totally unsurprising if ablation of a major fraction of dermal fibroblasts in the nipple area would lead to an early embryonic/prepubertal phenotype but there is no data on this. Hence, is there a "congenital" nipple deformity, as concluded by the authors (line 191)?

      We appreciate the reviewer’s insightful comments. We have now included data on embryonic nipple development. These data demonstrate abundant S100A4-lineage cells in E15.5 and E18.5 skin of S100a4-Cre;mT/mG embryos (Figure for Reviewers 1a, corresponding to Figure S3a in the revised manuscript) and normal appearance of nipple sheath in S100a4-Cre;DTA embryos at E18.5 (Figure for Reviewers 1b, corresponding to Figure S3b in the revised manuscript), suggesting no embryonic defect.

      Unfortunately, we cannot provide data on 3-week-old mice (we have not collected this timepoint previously and currently we do not have this mouse line alive). Instead, however, we provide in situ pictures of DTA and S100a4-Cre;DTA nipples at 7 weeks of age (Figure for Reviewers 1c; Figure S3c in the revised manuscript), which demonstrate that the defective nipple phenotype is fully established at this timepoint. Because the late embryonic data did not support the “congenital” establishment of the nipple deformity and we could not provide any more data from early postnatal development, we have corrected the statement “we describe a congenital nipple deformity” in the discussion to “we describe a nipple deformity”.

      Are there S100a4+ cells in the nipple area of pubertal S100a4-Cre/DTA mice? I.e. is there a continuous supply of new S100a4+ cells and thereby continuous cell death and DTA expression as one might expect based on the RNA-seq data?

      The S100A4+ cells are present in the nipple area of S100a4-Cre;DTA mice, suggesting a continuous supply of new S100A4+ cells (Figure for Reviewers 1b, corresponding to Figure S3b in the revised manuscript; and Figure for Reviewers 5a,b, corresponding to Figure S7a,b in the revised manuscript). In the revised manuscript, we comment on this in Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle46. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      Figure for Reviewers 1 (Figure S3 in the revised manuscript): Embryonic and pubertal nipple phenotype. (a) Representative images of cleared whole-mount S100a4-Cre;mT/mG nipple tissue at embryonic developmental time-points: E15.5 and E18.5. Scale bar = 100 µm. (b) Immunofluorescent labeling for S100A4 on embryonic DTA and S100a4-Cre;DTA whole-mount skin (E18.5). Scale bar = 100 µm. (c) Representative in situ photographs of nipples from DTA and S100a4-Cre;DTA pubertal (7-weeks old) mice. Scale bar = 1 mm.

      The subtitle on line 54 implies that S100a4-Cre/DTA mice display a branching phenotype. However, it looks to me as if there is a pubertal outgrowth defect (as is also written in the body text, line 64) rather than a branching phenotype, potentially reflecting the much smaller size of S100a4-Cre/DTA mice (Fig. 2a). Unless there is a change in branch point frequency, I suggest rephrasing the title and discussion. Instead, I suggest the authors discuss the observed outgrowth delay considering the gross overall growth defect (Fig. 2a). If ductal outgrowth was normalized to the overall growth defect, would one still observe 'a delay in branching morphogenesis'?

      We apologize for the confusion caused by the section title. We have analyzed branching frequency in 7-week-old females and observed a reduced total number of branching points in S100a4-Cre;DTA mice (Figure for Reviewers 2a, corresponding to Figure 2f in the revised manuscript). A significant difference in the number of branching points remained after normalization to body weight (Figure for Reviewers 2c, corresponding to Figure 2h in the revised manuscript). We have now added the new quantifications to the revised manuscript with accompanying descriptions in the main text “Analysis of mammary epithelial development using whole-mount carmine staining revealed no significant differences in the prenatal establishment of the mammary epithelial tree but did reveal significantly delayed epithelial outgrowth and reduced branching in pubertal (7 weeks old) S100a4-Cre;DTA mice (Figure 2e,f). Normalization of epithelial outgrowth and branching to body weight indicates that the observed defect represents a mammary-specific impairment rather than a consequence of reduced body growth (Figure 2g,h).”.

      Figure for Reviewers 2 (Figure 2 in the revised manuscript): Pubertal branching morphogenesis is delayed in S100a4-Cre;DTA. (a-c) The plots show total number of branching points (a), epithelial outgrowth [mm] normalized to body weight [g] (b), and total number of the branching points normalized to body weight [g] (c) in 7 weeks old DTA and S100a4-Cre;DTA mice. All plots show the mean ± SD, *p

      Fig. 4e shows Masson's Trichrome and Picrosirius Red staining and the authors report the findings as follows (lines 120-124): "collagen fibers were loosened in the DTA nipples and more densely packed in the S100a4-Cre;DTA nipples". Perhaps the authors could help non-specialists to observe the loosened fibers and if they wish to make quantitative statements ("more densely packed"), such statements should be backed-up by quantifications.

      Picrosirius Red staining viewed under polarized light is a classic way to assess collagen organization, thickness, and packing. Red / orange / yellow color typically marks thicker, more mature, and more tightly packed collagen fibers (often associated with type I collagen), while green color usually marks thinner, less organized, or less densely packed fibers (often associated with type III collagen or immature collagen). We had already included this explanation in the figure legend of the submitted manuscript: “Typically, thicker collagen fibers exhibit stronger birefringence and appear red or orange, while thinner fibers exhibit weaker birefringence and appear green or yellow.” To support these statements with quantification, we extracted the red channel and quantified its intensity. The results are shown in Figure for Reviewers 3, corresponding to Figure S4 in the revised manuscript. Moreover, we will also quantify the differences in the pattern of the collagen fibers. The fibers in DTA nipples look shorter and more curved, while the fibers in S100a4-Cre;DTA nipples look longer, straighter, and more aligned. The results will be included in the fully revised manuscript.

      Figure for Reviewers 3 (Figure S4 in the revised manuscript): Collagen fibers are densely packed in S100a4-Cre;DTA nipples. (a) Representative pictures of histological sections of DTA and S100a4-Cre;DTA stained for collagen by Picrosirius red. Polarized light images and the red channel (mature/densely packed collagen) are shown alongside detail pictures of selected regions A and B. Scale bar = 200 µm and 100 µm (in detail pictures). (b) Quantification of Intensity Mean Value for the red channel (densely packed collagen), showing statistically non-significant difference. The plot shows the mean ± SD, ns p > 0.05 (Mann-Whitney test), n = 3 DTA / 4 S100a4-Cre;DTA.
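The quantification summarized in this legend (mean red-channel intensity per sample, compared between groups with a Mann-Whitney test) can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, pixels are assumed to be plain (R, G, B) tuples, and the p-value is computed exactly by enumerating all group assignments, which is only practical for very small samples.

```python
from itertools import combinations


def mean_red_intensity(pixels):
    """Mean red-channel value over an iterable of (R, G, B) tuples."""
    pixels = list(pixels)
    return sum(r for r, _g, _b in pixels) / len(pixels)


def u_statistic(x, y):
    """Mann-Whitney U for x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact_p(x, y):
    """Exact two-sided p-value by enumerating every possible group split."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        total += 1
        if abs(u_statistic(group_x, group_y) - center) >= observed:
            extreme += 1
    return extreme / total
```

With 3 and 4 samples per group there are only 35 possible group assignments, so the smallest two-sided exact p-value such a test can return is 2/35 (about 0.057), even when the two groups are completely separated.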

      I found the Discussion on the various mouse models somewhat problematic. Overall, the paper is written in a way that it often remains unclear whether it refers to studies addressing the role of S100a4 itself, studies addressing the function of S100a4+ cells via ablation approaches (S100a4-Cre or S100a4-CreERT2 crossed with floxed DTA), or those where S100a4-Cre has been used to delete gene X/Y/Z. These are all very different experimental approaches where one approach is not necessarily informative when trying to understand the results from another one. The authors should make these points clear and consider whether all their discussion points are relevant.

      We apologize for the confusion. We have carefully reviewed the references and their interpretations, and corrected them as necessary.

      The abstract states S100a4 (fibroblast-specific protein 1) is "expressed by mesenchymal cells and has been implicated in the development of eccrine glands, hair follicles, and mammary branching morphogenesis". However, the study on eccrine glands (ref. 19) shows that S100A4+ cells play a role in eccrine gland development but it does not address the role of S100a4 itself, while the study on hair follicles (ref.20) in turn reports the expression pattern of S100a4 in hair follicles but does not address its function, nor the role of S100a4+ cells. Finally, I failed to find references in the paper to studies addressing the role of S100a4, or S100a4+ cells in the mammary gland.

      Instead, the paper had references to studies where S100A4-Cre had been used to delete different genes and these mice had various mammary phenotypes - which, as indicated above, is a very different approach compared to deleting S100a4 or ablating S100a4+ cells.

      Thank you for your comment. We have addressed this concern in the Abstract and further in the Discussion. We revisited the cited studies and now present them more carefully, clearly distinguishing the different approaches and their particular findings.

      In our literature review, we also considered studies that used the S100a4-Cre mouse model to manipulate gene expression within S100A4+ cells. We believe that these studies provide indirect evidence of S100A4+ cell involvement in the development and/or homeostasis of a tissue, such as the mammary gland. Please find the rephrased part of the Abstract in the text and below:

      “S100A4 (S100 calcium binding protein A4, also known as fibroblast-specific protein 1) is expressed by mesenchymal cells and has been associated with hair follicle regeneration. S100A4-expressing cells have been implicated in the development of eccrine glands, and studies using S100a4-Cre to manipulate gene function have suggested that S100A4-expressing cells may contribute to mammary branching morphogenesis.”

      In Discussion (lines 197-200), the authors write: "We described significant delay in mammary branching morphogenesis in puberty, confirming an important role for S100A4+ cells in mammary development, as it was previously described (refs 37-39)."

      It should be noted that none of these studies addressed the role of S100A4+ cells:

      • Ref 37 used S100a4-Cre to delete sharpin

      • Ref 38 used the same Cre line to delete Ptch1, did not address the role of S100a4 or S100a4 expressing cells

      • Likewise ref 39 deleted another gene using S100a4-Cre

      Later on in Discussion, the authors compare the reported phenotype to previous studies (lines 248-255): "...targeting S100A4+ cells through knockout experiments can result in severe phenotypes, such as a reduction in adipose tissue (ref 26), skin phenotypes, a disrupted estrous cycle, reduced fertility (ref. 38), and complete infertility, hypogonadism and defects in pituitary endocrine function (ref. 28)."

      Of these, Ref. 26 used the same approach as the current study (S100a4-Cre; DTA) (Fig. 7A in the paper)

      • these mice were significantly lean, with markedly reduced fat compared with the control mice - also the mice in the current study are very small, so perhaps they could also be described as 'lean'. Yet ref. 26 reports that female mice had comparable food uptake, respiratory exchange ratio and physical activity, and slightly increased energy expenditure

      Ref. 38 (as mentioned above) reports deletion of Ptch1 using S100a4-Cre lines and these mice "displayed a disrupted estrous cycle and dramatically reduced fertility over 6.5 weeks". However, this has nothing to do with the approaches where Fsp1/S100a4+ cells are depleted with DTA. Likewise, reference 28 analyzed the phenotype of S100a4-Cre;Ptch1fl/fl mice. Obviously, deleting Ptch1 using S100a4-Cre mice is quite a different approach than "targeting S100A4+ cells through knockout experiments". Ptch1 deletion leads to a combination of gain-of-function (of Hedgehog activation) and loss-of-function (loss of Hh-independent functions of Ptch1) and hence comparisons with these phenotypes are rather challenging. I suggest the authors focus their phenotype comparisons to ref. 26 where S100a4/Fsp1+ cells were ablated with DTA, i.e. the same approach as in the current study.

      Please find the rephrased part of the Discussion in the text (lines 236-256) and below:

      “A key consideration when interpreting studies involving S100A4 is that fundamentally different experimental approaches have been used to investigate its role. These include descriptive analyses of S100A4 expression, functional studies targeting the S100A4 protein itself, genetic models using S100a4-Cre to manipulate unrelated genes in S100A4-expressing cells, and ablation models such as S100a4-Cre;DTA, which deplete S100A4⁺ cells. These approaches are not equivalent and provide distinct types of information. In the present study, we specifically assess the consequences of ablating S100A4-expressing cells, and comparisons to other studies should therefore be interpreted within this context.

      Studies using S100a4-Cre to manipulate specific signaling pathways (e.g. Wnt or Hedgehog signaling via gene deletion) in S100A4-expressing cells have reported diverse phenotypes, including effects on fertility and endocrine function28,34. However, these phenotypes primarily reflect the consequences of pathway perturbations within S100A4-expressing cells rather than the role of S100A4⁺ cells themselves. This is fundamentally different from the ablation approach used here, which removes the S100A4⁺ cell population.

      In contrast, studies employing S100a4-Cre–driven DTA–mediated ablation represent a directly comparable approach. Such studies have reported systemic phenotypes, including reduced adipose tissue and altered metabolic parameters26, indicating that S100A4-expressing cells contribute to multiple aspects of tissue homeostasis. Consistent with these previous reports, S100a4-Cre;DTA mice used in our study were significantly smaller than their littermates. Our findings extend these observations by identifying a specific and previously unrecognized role for this cell population in nipple morphogenesis.”

      I find the Discussion is somewhat off the topic by starting with WHO recommendations on breastfeeding and linking this to observed mouse phenotype. Overall, the discussion is rather long and from time-to-time more like a literature review. I would recommend keeping the Discussion more succinct and focused.

      To improve the conciseness and focus of the Discussion, we have deleted this part of the text.

      **Referee cross-commenting**

      I agree with the comments of other reviewers. However, to me it seems that the analysis of S100a4 knockout mice would not be feasible within a reasonable timeframe and would represent a study of its own. My understanding was that the authors were not interested in S100a4 itself. Rather, S100a4-Cre was used as a tool to understand the importance of a certain (fibroblast) cell population for mammary gland morphogenesis.

      Indeed, our goal was to study the role of a specific cell population (S100A4+ cells) in mammary gland morphogenesis, not to study the role of S100A4 protein per se.

      Reviewer #1 (Significance (Required)): General assessment:

      This study reveals the importance of the S100a4+ cell lineage for nipple formation while showing the same cells are dispensable for mammary gland morphogenesis. The main limitation is that it remains unclear whether the observed nipple phenotype is derived from an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these). Hence its relevance as a model of human inverted nipple condition remains rather speculative.

      Thank you for considering our work and for your valuable feedback. We did not intend to claim that the S100a4-Cre;DTA mouse represents a model of the human inverted nipple condition, although its morphological features might resemble that condition. We have now rephrased the Discussion so that it is clearer and more concise.

      Reviewer #2

      Major comments:

      1. My key concern is the discussion part. I think the authors need to re-organize/re-phrase the discussion part, it confused me a bit in terms of logic, phrases and interpretation of literatures.

      We have significantly re-organized and re-phrased the Discussion.

      Here are few examples:

      1. The lines 195-199 contain lot of repeated information

      We have rephrased the paragraph and removed repeated information. The new text can be found in lines 201-206 in the revised manuscript.

      1. The authors mentioned the studies in ref 26,28 and 38 using "targeting S100A4+ cells through knockout experiment can result in sever phenotypes". This is very misleading. Those studies using the same (or similar if the origin is different) S100A4-Cre line as the current study but induced the activation of Wnt and sHH signalling pathways, respectively. The observed phenotypes are largely due to the pathway function, rather than the S100A4 gene or normal S100A4+ cell itself. This is significantly differed from the current study.

      We apologize for the confusion; we have now rephrased our claims (lines 236-256):

      “A key consideration when interpreting studies involving S100A4 is that fundamentally different experimental approaches have been used to investigate its role. These include descriptive analyses of S100A4 expression, functional studies targeting the S100A4 protein itself, genetic models using S100a4-Cre to manipulate unrelated genes in S100A4-expressing cells, and ablation models such as S100a4-Cre;DTA, which deplete S100A4⁺ cells. These approaches are not equivalent and provide distinct types of information. In the present study, we specifically assess the consequences of ablating S100A4-expressing cells, and comparisons to other studies should therefore be interpreted within this context.

      Studies using S100a4-Cre to manipulate specific signaling pathways (e.g. Wnt or Hedgehog signaling via gene deletion) in S100A4-expressing cells have reported diverse phenotypes, including effects on fertility and endocrine function28,34. However, these phenotypes primarily reflect the consequences of pathway perturbations within S100A4-expressing cells rather than the role of S100A4⁺ cells themselves. This is fundamentally different from the ablation approach used here, which removes the S100A4⁺ cell population.

      In contrast, studies employing S100a4-Cre–driven DTA–mediated ablation represent a directly comparable approach. Such studies have reported systemic phenotypes, including reduced adipose tissue and altered metabolic parameters26, indicating that S100A4-expressing cells contribute to multiple aspects of tissue homeostasis. Consistent with these previous reports, S100a4-Cre;DTA mice used in our study were significantly smaller than their littermates. Our findings extend these observations by identifying a specific and previously unrecognized role for this cell population in nipple morphogenesis.”

      1. In the lines 253-255, why the author believe complete S100A4+ depletion would leads to the fatal of mouse? Is there study suggest that? Or have authors checked the expression of S100A4 in the S100A4-Cre;DTA model to confirm the efficiency?

      We have now included, also in response to other Reviewers’ comments, data on S100A4 expression in the S100A4-Cre;DTA model (Figure for Reviewers 5, corresponding to Figure S7 in the revised manuscript), and commented on these results in lines 257-262: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle48. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      In Fig. 1, the authors described the impaired nursing capacity of S100A4-Cre;DTA dam. However, it seems the litter size is also smaller (Fig 1a). Do authors have any explanation or hypothesis?

      Thank you for this insightful observation. It is well established that metabolic and nutritional conditions directly affect female reproductive functions. Adult S100A4-Cre;DTA mice are generally smaller than their littermates, potentially because of lower body fat content or another anatomic/metabolic condition that might negatively influence fecundity, for instance by lowering ovulation rate and/or embryonic survival. In support of this, earlier studies have reported a positive correlation between growth rate/body condition and litter size (Eisen & Durrant, 1980). Unfortunately, in the case of S100A4-Cre;DTA mice, we can only speculate about the possible explanations, as we do not have supporting data to confirm them.

      In lines 181-184, the authors states "the results showed that the tissue reacted to a foreign chemical or an endogenous compound....." , which results are referring here? I could not find any inflammation related GO terms in figure 6b. It would be more accurate to specify them in lines 179-181, which appears to be a technical statement rather than a result in current form.

      Thank you for this comment. Indeed, there are no GO terms explicitly labeled as “inflammation” and “repair”; however, several GO terms are functionally related to these processes. Our interpretation was based on the broader biological context rather than on explicit annotation. To clarify this, we revised the text and included the GO terms that reflect the tissue response (lines 187-193).

      “The GO terms indicated that the tissue reacted to a foreign chemical or an endogenous compound (xenobiotic metabolic process, cellular response to xenobiotic stimulus, response to xenobiotic stimulus, epoxygenase P450 pathway), and responded to inflammation and repair (actin filament-based process, actin cytoskeleton organization; eicosanoid and lipid metabolic processes) (Figure 6b).”

      The lines 182-184 was not clear. Does the author refer the "nipple tissue response" in general as malfunction of development or inflammation and tissue repair as mentioned in the previous sentence? If the later cases, the authors should consider the failure of lactation might mimic the involution, which may cause the apoptosis and inflammation as well. This might be independent of the DTA expression.

      Thank you for raising this point. Indeed, in this line, we refer to ongoing tissue inflammation and repair. We also considered the hypothesis that the inability to eject milk (and the consequent milk stasis) triggers involution. However, tissues were collected within a few hours after parturition, when only very early signs of involution, if any, would be detectable; therefore, we expect minimal influence of involution. To reflect this comment, we added new text to the Discussion (lines 272–277): “The observed tissue response can be also associated with hallmarks of mammary involution, the process which is triggered by the milk stasis. However, the tissues were collected within few hours after parturition, when the effect of involution should be minimal53. Rather, we hypothesize that immune cell recruitment, and the upregulation of the lipid skin barrier might be caused in response to the continuous apoptosis of S100A4+ cells and their replacement.”

      Minor comments:

      1. The authors demonstrated in Figure S1 and lines 92-96 that no significant differences were observed in pituitary glands and ovaries in S100a4-Cre:DTA and DTA mice. Have the authors checked the S100A4 expression or lineage cells in these organs, or have been reported by others?

      Yes, we checked the S100A4-lineage cells in the pituitary gland and ovary and have now included the results here (Figure for Reviewers 4a,b, corresponding to Figure S1a,b in the revised manuscript), along with a relevant text description (lines 94-95 in the revised manuscript): “We observed S100A4-lineage traced cells in pituitary gland and ovaries using S100a4-Cre;mT/mG model (Figure S1a,b).” The presence of S100A4+ cells in these organs was also reported previously (Ren et al., 2019).

      Figure for Reviewers 4 (Figure S1 in the revised manuscript): S100A4-lineage cells are abundant in the pituitary gland and ovary. (a) Representative images of a cleared whole-mount pituitary gland from a S100a4-Cre;mT/mG mouse. (b) Representative images of a cleared whole-mount ovary from a S100a4-Cre;mT/mG mouse. Scale bar = 100 µm.

      The authors have performed live imaging to evaluate the contraction of alveoli. It would be better to include a video together with the snapshots showed in Figure S2.

      We have included the videos as supplementary movies, Movie S1 (DTA) and Movie S2 (S100a4-Cre;DTA).

      Since the study is mainly using S100a4, it would be better to avoid using FSP1 in the results, for example Fig 5h.

      We apologize for this oversight; it has now been corrected.

      What does L1 stand for? Lactation Day 1? It should be spelt out in the first instance.

      Yes, indeed, L1 is lactation day 1. Please note that it was already spelled out in the first version of the manuscript, now in line 48.

      Line 150. Figure S4 should be Figure S4a.

      (Please note that, owing to the addition of new Supplementary figures, this comment now refers to Figure S6 in the new version of the manuscript.) Thank you for this comment. In the text, we state “GFP+ cells were spread throughout the fat pad but were also localized in the periepithelial stroma and infiltrated the epithelium”. We show this in both Figure S6a and Figure S6b; therefore, we have changed the reference accordingly to make it more accurate.

      **Referee cross-commenting**

      I agree with the other reviewers, as well as the Consultation Comments. The manuscript would benefit greatly from a thoroughly optimised Discussion section to address issues raised by all reviewers.

      Reviewer #2 (Significance (Required)):

      • Overall, this study is well designed and the key findings are valid, especially the role of S100A4 during nipple development is novel and interesting.

      • One limitation of the study is that RNA-seq was performed using a mixture of all cell types present in the nipple. While this approach is reasonable, given that depletion of the S100A4+ lineage may exert both direct and indirect effects contributing to nipple dysfunction, it should be more clearly acknowledged and discussed in the manuscript. Additionally, this experimental design may limit the utility of the dataset for other researchers interested in nipple development and the specific functions of S100A4.

      Reviewer #3

      Major comments:

      2) The differential systemic versus mammary-specific effects of DTA-mediated S100A4 cell ablation are intriguing. The authors should address why the mammary fat pad appears unaffected.

      Thank you for this comment. The role of S100A4+ cells in adipose tissue was previously reported (Zhang et al., 2018). The authors reported significantly smaller adipose tissue in S100a4-Cre;DTA mice (males and females), measured as the weight of the dissected fat pad. In our work, we measured the in situ area of the fat pad, which appeared to be unaffected. It is possible that the volume (weight) of the fat pad differs; however, we do not have data to confirm or reject this hypothesis.

      Are S100A4 expressing cells present during embryonic mammary development, or are they mainly postnatal? Would an inducible S100A4CreERT model lead to similar phenotypes, or might the timing of depletion influence the outcome? Discussing these points would reinforce the conclusions regarding the contribution of S100A4-expressing cells to mammary and nipple development and could also clarify the transient nature of the ductal branching phenotype.

      S100A4-expressing cells are also present during embryonic mammary development. Please refer to the embryonic lineage-tracing time-points incorporated in the first version of the manuscript (Figure 5a and Figure S6a). We have now added Figure for Reviewers 1 (corresponding to Figure S3 in the revised manuscript), which focuses on the embryonic nipple phenotype but also provides information on the presence of S100A4+ cells.

      We agree that the use of an inducible S100a4-CreERT model could potentially bring new insights into the developmental stage-specific roles of S100A4+ cells, and thus would be interesting to use in a follow-up study. Currently, such experiments are beyond our capacity.

      Therefore, we have included a new subsection on Limitations of the study, where we comment:

      “A major limitation of this study is that the timing of DTA-mediated cell depletion cannot be precisely defined in the constitutive mouse model employing S100a4-Cre because recombination may occur continuously following the initial expression of S100a4 (E8.518). This limitation could be overcome by usage of inducible S100a4-CreERT instead. With this approach, it could be more feasible to determine if the nipple deformity arises as a defect of embryonic development or postnatal morphogenesis.”

      3) Although the authors attribute lactation failure primarily to defects in nipple architecture, the RNA seq data reveal downregulation of key milk production genes and luminal differentiation keratins, strongly suggesting impaired secretory activation. The authors should more explicitly discuss the relative contributions of epithelial functional maturation defects versus nipple structural abnormalities to the lactation failure observed upon S100A4+ cell depletion.

      Thank you for this comment. We believe that immunofluorescence labeling of the epithelial architecture (requested in Minor comment 2) could shed more light on this. However, we deduce that secretory activation is not impaired, because the presence of milk observed in in situ wholemounts and in H&E-stained alveoli (Figure 3d) implies luminal secretion of milk components. The observed phenotype of the lactating mammary gland strongly suggests that a structural abnormality inhibits milk ejection.

      The downregulation of key milk production genes and luminal keratins in the bulk RNA-seq data may be influenced by differences in tissue composition between samples. In control mice, more fully developed nipples and an extended ductal network likely contribute to a greater representation of differentiated luminal epithelial cells, thereby increasing the expression of these markers.

      Minor comments:

      1. Figure 1: Including an immunohistochemistry or immunofluorescence control confirming depletion of S100A4 expressing cells would strengthen the conclusions.

      We have now included Figure for Reviewers 5 that corresponds to Figure S7 in the revised manuscript and comment on the results in sections Results (lines 169-171) and Discussion (lines 257-262).

      In Results: “Interestingly, S100A4 antibody labeling revealed presence of S100A4+ cells in S100a4-Cre;DTA tissues (Figure S3b, Figure S7a,b).”

      In Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle48. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      Figure for Reviewers 5 (Figure S7 in the revised manuscript): S100A4+ cells are found in S100a4-Cre;DTA nipple and mammary tissues. (a) Immunofluorescent labeling for S100A4 and vimentin on FFPE sections of DTA and S100a4-Cre;DTA L1 nipples. (b) Immunofluorescent labeling for S100A4 and smooth muscle actin on FFPE sections of DTA and S100a4-Cre;DTA L1 mammary gland. Scale bar = 100 µm.

      Figure 3c: The histological defects more accurately reflect failure of secretory activation rather than "lactation failure" per se. The terminology should be refined to reflect this more precisely.

      Thank you for this comment. As explained in the response to your major comment 3, we believe our results show that secretory activation is preserved in S100a4-Cre;DTA lactating mice. We understand that “lactation failure” might be misleading terminology, as milk production is preserved as well. We have therefore changed the phrasing to “nursing defect” (lines 51, 73, 83), as this reflects the phenotype most precisely.

      **Referee cross-commenting**

      I agree with the Reviewer, the authors do not need to do knockout experiments in the revised manuscript. However, it would be great if they could address my comment in the discussion.

      Reviewer #3 (Significance (Required)):

      This is an important study for mammary developmental biology, addressing the relatively understudied mechanisms that govern nipple development at the stromal-epithelial interface, and the determinants of lactational performance. A major strength is the elegant integration of DTA-mediated cell ablation, advanced imaging, lineage tracing, and transcriptomics to uncover previously uncharacterised roles for S100A4-expressing stromal populations in shaping nipple morphology and function. The work lays a foundation for future studies into nipple biology and pathologies and mechanisms underlying successful lactation.

      Although the study is already mature, it could be further strengthened by incorporating more specific genetic models, such as inducible S100A4CreERT or S100A4 gene knockout/knockdown approaches.

      Thank you for your appreciation of our work.

      4. Description of analyses that authors prefer not to carry out

      Reviewer #1

      Major Comment 1.

      It is rather difficult to conclude whether the observed nipple phenotype reflects an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these). The data raise a couple of additional questions: Is there a nipple phenotype at 3 wk of age?...

      Unfortunately, we cannot provide data on 3-week-old mice because we did not collect such samples and we had to terminate our mouse colony due to an infection in the animal house (reanimation of the mouse line is possible because we had stored its sperm, but it would take a lot of time and resources). Nevertheless, we tried to address this comment by providing other relevant available data (see Figure for Reviewers 1).

      Reviewer #2

      Major Comment 3.

      In Fig S1c, d and lines 93-96, the authors investigated the estrus cycles to determine the potential cause of lactation failure. The data was presented as the number of mice in each stage. A more intuitive approach would be to follow the same mice for two to three cycles and observe the duration of each stage.

      We agree that the suggested approach would be more accurate in determining truly cycling females. Unfortunately, we cannot perform this experiment currently because we do not have these mice alive anymore. Nevertheless, because the S100a4-Cre;DTA females bore pups, they had cycled and were fertile.

      Reviewer #3

      Major comment 1.

      While the S100A4Cre::DTA model is powerful for evaluating the roles of S100A4 expressing cells, the authors should discuss the potential outcomes of using S100A4 knockout or knockdown approaches. If the authors have such data available, this could help distinguish phenotypes caused by loss of S100A4 function itself from those arising due to ablation of S100A4 expressing cell populations and would add mechanistic depth to the study.

      We thank the Reviewer for this insightful suggestion. We agree that genetic approaches targeting S100A4 function (e.g., knockout or knockdown) could, in principle, help disentangle cell-autonomous effects of S100A4 from those resulting from the loss of S100A4-expressing cell populations. However, we would like to clarify that the primary objective of our study is to investigate the functional contribution of S100A4⁺ stromal cells at the population level, rather than to dissect the molecular function of S100A4 protein per se. In this context, the S100A4-Cre;DTA model provides a well-established and appropriate strategy to ablate this cell population and assess its role in tissue development. Importantly, S100A4 is not only a functional protein but also a widely used marker of a heterogeneous stromal cell population. Genetic ablation of S100A4 itself would not eliminate these cells, and may result in relatively subtle or compensable phenotypes due to functional redundancy within the S100 protein family or context-dependent roles of S100A4. Therefore, such approaches would address a distinct biological question and may not directly recapitulate the phenotypes observed upon cell ablation.

      References

      Eisen, E. J., & Durrant, B. S. (1980). Genetic and Maternal Environmental Factors Influencing Litter Size and Reproductive Efficiency in Mice. Journal of Animal Science, 50(3), 428–441. https://doi.org/10.2527/jas1980.503428x

      Ren, Y. A., Monkkonen, T., Lewis, M. T., Bernard, D. J., Christian, H. C., Jorgez, C. J., Moore, J. A., Landua, J. D., Chin, H. M., Chen, W., Singh, S., Kim, I. S., Zhang, X. H. F., Xia, Y., Phillips, K. J., MacKay, H., Waterland, R. A., Cecilia Ljungberg, M., Saha, P. K., … Richards, J. A. S. (2019). S100a4-Cre–mediated deletion of Ptch1 causes hypogonadotropic hypogonadism: Role of pituitary hematopoietic cells in endocrine regulation. JCI Insight, 4(14). https://doi.org/10.1172/jci.insight.126325

      Tuwatnawanit, T., Wessman, W., Belisova, D., Sumbalova Koledova, Z., Tucker, A. S., & Anthwal, N. (2025). FSP1/S100A4-Expressing Stem/Progenitor Cells Are Essential for Temporomandibular Joint Growth and Homeostasis. Journal of Dental Research, 104(5), 551–560. https://doi.org/10.1177/00220345251313795

      Zhang, R., Gao, Y., Zhao, X., Gao, M., Wu, Y., Han, Y., Qiao, Y., Luo, Z., Yang, L., Chen, J., & Ge, G. (2018). FSP1-positive fibroblasts are adipogenic niche and regulate adipose homeostasis. PLoS Biology, 16(8). https://doi.org/10.1371/journal.pbio.2001493

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have previously been solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. The authors have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays to characterize the structure and dynamics of IDE in the presence of the substrate protein insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion among the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated in the originally submitted manuscript using cryo-EM. Another (sixth) cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the groundwork for future exploration of protease structure-function relationships.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term as it implies that the observation presented in the two sentences are reverse or rephrase what is being compared. Is it the fact there is a dose dependency or not between the substrates or is it the actual kinetic parameters that are described. I just don't think conversely is fair with the current formulation as "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ" not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Thanks a ton to the authors for the detailed explanation on my comments. I believe the discussions will help a large group of audience, especially the non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) regarding the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures" , which may need an update (six structures and resolution 3.0-5.1 Å) or rephrase the sentences accordingly. If similar instances are found in the manuscript, where list of all the structures are mentioned together, please update accordingly if necessary.

      The cryo-EM statistics for the time-resolved cryo-EM are shown in supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both factors can contribute to the different lifespan effects. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 reliably outperforms VPR (i.e., produces stronger OE phenotypes) in most cases, and produces OE phenotypes that are comparable (i.e., generally equivalent) to UAS:cDNA (Jia et al., 2018). After determining how flySAM1.0 performance compares to VPR and UAS:cDNA, the authors next tested whether flySAM2.0 also outperforms VPR; they found that, like flySAM1.0, flySAM2.0 outperforms VPR in most cases (Jia et al., 2018). In general, the data suggest that we should expect comparable overexpression phenotypes for our flySAM2.0 and UAS:cDNA lines.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq, as it gave a stronger lifespan effect and we thought it was likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, it is possible that other DIP-β isoforms were also present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one isoform. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, which is the same isoform that was identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extension we observed for DIP-β flySAM is a robust biological effect rather than an artifact of reduced viability, for the following reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extension merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the figure legends for Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) To minimize background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, we found they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. GSG3285-1 was chosen for this study because it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines, which is essential for lifespan studies. 2) The driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al., 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, its antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) The expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge this means our manipulations may target a broad subset of glia rather than every single glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      It is a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and to the methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as follows:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day timepoint represents a specific window where motor coordination is improved by DIP-β, while circadian circuits — governed by distinct glial-neuronal interactions — remain largely unaffected, or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs. VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work utilizing in-situ proteomics on the fully intact CNS (including the VNC), or on the VNC specifically, will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which exclusively binds Dpr12 and did not extend lifespan in our screen. It is possible that by overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving essential glial-neuronal communication integrity.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a "scaffold" or "bridge" for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on the quantification of our CCC scores, which was performed using FlyPhoneDB2, a computational tool used to estimate cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores,” comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies, but did not detect both ligand and receptor in control flies, the CCC score is increased by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into consideration the expression of downstream reporter genes (Qadiri et al., 2025).
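
      The counting logic in the example above can be illustrated with a small, self-contained sketch. This is a simplified illustration only and is not the FlyPhoneDB2 implementation, which computes interaction scores from expression levels. The gene and cell-type names, the binary "detected" sets, the exclusion of same-cell-type (autocrine) pairs, and the function names are all hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: cell type -> set of genes detected in it.
Detected = Dict[str, Set[str]]
Interaction = Tuple[str, str, str, str]  # (sender, receiver, ligand, receptor)


def detected_interactions(detected: Detected,
                          pairs: List[Tuple[str, str]]) -> Set[Interaction]:
    """Collect interactions where the ligand is detected in one cell type
    and its receptor in a different cell type (assumption: no autocrine pairs)."""
    hits: Set[Interaction] = set()
    for ligand, receptor in pairs:
        for sender, sender_genes in detected.items():
            if ligand not in sender_genes:
                continue
            for receiver, receiver_genes in detected.items():
                if receiver != sender and receptor in receiver_genes:
                    hits.add((sender, receiver, ligand, receptor))
    return hits


def ccc_gain(control: Detected, treated: Detected,
             pairs: List[Tuple[str, str]]) -> int:
    """Count interactions present in the treated group but absent in control;
    each such interaction adds 1, as in the example in the text."""
    return len(detected_interactions(treated, pairs)
               - detected_interactions(control, pairs))


# Toy data with placeholder names (not real measurements).
pairs = [("ligand_1", "receptor_1"), ("ligand_2", "receptor_2")]
control = {"glia": {"ligand_1"}, "neuron": {"receptor_2"}}
treated = {"glia": {"ligand_1", "ligand_2"},
           "neuron": {"receptor_1", "receptor_2"}}

gain = ccc_gain(control, treated, pairs)
print(gain)
```

      In this toy case neither ligand-receptor pair is complete in the control group, while both are complete (glia to neuron) in the treated group, so the count rises by two. Real analyses would use expression thresholds and statistics rather than simple presence or absence.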

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019) to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have revised the labeling of these figures to improve clarity. For Fig. 1D, we added an explanation to the figure legend. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan, which we hope improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to restore the previously truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this indicates a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, were necessarily indicative of impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication came from our later snRNA-seq data and analyses. Inspired by this comment, we agree that our differential gene analysis may reflect transcriptional changes rather than impaired glia-neuron communication. We have added this alternative interpretation to the manuscript.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! As the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus our glial cell-surface proteomics on the central brain, which is responsible for most of the brain’s higher-order functions, including learning and memory, signal integration, and behavior. As the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including them in our glial cell-surface proteomics could strongly bias our findings towards age-related changes in visual function, rather than the more general changes we intended to study. This clarification has been added to the Results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made a couple of corrections so that all the respective figures are now referenced.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references-line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow recognising them in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), to record activity from the same cellular ensemble. Ideally, if proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as the number of employed fluorophores. In practice, fluorescent proteins have non-negligible crosstalk both in absorption and emission bands. In addition, fluorescence contribution of each fluorophore normally varies from cell to cell and therefore spectral properties of cells expressing two or more proteins are different. The work of Phillips et al. addresses this challenge. The authors present an approach defined as "Neuroplex", allowing identification of up to nine cell types from the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing recording calcium activity from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected the nine AAV viruses, causing expression of fluorophores in a different brain area. The idea was not to image that area, but a non-infected medial prefrontal cortex (mPFC) section where neurons could be infected by their axons projecting in an injected area, in this way being identified by their targeting region(s). A GRIN lens, allowing spectral analysis, was mounted in the mPFC section, and GCaMP fluorescence was then recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuron somata. After functional imaging, the head of the mouse was fixed, spectral analysis was performed, and after necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from where GCaMP signals were detected. Notably, the procedures for estimation and correction of chromatic aberration and light transmission (described in Figure 2) were a major challenge in their technical achievements. The selection of the nine fluorophores was another big effort. This was done by combining computer simulations and direct measurement of spectra from individual proteins expressed in HEK293 cells. It is important to say that the authors could simulate arbitrary combinations of two or more different fluorophores and evaluate the ability of their algorithm to detect the correct proteins against wrong estimations of false-negative (absence of an expressed protein) or false-positive (presence of a non-expressed protein). Not surprisingly, this ability decreases with the level of GCaMP expression. The authors underline that most errors were false-negatives, which have a milder impact in terms of result interpretation, but the rate of false positives was, nevertheless, relevant in detecting a second fluorophore from a cell expressing only one protein. 
      The experimental profiles of fluorophores were dependent both on the specific fluorescent protein and on the projecting area, and the distribution of double-labelled cells did not match anatomical evidence. This result should be taken as the limitation of the present pioneering experiments, presented as proof-of-principle of the approach, but Neuroplex may provide far improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state-of-the-art of the field. The rigorous analysis of limitations in the use of Neuroplex must be considered an important guideline for future uses of this approach.

      We appreciate the reviewer’s positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      We thank Reviewer #2 for the careful statistical perspective and focus on assignment strategy and uncertainty. Importantly, we emphasize that Neuroplex is presented as a methodological proof-of-principle, not as a definitive quantification of projection convergence.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Weaknesses:

      (1) Single-label assignment in the main analyses: When multiple fluorophores exceed threshold for a neuron/ROI, the workflow applies a winner-take-all rule and assigns a single label (the fluorophore with the largest standardized beta), while additional above-threshold fluorophores are retained only as "secondary hits." This is a reasonable specificity-first choice, but because cortical excitatory neurons can collateralize, collapsing dual-threshold ROIs to one identity may under-represent dual-projecting cells and could bias estimated subtype fractions and behavioural comparisons.

      We thank the reviewer for raising this important conceptual point.

      We agree that cortical excitatory neurons frequently collateralize and therefore may legitimately express more than one retrograde fluorophore. Our use of a winner-take-all (WTA) rule in the primary analyses was an intentionally conservative methodological choice designed to prioritize specificity over sensitivity in this proof-of-principle study.

      As demonstrated in our simulations (Supp. Fig. 5–6), under realistic background and noise conditions, secondary assignments are more susceptible to false-positive errors than primary assignments. For this reason, we chose to assign a single primary identity for quantitative behavioral stratification while retaining additional above-threshold fluorophores as “secondary hits” and reporting their distribution separately (Supp. Fig. 7).

      We did not intend to imply that projections are exclusive. Rather, the WTA strategy provides a conservative lower-bound estimate of subtype proportions and avoids inflation of dual-label rates under conditions where spectral separability is imperfect.

      We agree that this rationale should be stated more explicitly in the manuscript, and that the potential impact of assignment strategy on subtype fractions and behavioral comparisons should be acknowledged clearly as a methodological trade-off rather than a biological claim.

      Importantly, the biological analyses presented in this manuscript are illustrative demonstrations of functional stratification capability and do not depend on exclusivity of projection identity. We have revised the manuscript to clarify this framing as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a winner-take-all rule. We emphasize that this assignment strategy does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal Cell Type and Behavior)

      “Cortical pyramidal neurons frequently collateralize to multiple downstream targets, and accordingly some ROIs exceeded threshold for more than one fluorophore. In this proof-of-principle implementation, we adopted a specificity-first winner-take-all assignment rule for primary analyses to minimize false-positive multi-label calls under realistic noise conditions. This strategy likely underestimates the true prevalence of dual-projecting neurons and should therefore be interpreted as a conservative stratification approach rather than a statement of projection exclusivity.” (Discussion)
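
      The winner-take-all rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the authors' implementation: the function name `assign_identities`, the toy z-scored beta values, and the threshold of 2.0 are all hypothetical.

```python
import numpy as np


def assign_identities(z_betas, threshold):
    """Winner-take-all assignment (illustrative sketch only).

    z_betas: (n_rois, n_fluorophores) array of z-scored unmixing betas.
    Returns the primary label per ROI (-1 if nothing exceeds threshold)
    and a boolean mask marking secondary hits.
    """
    z_betas = np.asarray(z_betas, dtype=float)
    above = z_betas > threshold
    # Hide sub-threshold entries so argmax only considers candidates.
    masked = np.where(above, z_betas, -np.inf)
    primary = np.argmax(masked, axis=1)
    primary[~above.any(axis=1)] = -1
    # Secondary hits: above threshold but not the primary label.
    secondary = above.copy()
    rows = np.flatnonzero(primary >= 0)
    secondary[rows, primary[rows]] = False
    return primary, secondary


# Toy values: ROI 0 exceeds threshold for two fluorophores,
# ROI 1 for none, ROI 2 for exactly one.
z = np.array([[0.5, 3.1, 2.4],
              [0.2, 0.1, 0.3],
              [4.0, 0.0, 1.0]])
primary, secondary = assign_identities(z, threshold=2.0)
```

      Here ROI 0 receives fluorophore 1 as its primary identity while fluorophore 2 is kept only as a secondary hit, which is the case where a genuinely dual-projecting neuron would be counted under a single label.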

      (2) Dual-label detection is acknowledged but remains descriptive in vivo: the manuscript explicitly discusses the possibility of dual projection, evaluates dual-fluorophore detection in simulations (including performance under realistic noise/background), and reports in vivo rates of secondary hits. However, these dual-threshold events are not incorporated as co-identities in the main statistical analyses, making it difficult to judge how robust the principal biological conclusions are to the single-label decision rule.

      We thank the reviewer for this important clarification request.

      We agree that dual-projection neurons are biologically plausible and that dual-threshold ROIs were detected in vivo. In this manuscript, however, our primary goal was to establish the feasibility of high-dimensional spectral assignment and projection-resolved stratification, rather than to provide a definitive quantification of projection convergence.

      For this proof-of-principle study, we chose a conservative winner-take-all (WTA) framework for primary behavioral analyses in order to minimize false-positive multi-label assignments under realistic noise and background conditions, as demonstrated in our simulations (Supp. Fig. 5–6). Secondary hits were retained and reported descriptively (Supp. Fig. 7), but not incorporated into the primary statistical comparisons to avoid overinterpretation of potentially ambiguous dual-label calls.

      Importantly, the principal biological conclusions presented in the manuscript are qualitative demonstrations that projection-defined stratification is feasible within a single animal. These conclusions do not rely on projection exclusivity or on precise quantification of dual-projecting fractions.

      We agree that this distinction should be made clearer in the manuscript, and we have revised the text as follows:

      “Although dual-threshold ROIs were detected in vivo, these secondary assignments were not incorporated as co-identities in the primary behavioral analyses. This decision reflects a conservative specificity-first framework designed to minimize false-positive multi-label calls under realistic noise conditions. Accordingly, dual-label rates reported here should be interpreted descriptively. The present study focuses on demonstrating the feasibility of projection-resolved stratification, rather than providing definitive quantification of projection convergence.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “We then stratified these neurons by projection target and examined behaviorally selective activity across cell types. These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Behavioral Analysis)

      (3) Uncertainty is not propagated: False-positive/false-negative rates from simulations and uncertainty from registration/segmentation are not carried forward into quantitative confidence bounds on subtype proportions or behaviour-by-subtype effects.

      We agree that formal propagation of classification and registration uncertainty into subtype proportions and behavioral comparisons would be appropriate in a study primarily focused on precise anatomical quantification. However, the central goal of the present manuscript is methodological: to demonstrate that high-dimensional spectral identity can be reliably linked to miniscope-recorded functional activity within a single animal.

      Our simulations under realistic noise, background, and class imbalance conditions (Supp. Fig. 5–6) show that errors are predominantly false negatives rather than false positives. Nevertheless, the behavioral analyses are presented as qualitative demonstrations of the feasibility of projection-resolved stratification rather than as definitive quantitative anatomical measurements.

      In the revised manuscript, we clarified that 1) subtype proportions and behavioral effects are assignment-dependent estimates, 2) simulation-derived error rates provide guidance for experimental design rather than formal confidence intervals, and 3) future studies centered on precise quantification of projection fractions would benefit from formal uncertainty modeling, as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “Because classification uncertainty was not formally propagated into these analyses, behavior-by-subtype comparisons should be interpreted as qualitative demonstrations of functional stratification rather than precise quantitative estimates.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      We thank the reviewer for highlighting this important conceptual assumption.

      Fluorophore identity in Neuroplex is genetically encoded via AAVretro delivery and therefore does not depend on transient physiological state. Spectral imaging is performed in vivo through the same GRIN lens and field of view used during behavioral imaging, and co-registration relies on anatomical landmarks. While repeat spectral imaging was not formally performed as a longitudinal experiment, the underlying fluorescent protein expression is stable over weeks, and there is no biological mechanism in this paradigm that would alter fluorophore identity across sessions.

      We revised the manuscript to explicitly state this assumption and clarify why identity stability is expected as follows:

      “…fluorophore signals and reduce unmixing fidelity, leading to an increased false positive rate. Fluorophore identity in this framework is genetically encoded via retrograde AAV delivery and is therefore expected to remain stable across behavioral and spectral imaging sessions. Because both functional and spectral data are acquired in vivo through the same GRIN lens and co-registered using anatomical landmarks, assignment stability is not expected to vary across time unless expression levels change substantially. While repeat spectral imaging was not performed as a formal longitudinal experiment in this study, the stability of fluorescent protein expression supports the assumption that fluorophore identity reflects a persistent cellular attribute.” (Discussion)

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      We agree that the dual-pass thresholding approach is heuristic rather than fully probabilistic. More formal probabilistic classifiers are possible but would introduce additional modeling assumptions and training requirements beyond the scope of this proof-of-principle study.

      We revised our manuscript to clarify this as follows:

      “The current classification framework relies on linear unmixing followed by empirically defined thresholding rather than full probabilistic inference. This approach provides transparency and practical robustness under realistic noise and background conditions but does not generate single-ROI posterior uncertainty estimates.” (Discussion)
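
      For readers unfamiliar with this class of method, a minimal sketch of linear unmixing followed by thresholding is given below. It is not the Neuroplex code: the Gaussian-shaped reference spectra, the noise level, the noise-only baseline used for z-scoring, and the threshold of 5.0 are assumptions chosen purely for illustration.

```python
import numpy as np


def unmix(basis, spectrum):
    """Least-squares betas for one ROI spectrum.

    basis: (n_channels, n_fluorophores) reference spectra as columns.
    """
    betas, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return betas


def zscore_against_baseline(betas, baseline_betas):
    """z-score betas against a baseline distribution (rows = baseline ROIs)."""
    mu = baseline_betas.mean(axis=0)
    sd = baseline_betas.std(axis=0, ddof=1)
    return (betas - mu) / sd


rng = np.random.default_rng(0)

# Three made-up Gaussian-shaped emission spectra sampled on 32 channels.
channels = np.arange(32)
basis = np.stack([np.exp(-0.5 * ((channels - c) / 3.0) ** 2)
                  for c in (8, 16, 24)], axis=1)

# Hypothetical baseline: unlabeled ROIs that contain only noise.
baseline = np.array([unmix(basis, rng.normal(0, 0.02, 32))
                     for _ in range(200)])

# Simulated ROI expressing only fluorophore 1, plus noise.
roi = 1.0 * basis[:, 1] + rng.normal(0, 0.02, 32)
z = zscore_against_baseline(unmix(basis, roi), baseline)
hits = z > 5.0  # hypothetical threshold on z-scored betas
```

      The output is a hard yes/no call per fluorophore, which is why a scheme like this gives no posterior probability for an individual ROI.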

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      We agree that spectral separability is conceptually important. In this manuscript, separability is demonstrated empirically through 1) in vitro fingerprint acquisition under identical optical conditions, 2) simulation under background and noise, and 3) successful in vivo classification across regimes. We did not compute formal matrix conditioning metrics, but we agree that the separability rationale should be described more explicitly. We revised our manuscript as follows:

      “While formal conditioning metrics were not explicitly computed, empirical fingerprint acquisition and simulation-based perturbation analyses demonstrate sufficient spectral independence for reliable linear unmixing under the tested regimes.” (Discussion)
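
      The metrics the reviewer mentions (similarity between basis spectra and conditioning of the unmixing matrix) were not computed in the study, but they are straightforward to obtain. The sketch below shows one way to do so in Python; the Gaussian spectra and the helper names `separability_report` and `gaussian` are hypothetical and stand in for measured fluorophore fingerprints.

```python
import numpy as np


def separability_report(basis):
    """Pairwise cosine similarity and condition number of a spectral basis.

    basis: (n_channels, n_fluorophores) reference spectra as columns.
    Columns are normalized to unit length so that brightness differences
    do not dominate the condition number.
    """
    basis = np.asarray(basis, dtype=float)
    unit = basis / np.linalg.norm(basis, axis=0, keepdims=True)
    cosine = unit.T @ unit
    return cosine, np.linalg.cond(unit)


channels = np.arange(32)


def gaussian(center, width=3.0):
    """Made-up emission spectrum used only for this illustration."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)


well_separated = np.stack([gaussian(8), gaussian(24)], axis=1)
overlapping = np.stack([gaussian(15), gaussian(17)], axis=1)

cos_far, cond_far = separability_report(well_separated)
cos_near, cond_near = separability_report(overlapping)
```

      A cosine similarity near 1 between two spectra, or a large condition number for the whole basis, indicates that small amounts of noise can produce large swings in the fitted betas for those fluorophores.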

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      We agree that ROI definition influences spectral extraction. Spectral fingerprints are derived by averaging all pixels within the ROI mask, and therefore neuropil contamination, partial ROI overlap, and dense labeling could influence beta estimates. In the revised manuscript, we have acknowledged these dependencies more explicitly:

      “Spectral unmixing operates on CNMF-derived ROI masks treated as fixed supports. Accordingly, segmentation quality, neuropil contamination, and partial overlap between neighboring cells can influence extracted spectral fingerprints and may contribute to false negatives or secondary assignments, particularly in densely labeled regions. These structured sources of uncertainty are expected to have the greatest impact under regimes of extreme class imbalance, low fluorophore brightness, strong neuropil signal, or pairing of spectrally overlapping reporters. Use of refined segmentation strategies or nuclear-localized reporters could reduce such structured uncertainty in future implementations.” (Discussion)

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      We agree and have delineated the interpretability boundaries explicitly:

      “Dual-label assignments are most reliable when fluorophores are spectrally well separated and when signal-to-noise ratios are high. In contrast, spectrally adjacent fluorophore pairs or densely labeled regimes increase ambiguity and false-positive risk. Experimental design should therefore prioritize pairing spectrally distant fluorophores when projection convergence is of primary interest.” (Discussion)

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.

      “More broadly, Neuroplex is expected to perform most robustly in regimes characterized by moderate projection convergence, balanced fluorophore representation, bright and spectrally distinct reporters, and adequate signal-to-noise ratio. Imaging directly within a projection target that has received dense retrograde labeling may introduce substantial class imbalance, which simulations predict will reduce detection sensitivity for the dominant fluorophore. In such cases, conservative assignment strategies, reduced spectral complexity, or refinement of ROI definition may improve interpretability. Careful fluorophore selection and pilot validation under intended imaging conditions are therefore recommended prior to large-scale application. Future implementations incorporating nuclear-localized reporters may further reduce segmentation-dependent ambiguity by constraining spectral signals to somatic compartments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address a few points that are not clear.

      (1) At the end of the Results, the authors assess their approach using only four fluorophores and conclude that Neuroplex works "even" under reduced complexity. There is something I am missing. In my mind, lower complexity should be easier and should work better. As a researcher, I would first assess a four-fluorophores scenario and then step up with complexity, but the authors did the opposite. Also, I think that the present Supplementary Figure 9 should be in the main text; I don't understand why the authors decided to relegate a clear result to the bottom of everything. The authors should give some explanations.

      We agree that reduced spectral complexity should, in principle, improve separability and classification performance. Our original presentation order was intended to first demonstrate feasibility under the most challenging condition (nine fluorophores plus GCaMP), thereby establishing maximal multiplexing capacity. The reduced-complexity experiment was included to demonstrate scalability and generalizability under more typical experimental regimes. However, we agree that this rationale was not sufficiently clear and that the reduced-complexity results merit presentation in the main text.

      Accordingly:

      We have moved former Supplementary Figure 9 into the main Results (Fig. 6).

      We have clarified explicitly why the nine-fluorophore condition was presented first as follows:

      “To evaluate the performance of Neuroplex under more typical experimental regimes with reduced complexity, we applied the pipeline to two GCaMP transgenic animals injected with a subset of four fluorophores.”

      (2) The question of relative expression is crucial. Among the infected regions, there is the contralateral mPFC and I imagine that if they image there, the contribution of the expressed protein might dominate all other components, preventing detection of other fluorophores, including GCaMP. But is it the case, or would it be possible to detect projecting neurons in that region? I would be surprised that the authors never tried it; this test would simply imply mounting the GRIN lens on the other hemisphere.

      This is an important conceptual point.

      Our simulations (Supp. Fig. 5) explicitly model over-representation of a single fluorophore. These results show that heavy class imbalance primarily increases false negatives (due to baseline normalization) rather than false positives.

      In the revised manuscript, we discuss this limitation more explicitly.

      “Relative fluorophore representation within the imaged field of view influences classification robustness. As demonstrated in our simulations of class imbalance (Supp. Fig. 5g–h), extreme over-representation of a single fluorophore primarily increases false-negative rates due to baseline normalization effects. In the present study, we intentionally avoided imaging directly within heavily infected projection targets (e.g., contralateral mPFC) in order to maintain moderate fluorophore representation across ROIs. Imaging in a densely labeled region would represent a more challenging regime, and we would expect reduced sensitivity for the dominant fluorophore under such conditions.” (Discussion)

      (3) The possibility to utilise Neuroplex goes beyond the type of experiment presented as proof-of-concept in this technical paper. In the Discussion, the authors mention genetically defined subtypes and activity-tagged neurons. But, if one changes the pipeline, can it be used by expressing GECIs with different spectra, or GECIs and genetically-encoded voltage indicators (GEVIs)? I would be very interested in knowing what the authors think about this putative "shortcut".

      We thank the reviewer for this forward-looking and insightful question.

      In principle, the Neuroplex framework could be extended to incorporate spectrally distinct genetically encoded functional indicators, including multi-color GECIs or combinations of GECIs and GEVIs. However, it is important to distinguish this from the identity-assignment strategy implemented in the present study.

      Simultaneous multi-color functional imaging under a head-mounted miniscope is optically more demanding than assigning cell identity from single-color functional recordings followed by high-dimensional spectral readout. Multi-color GECI or GEVI imaging requires real-time excitation and emission separation during dynamic recording, increases optical complexity, and is particularly sensitive to chromatic aberration, photon efficiency, and signal-to-noise constraints imposed by GRIN lenses.

      In contrast, Neuroplex decouples functional acquisition from spectral identity determination. Functional activity is recorded using a single optimized channel, while spectral separation is performed separately under controlled confocal conditions with multiplexed excitation and emission sampling. This design substantially reduces optical burden during behavioral imaging.

      While integration of multiple functional reporters is conceptually feasible within this framework, successful implementation would require careful validation of brightness, spectral separability, and temporal stability for each reporter combination.

      Reviewer #2 (Recommendations for the authors):

      (1) Implement a principled multi-label calling mode for cells with >1 above-threshold fluorophore (e.g., per-fluorophore FDR control or Bayesian posteriors). Report cell-wise weights and re-run key results three ways: single-label, hard multi-label, and soft (probabilistic) assignments; state explicitly how conclusions change.

      We appreciate this suggestion and agree that multi-label or probabilistic calling frameworks are well motivated, particularly for studies in which projection convergence is the central biological question. In the current manuscript, however, our goal is to establish a practically deployable proof-of-principle pipeline for linking miniscope functional recordings to a high-dimensional spectral-identity readout. Consistent with this scope, we used a conservative winner-take-all (WTA) strategy for primary analyses to prioritize specificity under realistic noise and background conditions, and we treated multi-hit events descriptively. Importantly, the qualitative conclusions regarding projection-resolved functional stratification are unchanged when secondary-hit distributions are examined.

      In the revised manuscript, we explicitly stated that: (i) single-label assignment is a conservative analysis choice rather than a biological claim of exclusivity, and (ii) multi-label or probabilistic calling is a natural extension for future work, as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)
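      The winner-take-all rule quoted above can be sketched as follows. This is an illustrative reading of the quoted Methods text, not the authors' code: the function name `winner_take_all`, the threshold value, and the use of -1 for unassigned ROIs are assumptions.

```python
import numpy as np

def winner_take_all(z_betas, threshold):
    """Assign a primary identity per ROI (hypothetical helper).

    z_betas: array of shape (n_rois, n_fluorophores) of z-scored beta values.
    threshold: assumed detection threshold applied to every fluorophore.
    Returns (primary, secondary):
      primary   - index of the largest above-threshold fluorophore, or -1.
      secondary - boolean array marking the other above-threshold fluorophores.
    """
    z = np.asarray(z_betas, dtype=float)
    above = z > threshold
    # If any value exceeds the threshold, the row maximum is the winner.
    primary = np.where(above.any(axis=1), z.argmax(axis=1), -1)
    secondary = above.copy()
    rows = np.flatnonzero(primary >= 0)
    secondary[rows, primary[rows]] = False  # the winner is not a secondary hit
    return primary, secondary

z = np.array([[0.5, 3.2, 2.5],   # two hits: fluorophore 1 wins, 2 is secondary
              [0.1, 0.2, 0.3],   # no hit: unassigned
              [4.0, 1.0, 0.0]])  # single hit
primary, secondary = winner_take_all(z, threshold=2.0)
```

      Keeping the secondary hits in a separate array mirrors the description that they were retained but excluded from primary subtype stratification.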

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (2) Add ground truth for dual projectors in a subset (paired orthogonal tracers or staged injections) and provide a confusion matrix including dual-positives; use this to calibrate thresholds/priors.

      We agree that ground truth validation of dual projectors using orthogonal tracers or staged injections would be valuable, particularly for calibrating priors and enabling confusion-matrix-based evaluation. However, these experiments require additional cohorts and experimental design beyond the scope of the current proof-of-principle technical manuscript. Our goal here is to demonstrate the feasibility of multiplexed identification and projection-resolved stratification within a single animal, not to provide definitive anatomical quantification of collateralization.

      We have revised the manuscript to clearly state that dual-label in vivo observations are descriptive and that studies aimed at quantitative convergence mapping should incorporate orthogonal ground truth validation.

      “Accurate quantification of projection convergence would benefit from orthogonal ground-truth validation (e.g., paired tracers or staged injections) to establish confusion matrices for dual positives and to calibrate thresholds or priors.”

      (3) Propagate uncertainty from simulations and registration/segmentation to subtype fractions and behavior effects (error bars or sensitivity analyses).

      We agree that formal uncertainty propagation is appropriate for studies focused on precisely quantifying subtype proportions or effect sizes. In this manuscript, subtype fractions and behavioral comparisons are presented primarily as demonstrations of the feasibility of projection-resolved functional stratification, rather than definitive anatomical measurements. Simulation analyses are included to characterize expected performance under defined noise and background regimes, but we did not propagate these uncertainties into downstream confidence bounds in this proof-of-principle work.

      We have revised the manuscript to clarify this explicitly as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (4) Mitigate sources of spurious multi-hits (neuropil handling, ROI mask erosion, nuclear-localized reporters, spectral basis choices) and quantify their impact on dual-label recovery.

      We agree that neuropil contamination, ROI boundary choices, and spectral basis selection can influence multi-hit rates. In the current manuscript, we already implement background subtraction and evaluate multi-hit behavior through simulations under realistic background and noise regimes. Quantitative evaluation of additional mitigation strategies (e.g., ROI erosion comparisons) would require new analyses beyond the current scope.

      We have revised the Discussion to include concrete best-practice recommendations (e.g., fluorophore pairing, conservative interpretation of multi-hits, and potential use of nuclear-localized reporters).

      “Multi-hit events can reflect true biological collateralization but may also arise from structured sources of ambiguity such as neuropil contamination, partial ROI overlap, or imperfect ROI boundaries. These factors may bias spectral estimates and contribute to secondary assignments, particularly in densely labeled regions. Practical mitigation strategies include conservative assignment rules, improved segmentation, and use of nuclear-localized reporters to reduce neuropil contribution.”

      (5) Clarify claims in the main text/figures wherever exclusivity is implied; label which panels use single-label vs multi-label/soft assignments.

      We agree and thank the reviewer for emphasizing clarity. We did not intend to imply projection exclusivity. We have revised the manuscript text and figure legends to explicitly state where single-label (winner-take-all) assignment is used, and to avoid language that could be read as claiming exclusive projection identity as follows:

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a conservative winner-take-all rule. This assignment reflects the strongest spectral contribution and does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.”

    1. Reviewer #3 (Public review):

      Summary:

      This important work provides a web-based tool to contextualize effect sizes in psychiatry with respect to reliability and base rates (collectively referred to as predictive utility analysis). The methods for the tool incorporate established psychometric principles that I think are of use for multiple fields in this seemingly easy-to-use tool. I agree with the critical importance of this tool and the methodological points made in this manuscript. Enthusiasm for the manuscript is weakened by a lack of clarity on the formulation of the paper and stated goals of the examples used, with the inferences and impact on clinical decision making from various parameterizations via this tool left open-ended.

      Strengths:

      This paper presents a well-considered and, what I think will be highly useful, web-based tool to contextualize effect sizes with respect to reliability and base rates. As the authors rightly point out, such a tool could be used in conjunction with widespread analytic power analysis tools in study planning. The paper also well contextualizes the need for such a tool in the relatively recent history of concerns of power, reliability, and inference in psychiatry specifically, and more general meta-scientific debates in psychology and neuroscience.

      Weaknesses:

      My primary feedback on this manuscript is the lack of clarity in what the paper itself, specifically, separate from the tool, is hoping to achieve. There is a central, but unresolved, tension in whether the reader is supposed to:

      (1) focus on the specifics of the examples used and whether to reevaluate the substantive claims from the studies, (2) buy in to how various reliability and base rate parameters impact modeling outcomes, (3) receive an introduction to the tool itself.

      In my estimation, the largest contribution to the field here is in (2) and (3), but currently much of the real estate of the paper is dedicated to several examples of (1). While these specific examples may be illustrative to some degree, I think given the number and brevity of such, they are unlikely to incidentally achieve points (2) and (3) above. Specific examples include the assertion of kappas for DSM diagnoses, without much nuance (e.g., see https://psycnet.apa.org/buy/2015-27500-001). Given the relatively limited space given to this example, however, it's hard to be entirely certain what the reviewer should take away.

      A second point of concern is where this tool would be situated in the research pipeline. I agree with the authors that this tool could be used in ways that parallel power analysis. With that in mind, it seems the most common use of this tool for an individual investigator is likely to be in a priori study planning. In contrast, and with my point above in mind, the use of the tool for existing results is likely best done with multiple estimates of effect sizes, reliability, and base rates, as is common in meta-analysis or consensus reviews. Nevertheless, there is no real example or guidance around how this influences new study planning.

      A third point is that more nuance would be useful in the introduction about the current state of psychiatry research. For example, I share many of the authors' concerns about reliability, power, reproducibility, and barriers to translation. That said, it is the case that while effect sizes should be considered considerably more, they are widely considered in psychiatry research via the common place of meta-analysis and other data pooling approaches. Another such example that the authors state in the context of reliability: "However, this [reliability] attenuation is rarely accounted for in routine analyses in psychiatry". This is true in practice, but somewhat misleading insofar as the method by which to do this remains unclear. For example, should we all report disattenuated associations, assuming there is no error and everything is perfectly reliable? This, of course, would be unrealistic to expect zero error. That we can achieve this with the new tool is clear, but the nuance of how and under what circumstances it should be done is not clear, and such nuance should be better reflected in the framing of the problem. That is, there is also a lack of clarity on what ought to be best practices and field-wide goals, rather than simply the lack of an ability to model these factors.
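      For readers unfamiliar with the attenuation at issue, the classical Spearman correction can be sketched as below. This is a generic textbook relation and not necessarily what the tool under review implements; the function names and the example reliabilities are assumptions chosen for illustration.

```python
import math

def _check_reliability(rel):
    """Reliabilities are assumed to lie in the interval (0, 1]."""
    if not 0.0 < rel <= 1.0:
        raise ValueError("reliability must be in (0, 1]")

def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given measurement reliabilities."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Classical correction: the correlation implied under error-free measurement."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return observed_r / math.sqrt(rel_x * rel_y)

# A true correlation of 0.5 measured with reliability 0.64 on one variable.
observed = attenuated_r(0.5, 0.64, 1.0)
recovered = disattenuated_r(observed, 0.64, 1.0)
```

      The corrected value describes a hypothetical error-free measurement, which is why the question of when to report it, rather than how to compute it, is the substantive one.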

      Minor point

      For conceptual clarity, it would benefit the manuscript to at least briefly mention the role of validity in translational importance. Of course, the current psychometric issues of reliability, base rate, power, etc are critical, but it should at least be mentioned, given the potential wide audience of this manuscript, validity is important as well. For example, highly reliable measures may not be valid indicators of underlying disease etiology (e.g., fMRI head motion is a highly reliable trait-level feature, but typically not considered an important predictor or consequence of mental health worth investing translational resources in). Relatedly, confounding as a general topic would be useful to mention just briefly, to help with the spirit of considering underlying issues in translation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity (Required)):

      Summary: In this manuscript, the authors examine how peripherin-2 (PRPH2) contributes to the localization of CNGβ1 within rod outer segment structures. PRPH2 and its homolog ROM1 are structural components of rod discs and are required for disc morphogenesis. In the absence of PRPH2, rod outer segments do not form, and various outer segment materials accumulate and are released as cilia-derived ectosomes. PRPH2 is thought to be transported through an unconventional secretory pathway, whereas cGMP-gated channels follow a conventional trafficking route. Although these components reach the outer segment through distinct pathways, PRPH2 is necessary for the proper delivery of CNGB1, a subunit of the cGMP-gated channel, to its correct destination. It was previously reported that a small fraction of PRPH2 reaches the outer segments through the conventional pathway when it forms a complex with Rom1 in mouse photoreceptors. Using Rom1 KO mice, the authors show that this conventionally trafficked PRPH2 fraction is not required for CNGB1 transport to the outer segment. Using various chimeric constructs, the authors verified that tetraspanin core of PRPH2, delivered to the OS, is sufficient to promote OS localization of CNGB1. Ct and Nt cytoplasmic regions of PRPH2 are dispensable for the role. Overall, the majority of the experiments are well-executed with statistical rigor, written in a way that others can reproduce, and support the major conclusion indicated in the title, "PRPH2 is essential for OS localization of CNGB1".

      Major comments: I believe that the majority of the conclusions are well-supported in this manuscript. Below, I am listing the major points that may need additional experiments or clarifications: 1) CNGA1 subunit is transported to and enriched within ciliary exosomes or the outer segment in PRPH2 deficient mice (Figure 1). The reduced levels of CNGA1 and CNGB1 in rds-/- mice suggest limited stability of these proteins. Their diminished abundance is also influenced by decreased mRNA expression of the corresponding genes. These findings imply that CNGB1 may not be essential for outer segment delivery of cGMP-gated channels if CNGA1 alone contains adequate targeting information. Related to these points, it is unclear whether CNGB1 exhibits a trafficking defect or encounters other problems before leaving the endoplasmic reticulum. Such problems may involve deficiencies in folding, holo-channel assembly, or related quality control processes.

      RESPONSE: We agree with this reviewer and have added additional data and interpretation to address this point. Our new data show that a low level of CNGB1 can in fact reach ectosomes in rds-/- rods, which makes sense since we and others had observed CNGA1 was present and we know that channel assembly occurs in the ER. This suggests that the CNG channel can properly fold and assemble. Furthermore, overexpressing CNGB1 did not restore ciliary localization in Rds-/-, leading to our interpretation that in the absence of an outer segment membrane compartment, there is no place to deliver the CNG channel and it is subsequently degraded. Apart from peripherin’s binding partner, ROM1, this is unique to the CNG channel. CNG channel subunits are still significantly lower at P21 than other outer segment membrane proteins, such as ABCA4 (shown here), rhodopsin, and PCDH21 (shown elsewhere).

      2) CNGB1 overexpression in rds-/- mice does not result in outer segment localization of CNGB1 channels (Figure 2A). These findings do not clarify whether CNGB1 successfully transits through the Golgi apparatus or associates properly with CNGA1 subunits. Elevating expression levels alone would not compensate for problems in folding or assembly.

      RESPONSE: We recognize that our previous submission lacked clarity on this point. Therefore, we have restructured the order of figures and provided additional controls to improve our manuscript. First, the fact that CNG channel is present at P21 and even increases over time suggests that in rds-/- rods channel processing (folding and assembly) is unaffected. Second, we recognize that channel stoichiometry is important for proper channel assembly, so we added a new supplementary figure that shows endogenous CNGA1 expression increases in rds-/- rods that are overexpressing myc-CNGB1 and FLAG-peripherin-2. This adds credence to our CNGB1 overexpression experiments and shows that CNGB1 being trapped is not due to inefficient channel assembly.

      3) Claims related to Figure 6 (P45 rds-/-) need further evidence. It remains uncertain whether CNGA1 and CNGB1 are delivered to lamellar ciliary membranes or to a distinct plasma membrane compartment comparable to that observed in wild type rod outer segments, or whether they accumulate in ciliary ectosomes. Those lamellar structures could be a part of cone outer segments. The observed GARP signal may originate solely from soluble GARP proteins. It is also unclear if CNGA1 and ROM1 colocalize in P45 rds-/- mice. Clarifying these points would strengthen the conclusion that lamellar formation, rather than specific function of PRPH2, is sufficient for CNGB1 delivery to the cilium or outer segment plasma membrane.

      RESPONSE: CNGA1/B1 are not expressed in cones, so the elevated outer segment localization observed at P45 must be coming from rods. In mouse retina, cones make up only 3% of the photoreceptor population. The SEM data clearly show that the lamellar ciliary protrusions are present on the majority of the photoreceptors. We now include CNGB1 staining from Rds-/- P45 sections that corroborate these data and show that CNGB1 is present at P45 and not P21 (Supplemental Figure 2).

      Below are minor comments: 1) The study does not establish whether a direct interaction between PRPH2 and CNGB1 is required for CNGB1 delivery to rod outer segments. Prior work by the senior author (ref 13) suggests that this interaction is not essential, since the PRPH2 binding site within the GARP domain is distinct from outer segment transport signal of CNGB1. Including a discussion of the PRPH2-GARP (or CNGB1) interaction and its relevance to CNGB1 trafficking would help readers interpret the findings more fully.

      RESPONSE: We have included this in our discussion.

      2) The authors propose that the ROM1 core is sufficient for outer segment delivery of CNGB1 based on experiments with chimeric constructs. However, in Figure 1, ROM1 is present in the outer segments (or ciliary ectosomes) of rds-/- mice even though CNGB1 is not delivered to these structures.

      RESPONSE: Our new data, including MS analysis and Western analysis from an enriched ectosome preparation, reveal that, along with ROM1, low levels of the CNG channel are delivered to ciliary ectosomes in Rds-/- mice. However, at this early timepoint photoreceptor cilia do not produce a membrane protrusion, which we observe is required to augment CNG delivery. We expressed a FLAG-ROM1 construct to try to drive earlier creation of these membrane protrusions, but this was unsuccessful, as we observed ROM1 was primarily localized to the inner segment. This suggests that overexpression of ROM1 did not increase ROM1 delivery to the cilia. Luckily, we were able to overcome this bottleneck with several of our chimeric ROM1/Prph2 constructs that did localize to the cilia and restore CNG localization. All of these new results have been included in the revised manuscript.

      3) Line 80: "Theouter" A space shall be inserted between "The" and "outer".

      RESPONSE: Done

      **Referee cross-commenting**

      Both reviewer #2 and reviewer #3 express views that align with mine. They clearly described the study's limitations, and their comments are highly valuable.

      Reviewer #1 (Significance (Required)):

      Prior studies showed that CNGB1 is not present in cilia-derived ectosomes of rds-/- mice, indicating that PRPH2 is necessary for ciliary or outer segment localization of CNGB1 in rods. Building on these earlier findings, I consider this study significant for the following reasons: 1) Using detailed analysis of different PRPH2 domains and chimeric constructs, it clarifies that PRPH2 core region, delivered to OSs, is essential and sufficient for OS localization of CNGB1. 2) PRPH2 and CNGB1 are thought to travel through different post-ER transport routes, with one pathway bypassing Golgi regions and the other passing through them. This study shows that CNGB1 depends on PRPH2, which suggests that these two routes may converge or interact at later stages and opens new directions for future investigation. 3) The study is relevant to basic scientists and biologists investigating how membrane structures acquire specialized functions in neurons, and its implications extend beyond photoreceptor biology.

      Limitation of the study: I believe that clarifying these points will make the manuscript more significant. 1) It is not clear, as mentioned above, how PRPH2 contributes to the delivery of CNGB1 to the OSs in the different secretory pathways.

      RESPONSE: In the absence of ROM1, Prph2 only travels through the unconventional secretory pathway directly from the ER. By looking at CNG trafficking and localization in ROM1-/- mice, we rule out the possibility that the small portion of PRPH2/ROM1 complexes that traffic conventionally through the Golgi are required for channel localization (Figure 3). Further, our Rho-Prph2 chimera that includes the trafficking signal from Prph2 did not rescue CNGB1 localization (Figure 4). These findings suggest that it is unlikely that these proteins engage during secretory transport to the outer segment.

      2) The prior study using a fluorescence complementation approach (Ritter et al, 2011) suggests that PRPH2 and CNGB1 can associate within rod ISs, likely before their delivery to OSs. However, it remains unclear whether this interaction supports the potential cotransport of CNGB1 and PRPH2 or whether the authors view these proteins as being transported independently.

      RESPONSE: As described above, our experiments rule out the notion that co-transport through the Golgi is driving CNG channel ciliary localization. We now note in our discussion that this data does not rule out the possibility of an earlier association between these proteins. However, the bulk of our data supports that any early interaction is not required for ciliary delivery.

      3) At the end of the result section (Figure 6, rds-/- P45), the authors suggest that lamellar formation (evaginations?) is required for CNGB1 transport. However, CNGB1 is normally not seen in evaginations or lamellar structures, and thus the assumption is not consistent with prior findings.

      RESPONSE: Absolutely, we agree that the CNG channel does not enter newly forming disc membranes, which has been shown by multiple groups. We included this in our discussion and have now added a clearer statement of our hypothesis: “Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for localizing the CNG channel and could play a role in segregating other proteins into the plasma membrane.”

      Overall, the manuscript is insightful and has the potential to advance our field and related disciplines.

      RESPONSE: Thanks!

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Cyclic nucleotide gated channels (CNG) localize to the plasma membrane of the rod photoreceptor outer segments, and are a key component of the phototransduction cascade. Understanding how outer segment proteins are trafficked and sequestered to the outer segments is an important field of investigation as it addresses both a fundamental aspect of cell biology and mechanism of disease, many of which have trafficking defects at the core of the pathogenic process. Using primarily IHC analysis of rodent models in combination with introduction of various expression constructs to the retina (through electroporation), this study finds that two rod outer segment structural proteins, peripherin-2 and ROM1, facilitate CNG channel localization to the outer segment.

      While this conclusion is interesting, a major concern that tempers enthusiasm is that in peripherin-2 null photoreceptors, there are no bona fide outer segments. In lieu of outer segments, there are rudimentary membranous protrusions and vesicles distal to the connecting cilia where outer segments should be. So the basis for concluding that peripherin-2 is required for CNG localization to the outer segment seems a bit wobbly. It is understood that the authors assumed the membranous materials distal to cilia as proxy for outer segments in their analysis and narrative. This assumption may have some merits. However, it is well known that when outer segment morphogenesis is severely compromised, all normally outer segment-bound proteins are ectopically localized or largely absent due to increased degradation. This could be simply due to the loss of their destination compartment, among other things. It is not clear how the authors could distinguish between a direct causal relationship where loss of one protein leads to the mislocalization of another, from secondary outcomes due to loss of the outer segments. The last sentence of the Abstract is telling. "Interestingly, this notion is supported by endogenous staining of CNGB1, which reappears in aged Rds-/- rods that have produced ciliary membrane protrusions." So in aged mice CNGB1 did localize to the OS, but what changed? There was more OS-like material to house the CNGB1 protein in the aged mice.

      RESPONSE: We agree that the loss of the OS compartment is likely driving downregulation of all OS proteins and have included a statement as such in our manuscript. We also performed additional qRT-PCR analysis on ROM1 and ABCA4 to show global downregulation at the mRNA level – consistent with the notion that there are reduced outer segment proteins when morphogenesis is compromised. However, our Westerns and IHC (as well as published data) clearly find a specific decrease in the CNG channel at the protein level, suggesting that not all proteins behave similarly when the outer segment is not formed. We included additional discussion on this point as well. While not directly examined in our manuscript, previous reports have shown the reverse effect: some outer segment proteins (e.g. PCDH21, Prom1) are upregulated in rds-/- retinas (Rattner et al JBC 2004). Therefore, it is an oversimplification to state that all outer segment proteins behave the same when outer segments are not formed properly. Other models of outer segment dysmorphia (e.g. RhoKO, PCDH21KO, Prom1KO, or WASF3) localize the CNG channel properly. We have added this to the discussion and hope that by restructuring our manuscript, we clearly outline that we do think that membrane retention at the tip of the cilia is driving CNG channel localization and that molecularly the tetraspanin proteins play a role in organizing these membranes.

      Reviewer #2 (Significance (Required)):

      Trafficking of nascent proteins to the outer segment in support of its renewal is an important subject, which has significant impact in understanding the mechanisms of retinal degeneration. The conclusion from this study, that peripherin-2 and ROM1 have a direct role in supporting CNG subunit trafficking may well be meritorious. However the data presented are less than fully convincing, and specifically the question of a direct vs secondary effect needs to be better addressed.

      RESPONSE: We appreciate this reviewer’s enthusiasm for investigating this process. The initial premise of our study was to investigate whether a direct effect of peripherin-2 on CNG delivery was possible, which was meritorious based on previously published data. However, we now find no direct trafficking link between CNG and peripherin-2; instead, our data largely find that CNG delivery is dependent on the presence of retained membranes at the ciliary tip – either through natural mechanisms or by driving “rudimentary” outer segment membrane lamination by overexpression of tetraspanin domains. We have restructured the manuscript to help guide the discussion.

      The following quote underpins some of the reasoning in the study. Lines 139-144, "(Figure 2A). This localization pattern suggests that the CNGB1 subunit is trapped in the biosynthetic pathway. In contrast, when FLAG-tagged rhodopsin is overexpressed in Rds-/- rods it traffics properly to outer segment ectosomes (Figure 2B, (19)). We posit that without proper exit from the biosynthetic pathway, the endogenous CNGB1 protein is rapidly degraded to undetectable levels, which we circumvent through overexpression. These data suggest the localization defect of CNGB1 in Rds-/- rods is in the trafficking of CNGB1." This in my view is an over-interpretation of limited data. The statement implies that rhodopsin and CNGB1 qualitatively differ in their fate but I would argue that both proteins are heavily degraded intracellularly except more of rhodopsin escaped to the "OS" and shows up in IHC. In many rhodopsin mutant transgenic mice, mutant rhodopsin appeared in OS even though intracellular degradation (gumming up the system) is a major factor in the disease process. The claim "rhodopsin trafficked properly to outer segment ectosomes" is not grounded in solid data.

      RESPONSE: We do fundamentally agree that the endogenous CNG channel is heavily degraded, which we confirm by overexpressing an exogenous CNGB1-myc and finding it trapped in the biosynthetic pathway. As stated by the reviewer, this localization pattern is in contrast to what we and others have observed for endogenous rhodopsin, and now show for overexpressed FLAG-rhodopsin – that rhodopsin does traffic to the OS ectosomes. By comparing the localization of both endogenous and overexpressed constructs (using the same promoter), we feel that our conclusion is well supported. We appreciate that our wording of “rhodopsin trafficked properly to the outer segment” is misleading, as traffic of membrane proteins in Rds-/- rods is generally affected and not “proper”. Importantly, we follow up this “limited data” with additional experiments showing that at high expression levels, we are unable to drive CNGB1 localization to OS ectosomes unless we co-express with a tetraspanin domain.

      A further minor comment is that the scope of the study appears limited, with no attempted experiments on how these proteins might interact to effect facilitation of trafficking.

      RESPONSE: Our approach was to be agnostic to the outcome of our hypothesis that peripherin-2 was directly involved in CNG channel trafficking. The experiments we performed to test this (ROM1-/- analysis and Prph2 C-terminal chimeras) did not support a role for peripherin-2 in CNG trafficking. Instead, our data support a model in which membrane retention and organization at the ciliary tip drives CNG channel delivery. We feel that our approach was not limited.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      in the gene encoding tetraspanin protein peripherin 2 (Prph2), i.e., Rds-/-, examining the requirements for various portions of the Prph2 protein in the context of an assortment of chimeric constructs expressed via transfection into photoreceptor cells, to restore localization of the beta subunit of the cyclic nucleotide-gated channel (CNGbeta1) to photoreceptor outer segments (OS) (in a small number of experiments) or, in the majority of experiments, to do so for a recombinant tagged version of this protein also overexpressed by transfection.

      The concluding sentences of the Discussion, which summarize the major conclusions, are as follows: "Our data clearly show that localization of the CNG channel is dependent upon peripherin-2 after biosynthetic exit, further suggesting that the necessary action is at the ciliary base. Supporting evidence for this comes from analysis of Rhodopsin knockout outer segments which have internal disc-like structures and localize CNG channel properly. Therefore, in the absence of a fully elaborated outer segment, peripherin-2's ability to delineate a disc is sufficient to drive CNG channel delivery. Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for trafficking the CNG channel and could play a role in segregating other proteins into the plasma membrane."

      The first sentence contains both reasonable conclusions and phrases whose meaning is unclear or not supported by the results presented. The statement, "localization of the CNG channel is dependent upon peripherin-2," is supported by the data but, of course, has long been known from previous studies of Rds-/- mice. What is meant by "...after biosynthetic exit..." is unclear. If, by this term, apparently newly invented, the authors mean "after synthesis of the protein is complete," the statement is accurate, but also a truism.

      RESPONSE: The absence of CNGB1 was reported in previous studies, but the mechanism driving its absence has not been investigated. In our resubmission, we have added additional data that now shows CNGB1 is present at very low levels in Rds-/- ectosomes but remains undetectable by IHC, which is consistent with previous studies mentioned by the reviewer, but is also a novel finding. Importantly, we find specific downregulation of CNG channel subunits in Rds-/- retinas compared to ABCA4, supported by Western blot analysis (Figure 1), and we investigate the mechanism driving this result.

      We appreciate the reviewer pointing out that “biosynthetic exit” is a niche term not broadly understood. We have removed this statement.

      The statement, "the necessary action is at the ciliary base," is NOT supported by the data presented, as the effect of the "successful" Prph2 constructs on CNGbeta1 localization is primarily to increase its levels at the distal end of cilia and at the base of OS-related structures formed in response to the presence of the Prph2 constructs. The restoration of these membranes, which, as the authors note, has been previously reported, is overwhelmingly the biggest effect of these constructs, and it could be argued that the restored localization, rather than degradation, of CNGbeta1 is merely a downstream consequence of the formation of these structures, with perhaps, an element of stabilization of CNGbeta1 toward degradation from direct binding to Prph2, which has also been previously reported.

      RESPONSE: We agree with the reviewer. Our interpretation of our data is that the presence of Prph2 (or its variants) at the distal end of the cilia localizes CNGB1, likely due to the formation of outer segment membrane structures. Previous to this work, there was a possibility that targeting information of Prph2 was required for CNGB1. That had never been explored. We definitively rule this possibility out when we express the C-terminal tail of Prph2, which is unable to rescue CNGB1 localization. Because the tetraspanin domain of Prph2 (or ROM1) can localize CNGB1, we do agree that the definition of an outer segment structure is the driving force for CNGB1 delivery – these are new findings. We’ve restructured and added additional discussion to the manuscript to clarify this point.

      The next suggested conclusion is, "Therefore, in the absence of a fully elaborated outer segment, peripherin-2's ability to delineate a disc is sufficient to drive CNG channel delivery," is partly accurate and partly misleading. If the word "localization" were to replace the term, "delivery," concerning which there are no data (aside from those confirming that Prph2 and CNGbeta1 pass through distinct secretory pathways), this statement would be an accurate summary.

      RESPONSE: We have updated to “localization”, but the fact that we confirm these two proteins do not traffic together through the Golgi would suggest that delivery is independent of trafficking.

      The final sentence, "Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for trafficking the CNG channel and could play a role in segregating other proteins into the plasma membrane," would also be accurate if the word "localization" were to replace the term "trafficking." The key point for these qualifications is that the experiments presented measure steady state levels of CNGbeta1 constructs at certain locations, which are determined not only by rates of trafficking, but also rates of synthesis and degradation, and the data presented confirm that total levels of CNGbeta1 are greatly diminished in the absence of functional Prph2, rendering any conclusions about the relative roles of trafficking kinetics and degradation kinetics speculative in nature.

      RESPONSE: We agree and have revised.

      Aside from these major conceptual issues, there is one overriding technical question: why are almost all the experiments presented carried out with a highly over-expressed engineered version of CNGb1 with a tag, which is clearly a context far from the physiological one, as opposed to examining redistribution of the endogenous CNGbeta1, which is of much greater interest? In some results relegated to a Supplemental figure (Supp. Fig. 2), the authors clearly demonstrate that sufficient signal can be obtained from immunofluorescence staining the endogenous proteins for such experiments to be readily interpretable. If the concern was cross-reactivity with non-covalently attached GARP proteins, a few experiments showing that similar results are obtained for immunostaining of the endogenous protein or of the tagged construct would have been sufficient, and the paper could have had more physiological relevance and impact.

      RESPONSE: We agree that endogenous CNG staining is important and valuable, which is why we included it in our manuscript. We were able to confirm that overexpressed CNG recapitulated the endogenous staining. We proceeded with analyzing overexpressed, tagged CNG for the reasons stated by the reviewer. Yes, cross-reactivity with soluble GARP proteins was one consideration, as was the fact that the GARP antibody is a mouse monoclonal antibody. Increased IgG due to inflammation in the RDS-/- model can obscure the outer segment region in these retinas, confounding our quantification. The tagged versions of CNGB1 and corresponding quantification offered the most clarity and continuity for the reader; therefore, we relegate the endogenous staining to the supplement.

      The remaining concerns are generally of less significance and mostly conceptual or quite minor technical concerns. Technically, the imaging data and their quantification are of good quality and analyzed with reasonable rigor.

      RESPONSE: Thanks!

      Abstract: "In this study, we investigate how peripherin-2 is engaged in CNG channel delivery to the outer segment." Might this not be more a question of how the absence of properly formed discs impacts the formation of outer segments with plasma membranes surrounding the disks? Is this really a question of "delivery" or "lack of address to make the delivery"?

      RESPONSE: Our interpretation of this comment is that it boils down to semantics. Delivery is inclusive of both trafficking and localization, which we investigate in our manuscript.

      Page 3, "fluorescence complementation between peripherin-2 and CNGb1 in the inner segment of transgenic Xenopus rods (23) ". The wording is unclear. It should be stated clearly that they are describing results of "bimolecular fluorescence complementation assays" of highly overexpressed recombinant proteins expressed from transgenes.

      RESPONSE: We have revised.

      Page 4, "...trapped in the biosynthetic pathway," It is unclear what the authors mean by this phrase. Obviously, "biosynthesis," i.e., translation is indeed complete, but biochemical pathways are not places. Is the intention to suggest that post-translational processing, such as addition and editing of carbohydrate chains or assembly with the alpha subunit has not been completed? If so, it would be better just to say so clearly. Or, is it meant to imply that it is physically "trapped" in the ER and/or Golgi apparatus? In any case the meaning should be made clear. Co-staining with ER and Golgi markers would have been very informative with respect to the compartments in which the highly overexpressed recombinant protein is trapped.

      RESPONSE: We acknowledge that our phrasing here was indirect. We have revised. Co-staining with Calnexin (an ER-marker) was attempted, but proved to be uninformative.

      It should also be noted that accumulation of highly overexpressed membrane proteins within internal membranes and membrane aggregates is a very commonly observed experimental phenomenon, and not restricted to the highly specialized trafficking routes in photoreceptors.

      RESPONSE: We agree that exogenous expression of membrane proteins can lead to increased presence within internal membranes of the inner segment, which we routinely see in our experiments. Importantly, our analysis is restricted to the ability of these exogenously expressed proteins to reach the ciliary compartment in Rds mice. We also conduct these experiments in wild-type retinas to ensure that our constructs are expressed, and the proteins reach the ciliary outer segment under normal conditions.

      Page 4, " peripherin-2 facilitates trafficking of the CNGb1 subunit to the outer segment " The data presented to this point do not demonstrate an enhancement of transport, but only of steady-state levels. There is nothing to rule out the possibility that some beta subunit is trafficked in Rds-/-, but is unstable to degradation in the region near the cilium when peripherin-2 and outer segments are not available. An increase in transport is certainly a possible explanation for the results, but should not be taken as an unambiguous conclusion.

      RESPONSE: We have altered the description of these results to allow for more interpretation of our data, which show that CNGB1 delivery to the outer segment is reduced in Rds-/- mice and enhanced when peripherin-2 is re-expressed.

      Page 4, "We confirmed that the fraction of peripherin-2 that traffics conventionally through the Golgi is indeed absent in Rom1-/- retinas and found that trafficking of the CNG channel via the conventional pathway is unaffected (Figure 3A)." This is one of the stronger and more interesting results in this manuscript, and tilts the argument against trafficking as being the mechanism for enhancement by overexpressed peripherin-2 of beta subunit levels in the distal region of the photoreceptor layer.

      RESPONSE: We agree.

      Page 5, "Our finding that secretory trafficking of peripherin-2 and CNGb1 is distinct." Clumsy syntax; needs to be rewritten for clarity.

      RESPONSE: Revised

      Page 5, "two previously characterized fusion proteins... have been shown to localize to the outer segment and build a rudimentary membrane structure (19) " This previous result, which is critical to interpretation of the results in this manuscript, should be introduced early, before any experimental results using related constructs are presented, in order to avoid confusion.

      RESPONSE: Prior to these experiments, we used only full-length peripherin-2, rhodopsin, or CNGB1. This paragraph is the first introduction of any chimeric protein, and we explain these two constructs thoroughly. We believe this satisfies this reviewer’s request.

      Page 5, " We confirmed these data by staining for endogenous CNGb1 in Rds-/- rods electroporated with each construct (Supplemental Figure 2B,C) " This is the most informative result in this manuscript with regard to the ability of these constructs to restore proper localization of CNGB1- it is not clear that the overexpression constructs for CNGB1 present any advantage beyond stronger signal and they may not be assumed, a priori, to be faithfully reporting on interactions of Prph2 with endogenous CNGB1, which is the biologically significant question. A big problem with Supp. Fig. 2 is that there is no real control, i.e., one without any Prph2 construct electroporated. Even the Rho-Prph2CT construct has some ROS-related structures and some CNGB1 localized to the one shown at higher magnification. The Prph2-RhoCT construct seems to lead to a substantial increase in endogenous CNGB1 in inner segment membranes. This looks like a phenomenon that is potentially very interesting, although it doesn't fit with any of the models put forth in the manuscript.

      RESPONSE: We agree that endogenous staining (shown in Supplemental Figure 3 of our revised manuscript) is informative, but it was technically challenging. Once we verified that our overexpression system recapitulated results for endogenous CNGB1, we went forward with the epitope-tagged CNGB1, which was clearer when quantifying CNGB1 localization to rudimentary outer segments.

      Our electroporation method provides an excellent internal control, as all of the non-electroporated cells show no endogenous CNGB1 localization without peripherin expression (Sup Fig 3A).

      Page 5, " cytosolic N- and C-termini of peripherin-2 are dispensable for CNGb1 outer segment localization " No- if you could simply remove them and get proper localization, that would show they are "dispensable." In these experiments they are always replaced with the corresponding region of some other protein that is localized to OS, or in one case, with 3 copies of the FLAG tag at the N-terminus. There are also clear differences in the efficacy of the different "successful" constructs, but these results and their implications are not really discussed.

      RESPONSE: We make this statement in the context of these termini being dispensable to CNGB1 localization, not to peripherin-2’s stability, function, or localization. A complete truncation of either domain results in a non-functioning protein. Our supplemental data shows reduced expression with a truncated N-terminus, preventing analysis (Sup Fig 5C). The 3X-FLAG has no known function in the cell, and we believe it serves as a proxy for removing the N-terminus altogether. Removing the C-terminus would prevent proper outer segment targeting, which is key to determining how peripherin-2 impacts CNGB1 ciliary delivery. Replacing this C-terminus with an outer segment targeting domain from another protein is an established method of investigation.

      Page 6, " We then wanted to determine whether the ROM1 tetraspanin region was sufficient to facilitate CNGb1 delivery by further replacing ROM1's cytoplasmic N-terminus with that of peripherin-2 (Prph2NT/CT-ROM1) . " This experiment obviously does NOT test "sufficiency" of the TM segments, as the construct has the termini replaced with the corresponding regions of Prph2, which might functionally substitute for the missing ROM1 regions.

      RESPONSE: Our previous results had already ruled out a role for these termini in CNGB1 localization.

      Page 6, " We show a dramatic increase in GARP staining in the aged Rds-/- retinal sections " The age dependence of this phenomenon is quite interesting and puzzling. Any thoughts on the mechanism?

      RESPONSE: We agree that this natural process is very interesting. We have restructured the order of our figures and provided additional controls to support this finding. We have added this to the discussion and hope that by restructuring our manuscript, we clearly outline that we do think that membrane retention at the tip of the cilia is driving CNG channel localization and that molecularly the tetraspanin proteins play a role in organizing these membranes.

      Page 6, " Although CNGα1, known to form homotetramers, can localize to the extracellular vesicles released into the outer segment area. " Not a sentence.

      RESPONSE: Revised

      Page 6, " Our data now shows that the population of peripherin-2 in complex with ROM1 that travels through the conventional trafficking pathway does not play a role in CNGb1 localization to the outer segment. " This is an oddly accurate, albeit somewhat contradictory sentence. Yes, you have failed to answer the question you claim this work was designed to address. Apart from this negative result, nothing is learned about trafficking, per se, from the experiments in this manuscript.

      RESPONSE: Please see our response to the reviewer’s comment above that clarifies our thinking regarding our results on trafficking.

      Page 7, " anticipated " Hopefully, the authors mean to say, "hypothesized," here.

      RESPONSE: Revised

      **Referee cross-commenting**

      My impression from reading the reviewers' comments is that there is general agreement on both the strengths and the limitations of this work. In my opinion, the issues raised by the reviewers could be addressed by editing the manuscript to be more circumspect in drawing definite conclusions from data that are not fully conclusive, without necessarily adding new experiments.

      Reviewer #3 (Significance (Required)):

      This study addresses a problem of great interest in the photoreceptor field and in cell biology more generally: the trafficking and localization of specialized membrane proteins to specialized ciliary membranes. The strengths are technical quality of data with good controls, in most cases. The limitations are largely conceptual in nature and derive from the rather simplistic approach to the experimental design, as described above. The rather dated "mix and match" approach based on chimeric constructs with pieces of sequences removed and replaced at will does not properly account for the conclusion reached many times from many experiments, including some in this manuscript, that the "roles" of stretches of amino acid sequence depend exquisitely on the multidimensional context in which they are tested, not simply on their position in the linear sequence. The paper presents interesting and convincing results with respect to functional requirements for formation of disc-like membranes, but very little with respect to "trafficking."

    1. Abstract: It has been empirically established that genome mixing between divergent species can trigger meiotic aberrations, ultimately leading to the emergence of asexual reproduction through the production of unreduced gametes in various metazoan lineages. Yet, it remains poorly understood how such asexual hybrids cope with co-inherited differences in sex determination systems, diverged regulatory networks, and chromosomal incompatibilities, especially in the context of increased ploidy. Addressing these questions requires high-quality, chromosome-level reference genomes of the parental species involved in hybrid formation. Here, we present the first chromosome-level genome assemblies for three hybridizing Cobitis species (C. elongatoides, C. taenia, and C. tanaitica), providing a comprehensive framework to investigate the genetic and cytogenetic basis of hybrid sterility and the transition to asexuality. By integrating genome scaffolding, male/female pooled sequencing, and molecular cytogenetics, we uncover extensive structural variation among homologous chromosomes of the three species, despite their overall syntenic conservation. Population-level Pool-Seq analyses further revealed that each species possesses a distinct, non-homologous sex chromosome, highlighting sex chromosome turnover even among recently diverged lineages. These assemblies enabled the design of chromosome-specific painting probes, which we applied to meiotic metaphase I spreads of diploid hybrids. This approach revealed striking differences in the pairing success of orthologous chromosomes, with some (e.g., Ch01B) frequently forming bivalents, while others (e.g., Ch01A, Ch05, Ch20) failed to do so and remained unpaired. Our results demonstrate that chromosome-specific features, shaped by structural evolution and sex-linked divergence, contribute unequally to hybrid meiotic failure.
Together, this work provides a high-resolution genomic and cytogenetic framework to understand how interspecific hybridization gives rise to clonality, and how the architecture of inherited parental genomes shapes the success or breakdown of meiosis in hybrid vertebrates.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag031), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1:

      The authors assembled the genomes of three Cobitis species native to Eurasia in an attempt to investigate the effects of structural variants on hybrid meiotic failure. This is certainly an interesting topic given the advances in our abilities to study hybridization that have been enabled by modern genomic sequencing methods, and the evolutionary consequences of asexually-reproducing species that result from rare instances of these hybrid events.

      Major comments: The introduction of the manuscript is well-written and focused on the topic at hand. Language was mostly clear throughout the manuscript. However, the paper overall is very lengthy and would benefit from extensive revision. Personally, I think the assembly and annotation of the three genomes is worthy of being a paper (genome report) on its own. Extraction of this material into a separate manuscript would allow the authors to hone the remainder of the paper into a much more concise and focused manuscript. Some aspects of the methods section related to genome assembly and annotation could be clarified and/or bolstered. Presentation of methods is mostly clear, but the description of genome annotation methods is a bit tough to follow. This procedure included many complicated steps and may benefit from a flow chart, even if included only as a supplemental figure.

      Several important quality control steps pertaining to genome assembly and DNA/RNA sequence processing were not mentioned. Authors do not report methods used for quality filtering or trimming. They do not report any process for removal of sequencing adapters. Additionally, they do not report screening of the genome assemblies for contamination from other species. These are critical steps in producing high-quality genome assemblies that need to be addressed.

      Presentation of statistics describing genome assembly quality, contiguity, and completeness could be improved. Authors might want to take some inspiration from statistics required for reporting in genome reports published by other journals, such as G3 or Genome Biology and Evolution. Sequencing depth is not reported in any context for the initial assemblies. Only log-transformed values are available in a single figure. Throughout the manuscript, authors conflate sequencing coverage (the proportion of a genome or genomic region that has been sequenced) with sequencing depth (the number of times a base or genomic region has been sequenced).
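      The distinction drawn above between coverage and depth can be illustrated with a minimal sketch. It assumes a per-base depth vector is already available; the function name, the `min_depth` threshold and the toy values are hypothetical and are not taken from the manuscript.

```python
def coverage_and_depth(per_base_depth, min_depth=1):
    """Return (breadth of coverage, mean depth) for a per-base depth vector.

    Breadth of coverage: proportion of positions with at least `min_depth` reads.
    Mean depth: average number of reads per position.
    """
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / n, sum(per_base_depth) / n


# Toy region of ten positions (hypothetical values); two positions have no reads.
depths = [0, 0, 5, 5, 10, 10, 10, 20, 20, 20]
breadth, mean_depth = coverage_and_depth(depths)
print(f"coverage = {breadth:.0%}, mean depth = {mean_depth:.1f}x")
```

      In this toy case the two quantities differ: 80% of positions are covered, while the mean depth is 10x.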

      For the sex-linked primers designed by the authors - I would recommend development of an internal positive control that would be expected to amplify in both sexes and be easily distinguishable from the sex-linked locus by size or fluorescent label. This allows the users to distinguish between failed PCRs and identification of the homogametic sex. This is especially important because the fish selected for marker development were collected from a relatively small portion of the species' distributions (Figure 1) so there could be population-specific differences that affect reliability of these markers for identifying sex. This is a problem I regularly encounter in my own work for wide-ranging species.

      I was also surprised that the authors did not conduct a GWAS analysis. That seems to be a fairly typical analysis included in studies of this type to elucidate sex-linked SNPs. It would add to an already extensive manuscript; however, this could add an additional argument for splitting this manuscript in two. It would provide more space to include it in a more focused manuscript.

      The results section contains many statements that would be more appropriate in the Methods section, or could be deleted entirely because they are redundant with statements already present in the Methods section. Additionally, there are some sentences that are more appropriate for inclusion in the Discussion section because they are interpretive. I have included examples under the 'Minor comments' section of this review. Some of the material presented as results in the Supplementary tables is presented in a confusing manner, and appears to contain errors (see examples in 'Minor comments' section below).

      The first several paragraphs of the Discussion section either repeat material already covered in the Results section, or go on tangents that are not directly related to the main purpose of the paper. However, some of it could be more appropriate to include in a genome report if the authors split the manuscript in two.

      Given the above issues, I find that the paper needs extensive editing and possibly more analytical work (if some of the methodological deficiencies were overlooked in the analysis phase as well as the writing phase of this project). It is unlikely this work could be accomplished in the normal window for a revision. Therefore, I regrettably suggest rejection of the manuscript.

      Finally, I have no meaningful experience with FISH probes or chromosomal painting so unfortunately, I can't provide much comment on those portions of the paper.

      Minor comments:

      - Line 291: please provide specific version number for Hisat2
      - Line 319: version numbers for D-Genies and SyRI missing
      - Line 331: version number for NGenomeSyn missing
      - Line 439-440: Authors provide N50 values, but the paper would benefit from providing some additional metrics, such as N90 and L90, to help readers gauge the contiguity of these genomes.
      - Line 442 - 443: I'm having a hard time understanding how the authors are calling these 'chromosome-level' assemblies when nearly a third (>30%) of the genome of two species (C. tanaitica and C. elongatoides) could not be assembled into chromosomal scaffolds.
      - Line 457 - 458: Either the term 'topologically associated domains' is missing, or the authors need to remove the parentheses from around TADs if it was defined earlier in the manuscript.
      - Line 470: change 'less' to 'fewer'
      - Line 483 - 486: The statements that observed patterns of repeat families 'suggest' something are interpretive and should be moved to the discussion.
      - Line 499 - 500: This sentence repeats content of the methods section. I suggest deleting it.
      - Line 540 - 564: If I am understanding correctly, the discussion of 'coverage' here would be more accurately described as 'depth' since the authors seem to be talking about average sequencing depth in different areas of the genome. Furthermore, authors never provide untransformed measures of sequencing depth in any context (the initial genome assemblies, pool-seq data, re-sequenced individuals, etc.). Therefore, it is difficult to determine if the differences being discussed here are derived from data with enough statistical power to measure differences in sequencing depth between male and female fish.
      - Lines 614 - 619: This could be explored with GWAS
      - Lines 635 - 641: Much of this paragraph is a description of methods and belongs in the Methods section.
      - Lines 664 - 667: Much of this is interpretive - more appropriate for the discussion.
      - Lines 700 - 711: This paragraph has little or no relevance to the main topic of this paper (hybrid meiotic failure).
      - Line 745: remove "loci's"
      - Line 813 - 815: PMER was already defined earlier in the paper.
      - Line 854: I suggest removal of "the first of their kind in an asexually reproducing vertebrate," because such statements rarely age well, and the concept behind the paper is interesting enough to stand on its own without pointing out the novelty of it being the 'first' time it was detected.
      - References section: Capitalization of article titles varies from one reference to the next. Scientific names are sometimes italicized; other times they are not.
      - Table 2: 'L50' and 'Number of Chromosomes' are always going to be integers. Why are there two significant digits to the right of the decimal point?
      - Supplementary Figure S2: 'Cobitis' should be italicized.
      - Supplementary Table S7: This table presents pre- and post-HiC values in a confusing manner that is nonsensical and probably erroneous. For example, the N50 values seem problematic. How do you have a 154 Kbp pre-HiC N50 contig value for C. elongatoides, but a 154 Mbp post-HiC N50 contig value for the same species? This is longer than the longest reported chromosome for any species (C. taenia) in Supplementary Table S8 (99 Mbp).
      - Supplementary Table S10: I don't know what the percentages in line 33 refer to?
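
      The contiguity metrics mentioned in the comment on lines 439-440 (N50, N90, L90) follow one general definition, sketched below. The helper `nx_lx` and the scaffold lengths are hypothetical illustrations, not values from the assemblies under review.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until at
    least x% of the total assembly length is reached. Nx is the length of the
    last sequence added; Lx is the number of sequences needed (always an integer).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


# Hypothetical scaffold lengths (arbitrary units), total = 300.
scaffolds = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(scaffolds, 50)
n90, l90 = nx_lx(scaffolds, 90)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

      Because Lx is a count of sequences, it cannot take fractional values, which is the point raised about Table 2.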

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome, and more clearly assessing to what extent there is correlation between SNS-seq signal and previous mapping approaches to predict origins (by MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear if the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these data show significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA-DNA hybrid formation and DNA replication initiation' remains conjecture.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of origins of replications. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      We agree that orthogonal validation of origins detected by stranded SNS-seq is necessary and we are working on it.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.

      The size and overlap of peaks depend on the length of the SNS. In our study, the width of the peaks corresponds to the size of the short nascent strands (0.5–2.5 kb) chosen as the starting material, whereas the peaks in Pratto et al., Cell, 2021 are much wider (a few kb). This could be due to the longer SNS used in the Pratto et al. study. Consequently, the overlap of the longer SNS is more pronounced since the SNS fibres elongate in both directions: at the 3′ end by DNA polymerase and at the 5′ end by ligation of Okazaki fragments. Additionally, the genomic regions displayed in our Figure 2A and in Pratto et al., Figure 1B are presented at substantially different resolutions, with a roughly ten-fold difference in scale.

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.

      As the reviewer suggested, we have produced scaled plots of the stranded SNS-seq origins over genic and intergenic regions (see Figure 3, which is attached along with the Reviewer #2 (Recommendations for the authors)). However, we would prefer to keep the unscaled versions in the manuscript and add a note in the text as part of the Version of Record, explaining that the origins are evenly distributed throughout intergenic regions rather than being centred within them.
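
      The difference between scaled and unscaled plots can be made concrete with a minimal sketch of region scaling: each region is divided into the same number of bins, so that regions of different lengths contribute equally at every relative position. The function names and toy signals are hypothetical and do not reproduce the workflow used for the attached figure.

```python
def scale_region(signal, n_bins):
    """Average a per-base signal into n_bins bins of (near-)equal width."""
    length = len(signal)
    if length < n_bins:
        raise ValueError("region is shorter than the number of bins")
    binned = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        window = signal[start:end]
        binned.append(sum(window) / len(window))
    return binned


def metaplot(regions, n_bins):
    """Mean scaled signal across regions at each relative position."""
    scaled = [scale_region(region, n_bins) for region in regions]
    return [sum(column) / len(scaled) for column in zip(*scaled)]


# Two hypothetical regions of different length, each with signal in its first half.
regions = [
    [1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
profile = metaplot(regions, n_bins=2)
print(profile)
```

      After scaling, both regions show the same relative profile despite their different lengths, so region size no longer shapes the averaged curve.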

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding potentially wider origins.

      We'll modify the text as part of the Version of Record.

      Are claims well substantiated?:

      The identification of origins via SNS-seq appears to be incompletely supported to me. All downstream analyses depend on the reliability of origin identification.

      Impact:

      This study has the potential to be valuable for two fields. The first is research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. The second is basic research analyzing DNA replication across the evolutionary tree, where T. brucei can be used as an early-diverging eukaryotic model organism.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      - Two other attempts to identify origins in T. brucei - ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154) - were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      - We would also like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TriTrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      - MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between S- and G1-phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense we disagree with the McCulloch group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq locate within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.
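
      Overlap percentages of the kind quoted in the points above depend on which dataset is taken as the query, which a minimal sketch can show. Intervals are assumed to be half-open `(chrom, start, end)` tuples; the function name and all coordinates are hypothetical and are not drawn from any of the cited datasets.

```python
def fraction_overlapping(query, reference):
    """Fraction of query intervals that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    if not query:
        return 0.0
    hits = 0
    for chrom, start, end in query:
        candidates = by_chrom.get(chrom, [])
        if any(start < r_end and r_start < end for r_start, r_end in candidates):
            hits += 1
    return hits / len(query)


# Hypothetical narrow origins and broad replicated regions.
origins = [("chr1", 100, 250), ("chr1", 5000, 5150),
           ("chr2", 10, 160), ("chr2", 900, 1050)]
broad_regions = [("chr1", 0, 1000), ("chr2", 800, 2000)]

origins_in_regions = fraction_overlapping(origins, broad_regions)
regions_with_origins = fraction_overlapping(broad_regions, origins)
print(origins_in_regions, regions_with_origins)
```

      Here half of the origins fall inside a broad region while every broad region contains an origin, so a single overlap figure is only interpretable when its direction is stated.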

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.
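
      The orientation rule described above can be sketched as a simple pairing filter. This is an illustration under stated assumptions, not the authors' bioinformatics workflow: it only pairs adjacent peaks, and the `max_gap` threshold, the function name and the peak coordinates are hypothetical.

```python
def pair_peaks(peaks, max_gap=500):
    """Return (minus_peak, plus_peak) pairs consistent with an active origin.

    peaks: list of (chrom, start, end, strand) tuples, strand being "+" or "-".
    A pair is kept when an upstream minus-strand peak is immediately followed,
    on the same chromosome, by a non-overlapping plus-strand peak that starts
    no more than max_gap bases after the minus-strand peak ends.
    """
    ordered = sorted(peaks, key=lambda p: (p[0], p[1], p[2]))
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        same_chrom = upstream[0] == downstream[0]
        oriented = upstream[3] == "-" and downstream[3] == "+"
        gap = downstream[1] - upstream[2]
        if same_chrom and oriented and 0 <= gap <= max_gap:
            pairs.append((upstream, downstream))
    return pairs


# Hypothetical peaks: one valid pair, one wrongly oriented pair, one pair too far apart.
peaks = [
    ("chr1", 100, 600, "-"), ("chr1", 650, 1200, "+"),
    ("chr1", 3000, 3500, "+"), ("chr1", 3600, 4100, "-"),
    ("chr1", 8000, 8500, "-"), ("chr1", 12000, 12500, "+"),
]
candidate_origins = pair_peaks(peaks)
print(len(candidate_origins), "of", len(peaks) // 2, "possible pairs retained")
```

      Only the first pair passes: the second has the plus-strand peak upstream, and the third exceeds the distance threshold.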

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
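
      A random-placement control of the kind described above can be sketched as follows. This is a simplified, single-chromosome illustration with an adjacent-peak pairing rule; the chromosome length, `max_gap`, iteration count, seed and peak coordinates are all hypothetical, and the code is not the workflow used in the study.

```python
import random


def count_pairs(peaks, max_gap):
    """Count adjacent minus-then-plus peak pairs on a single chromosome.

    peaks: list of (start, end, strand) tuples.
    """
    ordered = sorted(peaks)
    n_pairs = 0
    for (s1, e1, strand1), (s2, e2, strand2) in zip(ordered, ordered[1:]):
        if strand1 == "-" and strand2 == "+" and 0 <= s2 - e1 <= max_gap:
            n_pairs += 1
    return n_pairs


def shuffled_pass_rate(peaks, chrom_length, max_gap, n_iter=100, seed=0):
    """Mean fraction of peaks retained when peak positions are randomised.

    Peak widths and strands are preserved; only start positions are redrawn.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_iter):
        randomised = []
        for start, end, strand in peaks:
            width = end - start
            new_start = rng.randrange(0, chrom_length - width + 1)
            randomised.append((new_start, new_start + width, strand))
        rates.append(2 * count_pairs(randomised, max_gap) / len(peaks))
    return sum(rates) / len(rates)


# Hypothetical observed peaks on a 50 kb chromosome.
observed = [
    (100, 600, "-"), (650, 1200, "+"),
    (5000, 5500, "-"), (5550, 6000, "+"),
    (20000, 20500, "+"), (40000, 40500, "-"),
]
observed_rate = 2 * count_pairs(observed, max_gap=500) / len(observed)
random_rate = shuffled_pass_rate(observed, chrom_length=50_000, max_gap=500)
print(f"observed pass rate: {observed_rate:.2f}, randomised: {random_rate:.2f}")
```

      Comparing the observed pass rate with the distribution obtained from randomised peaks gives an estimate of how many pairs would be expected by chance alone.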

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern (broad MFA-seq regions containing multiple initiation sites) has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al., Genome Res. 2015. We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without enrichment of SNS on a sucrose gradient as we did, and then they performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We added this part to the Discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Four broad issues need to be addressed.

      (1) The authors have attempted to test the overlap between ORC1/CDC6 (an ORC subunit) binding in the genome and SNS-seq. If there were an overlap, this would provide evidence that the SNS-seq signals represent origins. However, the analysis provided is inadequate: merely a statement that "we obtained an overlap of 4.2% between origins and ORC1/CDC6 binding sites within a window of ±2 kb and 6.2% in the window of ±3 kb". Nowhere are these data shown or properly discussed:

      a) The authors need to provide a diagram showing where in the genome the very small amount of overlapping SNS-seq and ORC1/CDC6 binding occurs, and to clearly show and state how many of the intergenic SNS-seq peaks are sites of ORC1/CDC6 binding. In the absence of such analysis, a key question is unanswered: is there any evidence of ORC1/CDC6 (or ORC more broadly) binding at the SNS-seq signals within the polycistronic transcription units?

      In the original version of the manuscript, these data were already presented as percentages in the text and as a metaplot (Supplementary Fig. 8C).

      We based our analysis on the set of 350 TbORC1/CDC6 binding sites available on TriTrypDB at the time of analysis. This dataset was a filtered subset of the originally reported TbORC1/CDC6 ChIP‑on‑chip peaks (personal communication, TriTrypDB). Since then, the unfiltered dataset has been made available. We therefore re‑analyzed the overlap using this dataset, to which we applied a filter that yielded 990 binding sites, closely matching the 953 sites reported by the McCulloch group. We need to stress here that the original 953 sites reported by the McCulloch group (Tiengwe et al., 2012; PMID: 22840408) are no longer available and that the authors:

      - do not provide genomic coordinates for the 953 binding sites and

      - do not release any scripts or methodology that would allow independent reproduction of the 953 sites.

      A similar remark also applies to the MFA-seq data (see below).

      To address the reviewer’s request, we have now:

      (1) Recalculated the overlap using the updated TbORC1/CDC6 dataset (990 binding sites) from TriTrypDB.

      (2) Added the absolute number of overlapping SNS‑seq origins and TbORC1/CDC6 binding sites in the Results section for clarity.

      (3) Included the TbORC1/CDC6 binding sites in the chromosomal overview (newly added to Supplementary Fig. 8A), so that their genomic localization relative to SNS‑seq peaks is visually accessible.

      (4) Revised the metaplots of TbORC1/CDC6 distribution around SNS‑seq origins using the updated dataset (Supplementary Fig. 8C).

      With these improvements, we now find that:

      - Within ±2 kb, 12.9% (253) of SNS‑seq origins overlap with 25.6% of TbORC1/CDC6 binding sites.

      - Within ±3 kb, 18.8% (370) of SNS‑seq origins overlap with 37.4% of TbORC1/CDC6 binding sites.
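      A windowed overlap of this kind can be computed as in the following minimal sketch. It is an illustration under stated assumptions, not the script used for the analysis: origins and binding sites are reduced to single midpoint coordinates per chromosome, and the function name `fraction_within_window` and the toy coordinates are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


def fraction_within_window(
    queries: Dict[str, List[int]],
    targets: Dict[str, List[int]],
    window: int,
) -> float:
    """Fraction of query positions with at least one target within +/- window bp."""
    hits = 0
    total = 0
    for chrom, positions in queries.items():
        sorted_targets = sorted(targets.get(chrom, []))
        for pos in positions:
            total += 1
            # Index of the first target at or after the left edge of the window.
            i = bisect_left(sorted_targets, pos - window)
            if i < len(sorted_targets) and sorted_targets[i] <= pos + window:
                hits += 1
    return hits / total if total else 0.0


# Toy coordinates (hypothetical): three origin centres and two binding sites.
origins = {"chr1": [1000, 10000, 50000]}
sites = {"chr1": [2500, 52500]}

frac_2kb = fraction_within_window(origins, sites, window=2000)
frac_3kb = fraction_within_window(origins, sites, window=3000)
```

      Swapping the two arguments gives the reciprocal figure, that is, the fraction of binding sites that lie near an origin, which is why each bullet above reports two percentages.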

      The updated metaplot shows a clear depletion of TbORC1/CDC6 signal at the origin center, with modest enrichment ~5 kb upstream and downstream. The underlying reason for this pattern remains unknown, and we agree that additional studies will be needed to understand it.

      b) Equally, the authors need to explain what they conclude from this analysis. They make a comparison with T. cruzi ORC1/CDC6 and SNS-seq overlap, which does not illuminate what the data tell us. For instance, if there is no or minimal overlap between ORC1/CDC6 binding and SNS-seq peaks within the polycistronic transcription units, do they conclude that the major SNS-seq signal they detail is evidence for ORC-independent DNA replication? If there is no overlap, what further evidence can they provide that these signals truly are origins?

      First, we would like to clarify that, to date, there is no evidence supporting ORC‑independent DNA replication in T. brucei, and—importantly—no published data demonstrating that TbORC1/CDC6 is universally required for DNA replication initiation. Because of this, we consider that it would be inappropriate to conclude that regions lacking detectable TbORC1/CDC6 signal undergo ORC‑independent initiation. We would prefer not to speculate in the absence of supporting evidence and would gratefully consider any reference the reviewer wishes to provide on this subject.

      Second, the low overlap between TbORC1/CDC6 binding sites and SNS‑seq origins does not, in our view, invalidate our mapping of replication initiation sites. Multiple factors contribute to this:

      (1) Low overlap between ORC1/CDC6 and origin‑mapping techniques has been repeatedly reported across kinetoplastids. For instance, in T. cruzi, 88.2% of origins detected by DNAscent nanopore sequencing showed no overlap with TcORC1/CDC6–Ty1 ChIP signal within ±3 kb, and only 11.7% co‑localized. This is strikingly similar to our observations in T. brucei. Thus, our data are consistent with the broader pattern in trypanosomatids rather than an exception.

      (2) The origin topology detected by stranded SNS‑seq is supported by several genomic characteristics found frequently in other eukaryotes, including:

      - A highly specific and polarized poly(dA)/poly(dT) sequence environment.

      - Strand‑specific G4 structures positioned around origin centers.

      - A conserved nucleosome‑depleted region flanked by well‑positioned nucleosomes.

      These features are absent from shuffled controls, appear at high significance, and recapitulate hallmark signatures of replication origins in other eukaryotes.

      Together, these findings give us confidence that the SNS‑seq peaks represent genuine origins - despite the incomplete overlap with TbORC1/CDC6 binding.

      Third, we fully agree with the reviewer that a definitive conclusion would require an additional, independent validation method.

      Given the lack of complete ORC subunit datasets and the unusual biology of trypanosomatid replication complexes, we believe that the cautious interpretation above is the most appropriate.

      c) The authors state (Discussion): "Validation of origins is generally a difficult task, particularly in trypanosomatids, where proteins involved in the initiation of DNA replication are difficult to determine. Few proteins have been described as potential ORC subunits (reviewed in 61), and none of them have been shown to be a specific marker that indicates the origins." There are two problems with the statement. First, most of the subunits of ORC have now been described in T. brucei; the authors should make this clear. Second, mapping of ORC1/CDC6 localisation, contrary to what the authors state here, shows precise correlation with the peaks of every MFA-seq signal described (see Tiengwe et al, Cell Reports, 2012); thus, ORC1/CDC6 binding provides evidence that MFA-seq is detecting origins, something that cannot be said for SNS-seq. The authors need to correct this misleading paragraph.

      As suggested, we have removed the paragraph from the Discussion to avoid confusion. However, we disagree with the reviewer's assessment and clarify below our position regarding the issues raised.

      First, we agree that five candidate ORC subunits have now been identified in T. brucei. Our intention was not to suggest the contrary, but rather to emphasize that, although candidate ORC components have been described, direct functional evidence for their roles in replication initiation is still limited. For this reason, we were cautious in referring to any ORC component as a definitive marker of replication origins.

      Second, regarding the reviewer’s statement that TbORC1/CDC6 binding “shows precise correlation with the peaks of every MFA‑seq signal”, we respectfully disagree based on several observations:

      (1) MFA‑seq does not identify individual origin centers, but rather broad replicated regions that often span hundreds of kilobases. By design, this method cannot define the number or position of discrete origins within each peak. For that reason, MFA-seq regions do not have the resolution required to validate TbORC1/CDC6 binding sites as individual origins.

      (2) In the published datasets (Tiengwe et al., Devlin et al.), no metaplots or locus‑wide quantification of the overlap between MFA‑seq peaks and TbORC1/CDC6 binding were provided. Neither the coordinates nor the approach used to define the discrete regions designated as origins within the broad MFA‑seq peaks have ever been described or made available, making it difficult to evaluate the claimed correspondence.

      (3) Notably, McCulloch’s group later reported that only 4.4% of the 953 TbORC1/CDC6 sites overlapped with their 42 MFA‑seq “origins”, underscoring that the degree of correspondence is in fact limited (PMID: 29491738).

      (4) Finally, as noted in our response to point (1b), low overlap between ORC1/CDC6 binding sites and origin‑mapping techniques is a consistent observation across kinetoplastids, including T. cruzi, where DNAscent‑mapped origins show only ~12% overlap with TcORC1/CDC6 ChIP signals. This suggests that the limited overlap we observe is not unique to our dataset.

      For these reasons, we are not convinced that the TbORC1/CDC6 binding sites have been shown to align precisely with MFA seq peaks, nor that these datasets definitively validate origin mapping in T. brucei. Nevertheless, to avoid over‑interpretation and potential confusion, we have removed the paragraph from the Discussion as requested. We hope this clarifies our position and improves the accuracy and neutrality of the manuscript.

      (2) Like for ORC1/CDC6 localisation, the authors' evaluation of the relationship between MFA-seq and SNS-seq mapping is inadequate, and the depth of the analysis and discussion needs to be improved:

      a) The authors state: "We found 28-42% stranded SNS-seq origins overlapped with early and 43-55% overlapped with late S-phase MFA-seq replicated regions (Supplementary Figure 8B)." This seems important and provides (limited) validation of both datasets, but cannot be discerned from the supplied figure. Please provide a metaplot of the two datasets centred on the MFA-seq loci, including the SNS-seq peak amplitude.

      We would like to emphasize that MFA‑seq is not a method designed to map individual origins, and this fundamentally limits the interpretability of metaplots centered on MFA-seq regions. MFA‑seq identifies broad replication‑enriched domains, typically spanning 100–500 kb, within which multiple origins may fire asynchronously across the cell population.

      This concern is reinforced by the original MFA‑seq publications (Tiengwe et al., 2012; Devlin et al., 2016), which:

      - do not provide positional data for the 42-47 MFA‑inferred origins,

      - do not describe the computational method used to derive individual origin coordinates from the broad peaks, and

      - do not release any scripts or methodology that would allow independent reproduction of the claimed origin positions.

      Because of this, it is not possible to reconstruct or validate how the 42 MFA‑seq “origin” sites were defined, nor to use those coordinates as anchors for metaplot analyses.

      Most importantly, we disagree with the underlying assumption that each MFA‑seq peak corresponds to exactly one origin. This assumption runs counter to the principle of the technique, which identifies regions of higher DNA content in replicating cells than in non-replicating cells; it is also contradicted by our stranded SNS‑seq data and by DNA combing measurements:

      - SNS‑seq detects multiple discrete origins within the same genomic regions that produce a single broad MFA‑seq peak.

      - DNA combing reveals inter‑origin distances of ~36–422 kb (median ~150 kb) (PMID: 26976742), which is far shorter than the ~400–600 kb replication domains identified by MFA‑seq.

      - Furthermore, with only 42 origins detected by MFA-seq, it is not possible to achieve complete genome replication in T. brucei during S-phase. DNA combing has found that the average speed of replication forks in procyclic forms is 1.9 kb/min (PMID: 26976742). Dividing the size of the Trypanosoma brucei brucei TREU927 genome (26.1 Mb) by 42 origins (PMID: 22840408) shows that each origin would have to replicate about 621 kb during S phase. At the average fork speed of 1.9 kb/min, replicating 621 kb would take 327 min (5.45 hours) (621 kb / 1.9 kb/min = 327 min). However, this exceeds the estimated length of the S-phase in Trypanosoma brucei procyclic forms, which is 2.31 hours (138.6 minutes) (PMID: 32397111, 31811174, 28258618) or as little as 1.36 hours (PMID: 2190996, 10574712). Therefore, more than 42 origins are necessary to complete replication during the short S phase.

      This makes it unlikely that MFA-seq regions represent single functional origins. For these reasons, a metaplot centered on MFA‑seq “loci” may lead to misinterpretations and would not provide biologically meaningful information.
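      The replication-time estimate given above can be reproduced with a short calculation. The sketch below follows the same simplification as the text (the genome is divided evenly among origins and each share is replicated at the average fork speed); the constant names and the helper `min_origins` are introduced here for illustration only, and the minimum origin counts it returns are a consequence of that simplification rather than a reported result.

```python
import math

GENOME_KB = 26_100          # T. b. brucei TREU927 genome size, 26.1 Mb
N_ORIGINS_MFA = 42          # origins inferred from MFA-seq
FORK_SPEED_KB_MIN = 1.9     # average fork speed from DNA combing
S_PHASE_LONG_MIN = 138.6    # 2.31 h estimate of S-phase length
S_PHASE_SHORT_MIN = 81.6    # 1.36 h estimate of S-phase length

# DNA each origin would have to cover, and the time that would take.
kb_per_origin = GENOME_KB / N_ORIGINS_MFA            # about 621 kb
minutes_needed = kb_per_origin / FORK_SPEED_KB_MIN   # about 327 min
hours_needed = minutes_needed / 60                   # about 5.45 h


def min_origins(s_phase_min: float) -> int:
    """Smallest number of evenly spaced origins that finishes within S phase,
    under the same one-share-per-origin simplification."""
    kb_per_origin_max = FORK_SPEED_KB_MIN * s_phase_min
    return math.ceil(GENOME_KB / kb_per_origin_max)


origins_needed_long = min_origins(S_PHASE_LONG_MIN)
origins_needed_short = min_origins(S_PHASE_SHORT_MIN)
```

      Under these assumptions the required number of origins is well above 42 for either S-phase estimate, which is the point made in the text.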

      We hope that the expanded explanation clarifies our interpretation of the relationship between these two complementary, but fundamentally different, methods.

      b) The authors state that "Our results showed that the origins are predominantly located in the intergenic regions within the PTUs (Figure 2C)'. This finding cannot be discerned from this figure, which does not show 'strand switch regions' (SSRs; transcription start/stop sites), where MFA-seq predicts all origins to localise. The authors need to acknowledge this difference and must show a comparison of SNS-seq data, including peak amplitude, around all SSRs (whether predicted by MFA-seq to act as origins or not, since all appear to bind ORC1/CDC6).

      We have now provided the metaplots showing the overlap between stranded SNS-seq origins and SSRs (see Supplementary Figure 8D). This difference has been acknowledged and discussed in the revised manuscript.

      c) Finally, the authors' interpretation that around 30-55% of SNS-seq peaks overlap with MFA-seq 'origins' is highly questionable. MFA-seq peaks are regions of increased DNA content in replicating cells relative to non-replicating cells, and so the entire region under the MFA-seq peak is not necessarily an origin, but is likely to be a more discrete locus (eg, the SSR, where ORC1/CDC6 mainly localises). They should correct the wording and discuss what significance they see in this overlap; for instance, do they think SNS-seq 'clusters' are more pronounced within the MFA-seq peaks and, if so, what might this mean, and why does it not correlate with ORC1/CDC6 localisation?

      As the reviewer notes, ‘MFA‑seq peaks are regions of increased DNA content, and so the entire region under the MFA-seq peak is not necessarily an origin but is likely to be a more discrete locus’. This is exactly why MFA‑seq is inappropriate for identifying discrete, individual origins: within these replicated domains, multiple origins can fire, as revealed by stranded SNS‑seq mapping.

      Regarding the overlap between SNS‑seq origins and MFA‑seq peaks, we agree with the reviewer that this overlap should not be interpreted as validating MFA‑seq “origin positions.” Instead, we now describe it more accurately as the proportion of discrete SNS‑seq origins that fall within broader MFA‑seq replication domains. This is expected, because SNS‑seq identifies individual initiation events, whereas MFA‑seq identifies S‑phase replication domains averaged across a population. Our stranded SNS‑seq data do not show enhanced origin accumulation within MFA-seq regions, and we find no correlation with TbORC1/CDC6 positions. This is now discussed.

      Regarding SSRs, we do not share the view that they should be considered privileged initiation sites. After remapping the TbORC1/CDC6 ChIP‑on‑chip dataset (see above) to the T. brucei Lister 427–2018 genome (Supplementary Fig. 8A), we observed that TbORC1/CDC6 binding is distributed throughout the chromosomes, not restricted to SSRs. To quantify this, we analyzed the overlap between TbORC1/CDC6 sites and all annotated SSR classes (dSSRs, cSSRs, and head‑to‑tail regions, as defined in Kim et al. 2009). The results show that:

      Only 10% of TbORC1/CDC6 binding sites fall within 40% of all SSRs.

      At the level of individual SSR types:

      - TTS: 3.3% of TTS overlap with 0.3% of TbORC1/CDC6 sites.

      - TSS: 67% of TSS overlap with 6.1% of TbORC1/CDC6 sites.

      - Head‑to‑tail regions: 54.2% overlap with 3.6% of TbORC1/CDC6 sites.

      These analyses demonstrate that most TbORC1/CDC6 sites are not located at SSRs, contradicting the idea that SSRs represent primary or exclusive origin sites.

      Author response image 1.

      Overlap between TbORC1/CDC6-12Myc binding sites (Tiengwe 2012, Cell Reports) and strand‑switch regions (SSRs). Venn diagram showing the overlap of 990 TbORC1/CDC6-12Myc binding sites (retrieved from TriTrypDB and filtered at score 22 to achieve a number of binding sites similar to the 953 binding sites published in Tiengwe 2012, Cell Reports) and SSR sites in the genome (Kim 2018, NAR). The intersection shows that 10.3% of ORC1/CDC6 binding sites overlap with 41.8% of SSRs. The intersection is subdivided into TSS (orange), TTS (blue) and HT (green).

      (3) A key objection to the data presentation is the decision to limit SNS-seq mapping to the intergenic regions. In addition to overlooking the SSRs (see above, 2), so-called subtelomeres, which account for nearly 50% of the T. brucei genome and are largely untranscribed, are not shown or discussed at all. Providing this data will improve clarity and also provide a key test of one of the predictions that the authors make: "most origins are localized in actively transcribed regions, which could lead to collisions between DNA replication and the transcription machinery. This spatial coincidence implies that transcription and replication must occur in a highly ordered and cooperative manner in T. brucei."

      We do not understand why this reviewer concluded that we took 'the decision to limit the mapping of SNS-seq to intergenic regions'. This is a factual error.

      To be clearer,

      (1) We now explicitly present the distribution of SNS‑seq origins across core and subtelomeric regions in the revised Figure 2D, making clear that origin mapping was performed genome‑wide.

      (2) We show that SNS‑seq origins are also present in subtelomeric regions. We have revised the manuscript to avoid any implication that origin firing is restricted only to actively transcribed regions. Our data show that most SNS‑seq origins lie within intergenic regions of PTUs, but a minority are found outside these regions, including subtelomeres and SSRs. The revised text reflects this nuance and highlights that the spatial relationship between transcription and replication is strong but not exclusive.

      These additions ensure that the genome-wide nature of the SNS-seq analysis is transparent to the reader and should therefore remove this reviewer's “key objection”.

      a) The authors must show SNS-seq mapping to the subtelomeres (in addition to around the SSRs; see comment (2)). If no SNS-seq peaks are detected in the subtelomeres, what do the authors conclude about how the genome is duplicated? If SNS-seq peaks are detected in the subtelomeres, do they correspond with the ordered nucleosomes in this part of the genome described by Maree et al (PMID: 28344657); if so, might SNS-seq signal localisation not be directed by transcription but chromatin?

      We have now presented the proportion of origins in subtelomeric regions (see Figure 2B).

      As illustrated in the metaplots in Author response image 2, the distribution of nucleosomes around the subtelomeric origins is similar to the distribution shown for all origins in the manuscript. We do not see the pattern of nucleosomes as described by Maree et al (PMID: 28344657) over ORC1/CDC6 binding sites in this part of the genome.

      Author response image 2.

      Metaplots showing the mean nucleosome signal over centred SNS-seq origins in subtelomeric regions. Two replicates from Maree et al 2019 (PMID: 28344657).

      We never claimed that transcription directs the localisation of the SNS-seq signal. We did not conduct experiments to address this issue. In contrast, we consider that the organisation of chromatin exerts a significant influence on the selection of active origins.

      (4) The major conclusion of the manuscript is that the SNS-seq signal corresponds very precisely to the locations of RNA-DNA hybrids (R-loops). Given all the limitations discussed above, can the authors rule out the possibility that SNS-seq is merely mapping DNA-DNA hybrids and is not, in fact, detecting origins?

      a) It is legitimate to speculate about the possibility that the very extensive overlap between SNS-seq and DRIP-seq signals within polycistronic transcription units (between ORFs) might suggest that DRIP-seq data detects nascent strands at replication origins, rather than R-loops at sites of pre-mRNA processing, as previously suggested by Briggs et al (PMID: 30304482). (eg, 'we disclosed for the first time a strong link between R-loop formation and DNA replication initiation'; 'The RNA:DNA hybrids are formed at initiation sites by RNA priming of SNS and Okazaki fragments'). However, the authors should acknowledge that alternative explanations for the localisation and potential functions of inter-CDS R-loops have been suggested,

      We do not find extensive overlap between stranded SNS-seq and DRIP-seq signal. We observed that only a minor proportion (1.7%) of the previously reported DRIP-seq signal overlaps with the origins detected by stranded SNS-seq. RNA-primed SNS must form RNA:DNA hybrids during the initiation of DNA replication, so an enrichment of these hybrids around the origins is expected. Therefore, we legitimately speculated that this minor proportion of RNA:DNA hybrids enriched around origin centres could be due to origin activation.

      We agree that some of the DRIP-seq signals detected around the origins may be sites of pre-mRNA processing, as previously suggested by Briggs et al. (PMID: 30304482). Since there are no data implicating pre-mRNA processing in DNA replication initiation, we prefer not to speculate about it.

      b) More importantly, the authors should provide experimental evidence that tests such a mechanistic prediction of R-loops and origins: for instance, have they attempted to remove R-loops, eg, by treatment with RNase H, and checked that the SNS-seq signal is unaltered? In the absence of such data, they cannot exclude the possibility that their work has revealed an overlooked problem with SNS-seq (which may not be limited to T. brucei; are matched DRIP-seq and SNS-seq datasets available to correlate these signals in a range of organisms?).

      We have not attempted RNase H treatment for a fundamental methodological reason: it seems highly improbable that RNA:DNA hybrids would persist through the multiple denaturation steps inherent to the SNS‑seq enrichment protocol. Published biophysical measurements show that RNA:DNA hybrids melt at ~95 °C (Roberts & Crothers, Science, 1992; PMID: 1279808), which is the temperature repeatedly applied during SNS isolation. Under these conditions, persistent RNA:DNA hybrids cannot remain intact and therefore cannot be responsible for the SNS‑seq peaks detected.

      We do not interpret our findings as revealing an “overlooked problem with SNS‑seq.” Instead, we consider that the enrichment of RNA:DNA hybrids around origins observed in DRIP‑seq is biologically meaningful and expected, given that replication initiation involves RNA‑primed nascent strands and that DRIP‑seq detects such structures.

      Reviewer #2 (Recommendations for the authors):

      I have some minor concerns that do not affect the main conclusions of the manuscript:

      (1) Figure 2B: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions.

      That is correct. The regions displayed in the heatmaps are genic and intergenic regions sorted by size. With this metaplot we did not intend to convey that the origins accumulate at the centres of the intergenic regions, but rather that genic regions are mostly devoid of origins and intergenic regions are enriched in origins.

      (2) Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding origins that are potentially wider. Nevertheless, the violin plot shows that the majority of origins are shorter than 500 nt. In the end, the size of regions detected as the origin is not important. What gives the resolution of stranded-SNS-seq is the ability to identify the centre of the origin between the minus and plus peaks.

      (3) Data in the manuscript were sometimes not presented in an easy-to-read manner. In some cases, this was due to benign things, such as missing labels for the mean frequency plots (e.g., Figure 2B, blue and green) or very small fonts for axes (Figure 2B). Sometimes, due to the plot types that were chosen, such as pie-charts (Figure 2C, see https://medium.com/analytics-vidhya/dont-use-pie-charts-in-data-analysis-6c005723e657), stacked bar plots (Figure 6B), or showing cumulative distributions (Figure 5C, and Figure 2D) it makes it difficult to judge the actual distribution.

      Wherever possible, the size of the small fonts was increased to the maximum. Missing labels were added to the mean frequency plots. We increased the font size for the axes in the frequency plots.

      However, we found cumulative distributions useful. If you have a more specific proposal for replacing cumulative distributions, we would be very grateful to hear it. We also hope that magnifying the figures in TIFF format with a higher resolution will improve visibility.

      (4) Figure 2B: This data would be better presented with all regions stretched to the same size (the reason is explained in the public review).

      We performed the scaled plots for the stranded SNS-seq origins over the genic and intergenic regions as the reviewer suggested (see Author response image 3), but we prefer to keep the unscaled versions in the manuscript.

      Author response image 3.

      Distribution of mapped origins in scaled genic and intergenic regions. Scaled heatmaps present the distribution of the mapped origins and shuffled controls within scaled genic and intergenic regions (± 2 kb).
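
      The idea behind stretching all regions to the same size can be sketched as follows. This is an illustrative Python example rather than the tool used for the figure: the function names, the single-chromosome coordinate model, and the bin count are assumptions, and the unscaled ± 2 kb flanks are left out for brevity.

```python
def scaled_bin(position, start, end, n_bins=100):
    """Map a position inside [start, end) to a bin index on a common 0..n_bins-1 scale."""
    if not start <= position < end:
        raise ValueError("position lies outside the region")
    fraction = (position - start) / (end - start)  # relative position within the region
    return min(int(fraction * n_bins), n_bins - 1)


def scaled_profile(regions, origin_centres, n_bins=100):
    """Count origin centres per relative-position bin across regions of any size."""
    counts = [0] * n_bins
    for start, end in regions:
        for centre in origin_centres:
            if start <= centre < end:
                counts[scaled_bin(centre, start, end, n_bins)] += 1
    return counts
```

      Because every region contributes to the same relative bins, a central peak in such a profile cannot be produced merely by stacking regions of different lengths around a shared midpoint.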

      (5) Line 149: "The number of origins in both cells was compared using normalised mapped reads": Supplementary Figure 2D mentions that conditions were subsampled to the same amount. I would mention that explicitly in the main text ("compared using normalized, subsampled mapped reads"), as 'normalizing' would not include 'subsampling' for me. Also, I could not find the methods section that the authors refer to here.

      Thanks for the suggestion. We changed the text to make this point clearer. In the methods section, the subsampling process was referred to as 'PCF down-sampling'; in the edited version of the manuscript we have renamed it 'Read sub-sampling' for consistency.
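
      For readers unfamiliar with the distinction drawn by the reviewer, sub-sampling to a common depth can be sketched as below. This is a generic Python illustration, not the procedure described in the manuscript's methods; the function name, the sample labels, and the fixed seed are assumptions.

```python
import random


def subsample_to_common_depth(samples, seed=0):
    """Randomly down-sample every read list to the size of the smallest sample."""
    depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}
```

      After this step every condition contains the same number of mapped reads, so peak counts are compared at equal sequencing depth rather than being rescaled afterwards.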

      (6) Figure 2C: I struggled to understand what gDNA stands for. Maybe it could be replaced with something like distribution in genome?

      Thanks for this suggestion. It is changed to ‘distribution in genomic sequence’.

      (7) Figure 5C: I cannot see how a G4 30 kb from an origin could be relevant. This also does not fit the scale of the author's own model at all (Figure 8).

      The main goal of Figure 5C was to demonstrate the differences between origins and the nearest G4s compared to the shuffled controls. The graph shows that 50% of the origins have a G4 within 2010 bp, whereas the median for the shuffled control is 4154 bp in the case of non-stabilised G4s. Our model is based on Figure 5D, which illustrates the enrichment of G4s and poly(dA) around the centre of origins.
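
      The comparison summarised by the cumulative distribution (median distance from each origin to its nearest G4, for real origins versus shuffled controls) can be sketched in Python as follows. The function names and the single-chromosome, point-coordinate model are assumptions for illustration; the authors' actual implementation is not described here.

```python
from bisect import bisect_left
from statistics import median


def nearest_distance(position, sorted_features):
    """Distance from a position to the closest entry of a sorted, non-empty list."""
    i = bisect_left(sorted_features, position)
    # Only the neighbours immediately left and right of the insertion point can be closest.
    candidates = sorted_features[max(i - 1, 0): i + 1]
    return min(abs(position - feature) for feature in candidates)


def median_nearest_distance(positions, features):
    """Median over all positions of the distance to the nearest feature."""
    ordered = sorted(features)
    return median(nearest_distance(p, ordered) for p in positions)
```

      Running `median_nearest_distance` once on the origin centres and once on the shuffled positions would give the two medians that the response contrasts.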

      (8) Figure 6B: could be made supplementary in my opinion. All relevant data is repeated in panel D.

      It is true that Figures 6B and 6C contain some repetition. However, we would prefer to keep Figure 6B because it provides a quantification of the six indicated categories, along with the statistical tests. Figure 6B only presents the three categories that changed significantly. Figure 6D shows distribution but does not contain quantified data.

      (9) Figure 6D: This plot is repeating a lot, within single figures (Figure 6A, top) but also between figures (e.g., Figure 5D, Figure 4B). I'd prefer it if the initial plots of each figure were expanded a bit (here Figure 6A, top) to include some information from the previous figures. Then all these summary plots could be combined into a single figure at the very end (maybe still as different panels to reduce the number of lines in a single plot). Otherwise, each summary plot repeats the tracks of the previous, which becomes very repetitive.

      Our model is based on these summary plots, and we calculated the relative distances between the different elements using them. Two elements were repeated in each plot: the positions of poly(dA) and G4s. These two elements served as reference points to determine the relative positions of the other elements. Following your suggestion would result again in repetitive summary plots at the end, as one combined summary plot would be overloaded with lines and difficult to understand.

      (10) Figure 6D & Figure 7C: Both show predicted G4s; however, on the plus strand, one prediction has a two-peaked shape, the other only a single peak. Is this a mistake?

      The graphs for the predicted G4s do not have the same shape in the two plots because they were generated using different T. brucei reference genomes. Figure 6C uses the 427 reference genome, because the MNase-seq data set was analysed in this genome and we repeated the SNS-seq analysis and the G4 prediction in it to allow direct comparison. In Figure 7C we compare origins, DRIP-seq and predicted G4s; in this case, all datasets could be compared in the 427-2018 reference genome.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with e.g. the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages - from larval to adult zebrafish - the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these and comments from other reviewers, we have now performed further carefully controlled analyses to test leakage of tracers with molecular weights ranging from 1 to 2000 kDa. We have applied additional normalisation approaches (new data in Fig. 2a–d), imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and using these transgenic reporters for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, the raw tracer intensities in the parenchyma were also analysed with no normalization at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature leaky BBB at 3 dpf and a mature functional BBB at 7 dpf (Fig. 2e–f), a suitable positive control to show that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. a–d, Raw mean fluorescence intensity values of extravasated tracers in the midbrain. (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests for 70 and 10 kDa at 14 dpf in (a–b), for 10 kDa at 7 dpf, and for 70 kDa at 14 dpf in (c–d). Mann-Whitney tests for 70 and 10 kDa at 7 dpf in (a–b), for 70 kDa at 7 dpf, and for 10 kDa at 14 dpf in (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays, display the variance in signal that arises between embryos due to injection differences, and show no difference in BBB integrity between the genotypes analysed. Comparison with data normalised to the 2000 kDa tracer or to kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in tracer delivery and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The section quoted by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on an apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work by reporting, for the first time, detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point and we have replaced the violin plots with individual data points and shown all data as mean±SEM.
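
      Since the reviewer asked for standard deviation and the revised figures report mean±SEM, the relationship between the two summaries may be worth making explicit. The following Python sketch is generic and uses a hypothetical function name; it does not reproduce any values from the manuscript.

```python
import math
from statistics import mean, stdev


def summarise(values):
    """Return (mean, sample SD, SEM) for one group of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    sd = stdev(values)        # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)   # standard error of the mean
    return mean(values), sd, sem
```

      The SEM shrinks as the number of animals grows, whereas the SD describes the spread of the individual data points regardless of group size.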

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). Thus our mutant phenocopy gives confidence it is most likely a null allele, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using the Shapiro-Wilk test, and further statistical analyses have been performed accordingly. The specific quantifications referred to by the reviewer, now in Extended Data Fig. 3a–d (previously Fig. 1d–e), are normally distributed except for the quantification of 70 kDa extravasation at 7 dpf; the Mann-Whitney test has therefore been used for this comparison. Further information can be found in the figure legends and methods.

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed for showing 2000 kDa dextran as a reliable normalization signal.

      We now show the data in the following Figures that demonstrate the consistency of signal throughout the vasculature using this 2000-kDa tracer: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large and small calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d-e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals. Coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), it allows us to quantify across whole vascular networks, reducing the concern about variation within individual animals. We also find that the 2000 kDa tracer shows negligible leakage through brain vessels at 2 hours post-injection (hpi) (new data, Extended Data Fig. 2b–c), and we provide images in Extended Data Fig. 6b–b′′ showing detectable signals even at 6 hpi. Finally, results generated with this approach, with normalisation to transgenic markers, or even with raw parenchymal values of tracer intensity all lead to the same conclusions. In addition, we point the reviewer to a recent pre-print from our team that further validates this method[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw parenchymal distribution of tracer not normalised at all (also requested by reviewer 1). This is provided in Author response image 1 and further supports all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging shows variations introduced during microinjection. However, these ranges are in-line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between control and mutants and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e-f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.
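
      The logic of normalising to a co-injected vascular reference can be made concrete with a short Python sketch. It rests on an idealised assumption, namely that a change in injected volume scales the parenchymal and vascular signals by the same factor; the function name and the pixel lists are hypothetical.

```python
from statistics import mean


def normalised_extravasation(parenchyma_pixels, vessel_reference_pixels):
    """Ratio of mean parenchymal tracer intensity to mean vascular reference intensity."""
    reference = mean(vessel_reference_pixels)
    if reference <= 0:
        raise ValueError("vascular reference signal must be positive")
    return mean(parenchyma_pixels) / reference
```

      Under that assumption, two animals with identical barrier properties but different injection volumes yield the same ratio even though their raw parenchymal intensities differ.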

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in line with the previously presented, independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also used uninjected controls to assess the baseline and saw consistent values approaching zero in these images; we did not include these data in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7 dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we had plotted raw data because the experiment was conducted on vibratome-sectioned tissue. The 2000 kDa tracer was not used. In response to this query and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this and the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made more clear in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig. 2b–c). This analysis showed that the 1- and 10-kDa tracers have a much higher extravasation rate than the 70- and 2000-kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only 10 kDa in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of each tracer was normalised to its own vascular signal (e.g. mean intensity of 10 kDa in midbrain / mean intensity of 10 kDa in vasculature), to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is now in line with how previous studies have analysed such tracer injections[5,6]. Please see revised figure legends and supplementary materials for details.

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion. We have now performed staged imaging of the pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b–c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest change leading to aneurysms (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult stage phenotype, aneurysms were limited to the larger calibre vessels (arteries) in the brain. We have also observed hotspots, and upon quantification, we found fewer numbers in juveniles compared to adults, suggesting that severity of aneurysms and hotspots increase with age.

      Taken together, our results show that the aneurysms in pdgfrb mutants start appearing at late larval/early juvenile stages (~21 dpf) as observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, which increase significantly in number by adulthood. This also correlates with the reduced development and survival rate of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided a comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These were very informative experiments analysing leakage using 10-kDa tracer injections and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7 and 14 dpf), juveniles at 1 and 2 months, and adults at 3 and 5 months. While the additional timepoints did not lead to any new conclusions, they significantly enhanced the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added new additional experiments that provide more detailed temporal analysis on tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month-old fish at a gross level during dissections (not shown). Using the 10-kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70-kDa tracer; this suggests to us that it is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, we find that the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and hotspots become more apparent in older animals. We have added a note in the main text to cover these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, regarding why tracer levels are higher in lateral locations and not just at medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test this, we performed additional experiments and assessed tracer accumulation at two different timepoints, in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). Tracer accumulation at 0.5 hpi was minimal and primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas higher tracer accumulation was observed across medial to lateral regions at 6 hpi (new data in Fig. 5i) in pdgfrb mutants. Comparing the data in Figure 4 (2 hpi) with the new data in Figure 5i (6 hpi), the 10 kDa tracer appears to have spread to more lateral locations given the increased time allowed post-injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, considering that we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in the figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Upon consideration of the presented data and the additional experiments using 10 kDa tracers providing further validations for our claims, we decided to remove this data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title eludes that also capillaries present hemorrhages which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes can compensate for the loss by making longer processes and so ensure the full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183-187 that a lack of specific BBB phenotype in pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. There is a difference in the temporal aspect when various cellular players emerge. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and contact with developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrbuq30bh mutant larvae and large vessel phenotypes (eg, enlargement and rupture) in pdgfrbuq30bh mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)ncv22 transgenic line driven by the promoter of the same gene mutated in their pdgfrbuq30bh mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examination of pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in the pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as previously published data by Ando et al 2016-2021, in which brain pericytes were missing except for those at the metencephalic artery[2,3].

      Other issues:

      The authors should provide more information about the pdgfrbuq30bh mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in the supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of the pdgfrb mutants until 90 dpf. As reported by Ando et al 2021, we did not observe any distinguishing feature until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate reliability of our method. We have performed new experiments at 3 dpf when BBB integrity is not yet established and at 7 dpf when BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These particular crispants are very reliable in our hands and nicely reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, in relation to signal in capillary regions and likely diffusion from hotspots, please see the response to reviewer 3 point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information about orientation to the figure legends and illustrations would make it easier for readers. In addition to the information provided in the figure legends in the submitted version, we have added an illustration and more labels to the revised figures, and extended the descriptions in the figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels on figures have been added to understand the referenced vessel names (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection, however, other studies have already observed leakage after 30 Minutes, by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from the other vessels of the body, e.g. in the trunk and tail. Probably the loss of intra-vascular dye intensity from leakage in barrier-free vessels is already so high (after 2 hours) that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small sized tracers leak from vasculature, particularly through fenestrated vessels in the trunk and tail. We have based our timing on previous studies and our own experience. In zebrafish, the study by O’Brown et al 2019 also used 2 hpi[5] for detection of leakage in mutants of mfsd2aa, which has also been proposed to regulate BBB integrity by controlling EC transcytosis. Therefore, we believe that performing experiments at 2 hpi is appropriate to investigate roles of pericytes in BBB integrity. Our data would suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses to test leakage of tracers, individually testing molecular weights ranging from 1 to 2000 kDa. We showed that these tracers can reliably be detected in brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium size tracers such as 40 kDa Dextran can be reliably detected in the vasculature at similar timepoints[10]. Considering that our experiments using 10 and 70 kDa tracers do detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals, at 0.5 and 6 hpi to test leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice, showing an impairment of the BBB, caused by other agents (known to affect the BBB), and at 48 hpf, when the BBB is not tightened yet. One example for BBB impairment can be found in O'Brown et al (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. Following O’Brown et al 2019, we have performed experiments at 3 dpf when BBB integrity is not mature and at 7 dpf when BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing that our new additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      As a negative finding cannot be proven completely and lots of the previously shown effects on murine BBB impairment are rather weak (when caused by single agents such as Claudin5 deficiency or Sphingosine-phosphate receptor1 knockout), it might be important to only claim that in zebrafish no strong impairment (as observed in the mural cell-deficient mouse) could be observed. Or rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of leakage of the BBB in animals lacking mural cells at 7 and 14 dpf and believe that our data is robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB can have intact barrier function in the absence of mural cells.

      As suggested, we have revised our claims throughout the manuscript to provide a more nuanced discussion of this, but we do not want to water down our claims too much as we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to the readers and therefore suggestions to improve the manuscript could be

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore other subtle or diffuse changes to BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb and be in italics. See line 70: it should be written "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved this data to Extended Data Fig. 3a–d (previously Fig. 1f–g) and have placed age information on the images and in the figure legends.

      (3) Is the reduced vascular complexity in pdgfb mutant due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfb mutant.

      Wild-type and heterozygous mutants are commonly used together in zebrafish research as a collective control group termed siblings. Since we didn’t see any difference between wild-type and pdgfrbuq30bh/- groups in any experiments, we reported these groups together. This is now stated in the supplementary materials.

      One exception to this was examination of the growth and survival rates where we show the genotypes separately (new data in Extended Data Fig. 1b-f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that- the dotted areas do not correspond to the areas magnified. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All the figures are now referred to appropriately in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      Crispants were used because they reliably phenocopy the mutant phenotype, including the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b), and for practical reasons: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistent with previous ultrastructural descriptions[11,12]. We based our quantification of caveolae on the size of the vesicles observed: caveolae were defined as circular profiles of less than 100 nm in diameter and were scored as luminal or abluminal based on proximity to each surface membrane (within 500 nm of each surface or, in a thin-walled vessel, the caveolae closest to each surface) (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, given the reviewer's comments above, we do see increased vesicular structures that are larger than caveolae, but we only provide quantification of the caveolae here.
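
      As an illustration only, the scoring rule described above (circular profiles under 100 nm in diameter, assigned to the luminal or abluminal surface if within 500 nm of it, with the nearer surface winning in a thin-walled vessel) could be sketched roughly as follows. The class, field and function names are hypothetical, the measurements are assumed to be available per vesicle profile, and the handling of ties and of profiles far from both membranes is an assumption rather than something stated in the response.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the response; everything else here is an assumption.
MAX_CAVEOLA_DIAMETER_NM = 100.0
MAX_SURFACE_DISTANCE_NM = 500.0


@dataclass
class VesicleProfile:
    """Hypothetical per-profile measurements from a TEM image."""
    diameter_nm: float
    dist_luminal_nm: float    # distance to the luminal membrane
    dist_abluminal_nm: float  # distance to the abluminal membrane


def score_caveola(p: VesicleProfile) -> Optional[str]:
    """Return 'luminal', 'abluminal', or None if the profile is not scored."""
    # Profiles of 100 nm or more are not counted as caveolae.
    if p.diameter_nm >= MAX_CAVEOLA_DIAMETER_NM:
        return None
    near_luminal = p.dist_luminal_nm <= MAX_SURFACE_DISTANCE_NM
    near_abluminal = p.dist_abluminal_nm <= MAX_SURFACE_DISTANCE_NM
    if near_luminal and near_abluminal:
        # Thin-walled vessel: assign to the closer surface
        # (ties go to luminal here, which is an arbitrary choice).
        if p.dist_luminal_nm <= p.dist_abluminal_nm:
            return "luminal"
        return "abluminal"
    if near_luminal:
        return "luminal"
    if near_abluminal:
        return "abluminal"
    # Assumed: profiles far from both membranes are left unscored.
    return None
```

      Under this sketch, larger vesicular structures such as those mentioned above would simply return `None` and be excluded from the caveolae counts.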

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans-Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.07.23.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methodologies, we replaced Clec4f staining with TIM4 staining results as requested by the reviewer. We first showed the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results showed a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, decreasing from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and further to about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11blow F4/80hi TIM4hi) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 x 10<sup>5</sup> (1w) to 9 x 10<sup>5</sup> (4w) and 5 x 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4<sup>-</sup> MoKCs. This timing is supported by prior studies, where TIM4<sup>-</sup> KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
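      The arithmetic behind this reconciliation can be sketched as follows. This is a minimal illustration only: it uses the approximate values quoted in this response (about 60 TIM4<sup>+</sup> cells/FOV at baseline, about 30 at 16 weeks, and roughly 60% TUNEL<sup>+</sup> among the remaining KCs), and the function name is hypothetical rather than part of any analysis pipeline.

```python
def dying_fraction_of_baseline(baseline_density, current_density, tunel_fraction):
    """Express the TUNEL+ share of the *remaining* KCs as a share of the baseline pool.

    baseline_density / current_density: KC counts per field of view (same units).
    tunel_fraction: fraction of the remaining KCs that are TUNEL+ (0..1).
    """
    if baseline_density <= 0:
        raise ValueError("baseline_density must be positive")
    remaining_fraction = current_density / baseline_density
    return tunel_fraction * remaining_fraction


# Approximate values quoted in the response (assumed here for illustration):
# ~60 cells/FOV at 0 weeks, ~30 cells/FOV at 16 weeks, ~60% TUNEL+ at 16 weeks.
share_of_original_pool = dying_fraction_of_baseline(60, 30, 0.60)
print(f"{share_of_original_pool:.0%} of the baseline KC pool")
```

      With half of the pool remaining, 60% of the remaining cells corresponds to about 30% of the baseline pool; a larger loss than one half lowers this figure further, which is in line with the 25-30% range stated above.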

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We consider KC death to be a continuous event during MASLD. Previous studies, which aimed to induce MASH, typically assessed the loss of EmKCs after longer feeding periods, which may give the impression that longer feeding periods are required to observe substantial loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis and is more likely an early-initiated event that continues throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death after 20 weeks of HFD feeding and after 16 weeks of HFHC feeding. Moreover, both HFHC and HFD induced similar stages of MASLD (characterized by significant lipid accumulation without fibrosis development) by these time points (Author response image 1). Therefore, these data support the view that the onset of substantial KC death may be an early MASLD event that precedes progression to MASH. Additionally, this finding aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which distinctly show the localization of TIM4, TUNEL, and GS. These confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Unlike the acute necrosis with membrane rupture seen in acute liver failure, which causes massive, rapid enzyme release, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), with dying cells packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal system minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical issue. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo using the fluorescent glucose analog 2-NBDG. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint Fig. 6G-H). This provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KCs death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KCs apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KCs death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KCs loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that including a view of the entire slice at a standard magnification makes individual KC difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KCs death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KCs death on full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). Representative whole-tissue view is shown. Scale bars, 1mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages (both KCs and MoMFs) (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we have clarified that the finding that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup>, versus 30% of total IBA1<sup>+</sup> cells, further supports the conclusion that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We have also acknowledged the limitation of the BMDM studies (Revised manuscript, page 8, lines 332-340).

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).
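      The normalization requested by the reviewer can be illustrated with a short sketch. The counts below are hypothetical placeholders chosen only to show the calculation; they are not data from the manuscript, and the function name is an assumption made for this example.

```python
def zone_death_rate(tunel_pos_kcs, total_kcs):
    """Fraction of KCs in a zone that are TUNEL+ (zone-normalized death rate)."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    return tunel_pos_kcs / total_kcs


# HYPOTHETICAL counts per zone (placeholders, not measured values).
# Periportal (PP) zones hold more KCs, so raw TUNEL+ counts alone favor PP.
counts = {
    "PP": {"tunel_pos": 45, "total": 90},
    "PC": {"tunel_pos": 10, "total": 40},
}

rates = {
    zone: zone_death_rate(c["tunel_pos"], c["total"])
    for zone, c in counts.items()
}

for zone, rate in rates.items():
    print(f"{zone}: {rate:.0%} of KCs are TUNEL+")
```

      Dividing by the number of KCs in each zone separates a genuine difference in susceptibility from the simple fact that more KCs reside in periportal regions.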

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f cre control mice. This prompted us to consider that Cre insertion itself might influence KCs maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KCs proliferation in Clec4f cre mice increased even further. These results indicate that Cre insertion enhanced KCs self-renewal in Clec4f cre mice, which contributes to maintenance of the KCs pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KCs death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KCs apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KCs death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KCs death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KCs loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work. The observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KCs death, which forms the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KCs mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KCs survival, the other dissecting glycolysis-driven KCs death mechanisms), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each story.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup>KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, examination of the corresponding single-channel TIM4 staining during this period reveals that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KCs loss suggests that TUNEL-positive KCs do not undergo immediate clearance. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KCs depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death is a gradual process that unfolds over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KCs death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity of KCs. KCs apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup>KCs compared with vehicle controls, indicating protection against KCs death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KCs loss in vivo (Revised manuscript, page 7, line 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KCs polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed a HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C), (Revised manuscript, page 4, line 154-163).

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed a HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, page 7-8, line 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we additionally inhibited glycolysis with 2-DG in vivo. The protection of KCs observed following 2-DG treatment (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KCs loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f cre control mice. This prompted us to consider that Cre insertion itself might influence KCs maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KCs proliferation in Clec4f cre mice increased even further. These results indicate that Cre insertion enhanced KCs self-renewal in Clec4f cre mice, which contributes to maintenance of the KCs pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KCs death during early MASLD (Revised Figure 1A,B,E,F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestion. As explained above, Cre insertion promotes KCs self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KCs numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the tunel+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KCs death to KCs numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion of dying KCs was significantly higher in PP regions than in PC regions at 8 weeks of HFHC feeding. We have revised our text accordingly (Revised manuscript, page 5, lines 194-201).
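      The normalization described above can be sketched as follows. This is a minimal illustration only, not the analysis code used for the manuscript: the function name and all cell counts are hypothetical, and the only assumption is that TUNEL<sup>+</sup> KCs and total KCs are counted within the same zone.

```python
def dead_kc_fraction(tunel_pos_kcs: int, total_kcs: int) -> float:
    """Return the fraction of KCs in one zone that are TUNEL+."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    if not 0 <= tunel_pos_kcs <= total_kcs:
        raise ValueError("tunel_pos_kcs must lie between 0 and total_kcs")
    return tunel_pos_kcs / total_kcs


# Hypothetical counts (TUNEL+ KCs, total KCs) per zone, for illustration only.
zones = {"periportal": (12, 60), "pericentral": (6, 40)}

fractions = {
    zone: dead_kc_fraction(dead, total) for zone, (dead, total) in zones.items()
}

for zone, fraction in fractions.items():
    print(f"{zone}: {fraction:.1%} of KCs are TUNEL+")
```

      With these made-up counts, the raw number of dying cells differs twofold between zones, while the normalized fractions differ far less, which is why the zonal comparison is made on the normalized values.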

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Previous studies have reported varying degrees of KCs loss depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KCs loss and highlight the importance of experimental context and marker selection when assessing KCs dynamics in MASLD. We have included these studies in our discussion section (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we showed that Chi3l1 deficiency leads to greater glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduced glucose uptake by KCs. Uptake was measured by treating mice with the fluorescent glucose analog 2-NBDG. We include the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics dataset as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      Line 34 has been revised as follows: “Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.”

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, Line 30, 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected this to state that Iba1 marks hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C, manuscript page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We have included the PA concentration (800 µM) in the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      The grammar errors have been corrected (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We have updated the gating strategy accordingly (Revised manuscript, page 9, lines 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We have rephrased this passage accordingly (Revised manuscript, page 9, lines 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      References

      1. Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      2. Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      3. Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      4. Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      5. Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      6. Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. British journal of cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      7. Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      8. Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      9. Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      10. O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      11. Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      12. Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Ca<sub>v</sub> nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to improve structural insight into the presented mutagenesis data, we have used AlphaFold 3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Author response image 1). These predictive models suggest the following:

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all downstream residues are displaced relative to the WT: the C<sub>α</sub> atoms of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower, especially in the best two poses, which are also the most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses, one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss of binding to L312M; however, the retained interactions with the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT, while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> atoms of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to contact only F315 and Y318, and overall, I308 or L312 contacts occurred in only 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism in which A316P has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin, and it underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift).
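      The per-residue C<sub>α</sub> displacements quoted above are Euclidean distances between matching atoms in two superposed models. A minimal sketch of that calculation follows; it is not the workflow used for the AF3 comparison, the coordinates are invented placeholders rather than values from any model, and it assumes both structures have already been aligned into the same coordinate frame.

```python
import math

# Hypothetical C-alpha coordinates in Å (x, y, z) after superposition.
# These numbers are placeholders for illustration, not model output.
wildtype_ca = {
    "L312": (10.0, 5.0, 2.0),
    "F315": (8.0, 3.0, -2.0),
}
mutant_ca = {
    "L312": (12.0, 6.0, 4.0),
    "F315": (12.0, 3.0, 1.0),
}


def ca_displacements(reference, model):
    """Distance in Å between matching C-alpha atoms of two aligned models."""
    shared = reference.keys() & model.keys()
    return {res: math.dist(reference[res], model[res]) for res in sorted(shared)}


displacements = ca_displacements(wildtype_ca, mutant_ca)
for residue, shift in displacements.items():
    print(f"{residue}: {shift:.1f} Å")
```

      Only residues present in both models are compared, so a missing residue in one model is skipped rather than raising an error.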

      Author response image 1.

      Alphafold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effect of the mutations and conclusions from AF3 models become increasingly robust, we think that definitive statements of their mechanistic contributions would require experimental studies of mutant channels, i.e., cryo-EM or crystallography, that are beyond our means. Therefore, we have decided not to include this data in the manuscript; however, it is accessible for the interested reader within the public review. Hopefully, as cryo-EM structures have been obtained for the wildtype channel, there will be studies on mutations of this gating-relevant S6 segment in the future.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure if a global elevation of cellular calcium concentration would be informative. We rather expect that the relevant local Ca<sup>2+</sup> elevation would occur as sparks in the BK-Ca<sub>v</sub> nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca<sup>2+</sup> inward current would be stopped faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca<sup>2+</sup> imaging. We concur that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction, as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate even more physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like this to happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof-of-concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin as a specific inhibitor, which abolished relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca2+ (CaV1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin is modifying the closed-open equilibrium, the conclusion that this can be due to a stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, the α-mangostin is increasing the equilibrium constant that defines the closed-open reaction (L in the Horrigan, Aldrich allosteric gating model for BK). The paper will gain much if the authors determine the probability of opening in a wide range of voltages, to determine how the drug is affecting (or not), the channel voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All possibilities would impact S6 and lower the free energy for pore opening, and we concur that therefore Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca<sup>2+</sup>-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be as it may, the apo and Ca2+ bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will be different whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer appropriately points out that the S6 segment addressed in our study differs between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca<sup>2+</sup> (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had actually performed the docking with both structures, but chose to show the Ca<sup>2+</sup>-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K<sup>+</sup> conduction pathway. The binding energies of the favored poses are very similar; the binding energy in the best-ranking conformational cluster in the Ca<sup>2+</sup>-bound structure was even slightly lower (-8.64 kcal mol<sup>-1</sup>) than in the docking with the Ca<sup>2+</sup>-free channel (-8.58 kcal mol<sup>-1</sup>; Fig. 4B), which may not be a relevant difference.

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, which did not reduce the shift in V<sub>½</sub> upon substitution, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca<sup>2+</sup>-free state, I308 was also predicted to interact in 17/20 poses, while contacts to A316 occurred in 5/20 poses. In the Ca<sup>2+</sup>-bound state, predicted interactions shifted from I308 (which is expected, as it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position following the upward reorientation of S6 in the Ca<sup>2+</sup>-bound state when the channel moves from one conformation to the other (Fig. S4).

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca2+-free and Ca2

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca<sup>2+</sup>-bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca<sup>2+</sup>-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures very likely represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin also induced a left-shift in V<sub>½</sub> at higher Ca<sup>2+</sup> concentrations (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinity for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or allosteric effect. As I308 is buried in the Ca<sup>2+</sup>-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as a possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021, Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change solvent accessibility of the I308 sidechain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of the BKα channels. This can be explained by the fact that we measured under resting conditions (100 nM free Ca<sup>2+</sup><sub>i</sub>) and with rather mild depolarization (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Ca<sup>2+</sup><sub>i</sub>, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that α-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that α-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).
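      For readers unfamiliar with burst analysis, the quantities reported above (burst duration, openings per burst, intraburst closed times, and interburst closures) can be derived from an idealized dwell-time list. The sketch below is a generic illustration, not the analysis pipeline described in our methods: it assumes a simple critical closed time (`t_crit`, a hypothetical parameter) separating intraburst from interburst closures, and the dwell times are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

Dwell = Tuple[str, float]  # ("O" for open or "C" for closed, duration in ms)


@dataclass
class Burst:
    open_times: List[float]
    closed_times: List[float]  # closures inside the burst only

    @property
    def duration(self) -> float:
        return sum(self.open_times) + sum(self.closed_times)

    @property
    def n_openings(self) -> int:
        return len(self.open_times)


def split_bursts(dwells: List[Dwell], t_crit: float):
    """Group openings into bursts; closures >= t_crit end the current burst."""
    bursts, gaps = [], []
    current, pending = None, None  # pending = short closure awaiting an opening
    for state, dur in dwells:
        if state == "O":
            if current is None:
                current = Burst([], [])
            elif pending is not None:
                current.closed_times.append(pending)
            pending = None
            current.open_times.append(dur)
        elif state == "C":
            if current is None:
                continue  # closure before any opening: true length unknown
            if dur >= t_crit:
                bursts.append(current)
                gaps.append(dur)
                current, pending = None, None
            else:
                pending = dur
        else:
            raise ValueError(f"unknown state {state!r}")
    if current is not None:
        bursts.append(current)
    return bursts, gaps


# Invented dwell times (ms), for illustration only.
record = [("C", 50.0), ("O", 2.0), ("C", 0.5), ("O", 3.0), ("C", 40.0),
          ("O", 1.0), ("C", 0.2), ("O", 1.5), ("C", 0.3), ("O", 2.5), ("C", 10.0)]
bursts, gaps = split_bursts(record, t_crit=5.0)
for i, b in enumerate(bursts, start=1):
    print(f"burst {i}: {b.n_openings} openings, {b.duration:.1f} ms")
```

      The choice of `t_crit` strongly affects the result. In practice it is usually derived from the closed-time distribution rather than fixed by hand.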

      (2) In several places, the authors make similarities in the mode of action of other BK activators and α-mangostin; however, the work of Gessner et al. PNAS 2012 indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that binding of the agonist is not near the selectivity filter, as the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and α-Mangostin. We first hypothesized that α-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that α-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by α-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619, Cym04, and BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and α-Mangostin). We have moved or separated the initial GoSlo references from the reference to the pore binding site in the paragraph (lines 329, 358) to improve clarity.

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that the observed effects (in this case the shift in V<sub>½</sub>) are pronounced around 1 µM Ca<sup>2+</sup>. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca<sup>2+</sup> (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca<sup>2+</sup>-dependence of V<sub>½</sub> cannot be completely described when E is neglected, with the largest difference around 1-2 µM Ca<sup>2+</sup> (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). This, together with the steep Ca<sup>2+</sup>-dependence in this concentration range (meaning large Po changes upon occupancy increase; Cui, Cox and Aldrich, 1997), explains the higher apparent activation, visible as the larger shift in V<sub>½</sub> observed at 1 µM Ca<sup>2+</sup>. In terms of the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca<sup>2+</sup>, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed larger V<sub>½</sub> shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
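      To make the roles of L, C, D, and E in this discussion concrete, the sketch below evaluates the steady-state open probability of the Horrigan-Aldrich allosteric model. It is an illustration only and not a fit to our data. The coupling factors C = 8, D = 25, and E = 2.4 are the values quoted above; all other parameters (`L0`, `zL`, `zJ`, `vh_j`, `kd_um`) and the function names are placeholder assumptions that should be replaced with fitted values. V<sub>½</sub> is taken here simply as the voltage at which Po = 0.5.

```python
import math

KT_OVER_E_MV = 25.4  # thermal voltage kT/e near room temperature, in mV


def ha_open_probability(v_mv, ca_um, L0=9.8e-7, zL=0.3, zJ=0.58,
                        vh_j=150.0, kd_um=11.0, C=8.0, D=25.0, E=2.4):
    """Po of the Horrigan-Aldrich model (placeholder parameters, see lead-in)."""
    L = L0 * math.exp(zL * v_mv / KT_OVER_E_MV)        # closed-open equilibrium
    J = math.exp(zJ * (v_mv - vh_j) / KT_OVER_E_MV)    # voltage-sensor activation
    K = ca_um / kd_um                                  # Ca2+ site occupancy term
    open_weight = (1 + J * D + K * C + J * K * C * D * E) ** 4
    closed_weight = (1 + J + K + J * K * E) ** 4
    return L * open_weight / (L * open_weight + closed_weight)


def v_half(ca_um, **params):
    """Voltage (mV) at which Po = 0.5, found by bisection (Po rises with V)."""
    lo, hi = -400.0, 600.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ha_open_probability(mid, ca_um, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for ca in (0.0, 1.0, 10.0):
    baseline = v_half(ca)
    # Tenfold larger L0: a purely illustrative stand-in for an activator
    # that shifts the closed-open equilibrium toward the open state.
    activated = v_half(ca, L0=9.8e-6)
    print(f"Ca2+ {ca:5.1f} µM: V1/2 {baseline:7.1f} mV -> {activated:7.1f} mV")
```

      Setting `E=1.0` in the calls removes the sensor-sensor coupling and allows a direct comparison of the predicted Ca<sup>2+</sup>-dependence of V<sub>½</sub> with and without this term.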

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels.” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z. and Chen, J. (2024) “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration,” Journal of Chemical Information and Modeling, 64, pp. 7616–7625. Available at: https://doi.org/10.1021/acs.jcim.4c01012.

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular protective properties, could act through the activation of large conductance potassium permeable channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining cardiovascular protective effects.

      Strengths:

      The authors present evidence based on several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. However, these experiments do not demonstrate binding to these sites, and could be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are unfortunately not sufficient to clearly distinguish between effects due to affinity loss and effects due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis, and have instead only reported the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of Chen, Yan and Aldrich (2014). We address this matter also in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed have not observed a reduced single channel amplitude in any measurement. The THexA competition assay showed that ɑ-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar, as QA ions bind directly below the filter entrance to block permeation, while our studies suggest that ɑ-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect to see an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound state (which we have also now included as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKα channels at different intracellular Ca concentrations, before and after activation with 10 μM α-Mangostin. A clearer distinction between the curves would help to interpret the data more easily.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage and state-dependent mechanisms. Therefore, their apparent affinity could change if α-Mangostin simply increases open probability or alters dwell times rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV as derived from the GV-relationship); an increase of open dwell times, respectively Po, in the presence of α-Mangostin to, e.g., 0.3 would therefore lead to a ≈10-fold decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (more than 20-fold to around 1.6 µM). This cannot arise from Po change and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming a pure closed channel block where apparent IC<sub>50</sub> would correlate with the closed times, an increase of about 1.4-fold is expected. However, we recorded a much stronger 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
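      The expected fold changes quoted above follow from simple proportionality arguments and can be checked with a few lines of arithmetic. The sketch below assumes, as the paragraph does, that the apparent IC<sub>50</sub> scales with 1/Po for a pure open-channel blocker and with 1/(1 − Po) for a pure closed-channel blocker; the function names are illustrative only.

```python
def open_block_fold(po_before, po_after):
    """IC50_after / IC50_before if the blocker only binds open channels (IC50 ~ 1/Po)."""
    return po_before / po_after


def closed_block_fold(po_before, po_after):
    """IC50_after / IC50_before if the blocker only binds closed channels (IC50 ~ 1/(1 - Po))."""
    return (1.0 - po_before) / (1.0 - po_after)


po_basal = 0.024      # Po at +40 mV without activator (from the text)
po_activated = 0.3    # example Po in the presence of the activator (from the text)

ic50_basal_nm = 80.0        # THexA IC50 in the basal state
ic50_activated_nm = 1600.0  # THexA IC50 in the presence of the activator
observed_fold = ic50_activated_nm / ic50_basal_nm

if __name__ == "__main__":
    print(f"open-channel block predicts   x{open_block_fold(po_basal, po_activated):.2f}")
    print(f"closed-channel block predicts x{closed_block_fold(po_basal, po_activated):.2f}")
    print(f"observed                      x{observed_fold:.0f}")
```

      A ratio below 1 corresponds to a decrease in apparent IC<sub>50</sub>. Neither limiting case approaches the observed 20-fold increase, which is the point made in the response.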

      (5) The pH dependence of the V1/2 shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect.) However, although the data are consistent with this interpretation, additional controls such as using a non-ionizable analog or assessing solubility changes with pH would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly tie the existence of a charge to a possible activation mechanism. We still think that this is an interesting observation and should be made known, as we have investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families before (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation. Of the commercially available substances, the bare xanthone backbone is completely insoluble in water. We have therefore tested the derivative 3-hydroxyxanthone as an example with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). The 3-hydroxyxanthone indeed shows reduced activation compared to α-Mangostin. The shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports the idea that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the number and positions of charged substituents, which is not within our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)

      (6) The reduced V1/2 shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of α-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be employing double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations introduced by the substitutions by not conducting a pure alanine or cysteine scanning mutagenesis. Instead, substitutions were chosen to be closest to the wildtype GV-relationship in Chen, Yan and Aldrich (2014) where possible. While L312M was virtually identical to the wildtype, A316P showed a change in slope in high Ca<sup>2+</sup> concentrations, which could indicate a changed voltage sensitivity. Additionally, A316P completely abolished α-mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even if its V<sub>½</sub> was shifted more strongly. As we have conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-mangostin.

      Following the reviewer’s suggestion, we have generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by further ≈7 mV compared to the single mutation with the stronger shift. The combination L312M/A316G, however, did not further reduce the shift seen in the single mutations and did not even produce the shift induced by A316G alone.

      Author response image 3.

      Double mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V<sub>½</sub> before and after activation with 10 µM α-Mangostin, the resulting shift in V<sub>½</sub>, and the GV-relationships are shown (n=6-7). Measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.

      Following a suggestion by another reviewer, we have generated Alphafold3 (AF3) models for I308A, L312M and A316P and repeated the Mangostin docking. We learned that the mutations are all predicted to substantially impact the structure of the S6 helix, therefore altering the binding region, and A316P especially impacted the nature of residue interactions. This could be an explanation why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å, respectively (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift). (…)

      (7) The subtraction approach used to isolate BK currents (difference before and after α-Mangostin) assumes that the compound affects only BK channels. However, α-Mangostin could also modulate Cav currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not interfere with the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by ɑ-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would be the determinant for the membrane potential mediating the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation. This could therefore be a subject for the further investigation of potential ɑ-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to prevent misunderstandings that we refer to isolated BK currents instead of α-Mangostin activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Cav experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of ɑ-Mangostin on activation is high at lower depolarisation, indeed underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we have used our measurements of BKɑ channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by ɑ-Mangostin is equal to that in 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of 0 mV to 100 mV prepulses. The τ of activation decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After ɑ-Mangostin activation, the time course was accelerated, with a τ of activation of 9.5 ± 4.7 ms at 0 mV to 2 ± 0.6 ms at +100 mV. This faster activation was particularly effective in the lower voltage range far from high Po, e.g., ɑ-Mangostin caused a decrease of more than half of the τ of activation at +20 mV (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).
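      As an illustration of what a monoexponential activation time constant means, the sketch below generates noise-free synthetic traces of the form I(t) = A·(1 − exp(−t/τ)) and recovers τ by a log-linear least-squares fit. This is a simplified stand-in under stated assumptions (known steady-state amplitude, no noise, no delay), not the fitting routine used for the recordings; only the two τ values at +20 mV are taken from the text.

```python
import math


def simulate_activation(tau_ms, amplitude, times_ms):
    """Synthetic monoexponential activation: I(t) = A * (1 - exp(-t / tau))."""
    return [amplitude * (1.0 - math.exp(-t / tau_ms)) for t in times_ms]


def fit_tau(times_ms, currents, amplitude):
    """Recover tau from ln(1 - I/A) = -t / tau (least-squares line through the origin).

    Assumes the steady-state amplitude A is known; real data would need a
    nonlinear fit with A (and possibly an offset) as free parameters.
    """
    num = 0.0
    den = 0.0
    for t, i in zip(times_ms, currents):
        remaining = 1.0 - i / amplitude
        if t <= 0 or remaining <= 0:
            continue  # skip points where the logarithm is undefined
        num += t * math.log(remaining)
        den += t * t
    slope = num / den
    return -1.0 / slope


times = [0.5 * k for k in range(1, 101)]  # 0.5 ms to 50 ms

tau_control = fit_tau(times, simulate_activation(12.2, 1.0, times), 1.0)
tau_activated = fit_tau(times, simulate_activation(4.98, 1.0, times), 1.0)
fold_faster = tau_control / tau_activated

if __name__ == "__main__":
    print(f"tau control   = {tau_control:.2f} ms")
    print(f"tau activated = {tau_activated:.2f} ms")
    print(f"activation is {fold_faster:.2f}-fold faster")
```

      The ratio of the two time constants is greater than 2, consistent with the statement that τ at +20 mV is reduced by more than half.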

      Our data consist of families of different prepulse voltages and a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to add plots for the voltage-dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants of lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulse voltages, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after ɑ-Mangostin activation.

      We now show the plot for the voltage dependence of activation in Fig. S2A and a bar graph for activation/deactivation time constants at +20 mV as Fig. S2B; data are summarized in Table S5. We hope this helps to illustrate the effect of ɑ-Mangostin under physiological conditions.

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and employ a more specific reference to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and thoughtful comments on it. We appreciate the overall positive opinion on our manuscript and helpful comments and suggestions from the reviewers. Overall, the main points identified by reviewers were 1) further broadening of the system to a range of inputs as well as the construct types that can be generated with the system and 2) further consideration of any off-target joining or off-target effects on genes/proteins and the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6 illustrating the generation of a new construct using PCR and dsDNA fragments, added new constructs for mpeg1.1 and for CRISPR gRNA expression, and revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we also thought it was good to demonstrate incorporation of these as well. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2 flanked backbone plasmid as an alternative, rapid approach to generation of transgenes. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wildtype or dominant negative rac2, were properly assembled. In transient transgenic zebrafish injected with these constructs, dominant negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this new experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated into constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs described above. This fragment is ~70 bp and while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller size fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently incorporate complex construction, for instance, expression of tagged Cas9 along with the U6 driven gRNA as in Zhou et al., 2018 or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c, to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1, given the broad use of this promoter within the field, and we have now included a 5E mpeg1.1 construct within the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates one domestication approach using PCR and homology-based cloning as an easy approach to domestication. In addition, we have also mentioned alternative approaches for domestication in the discussion (lines 439-444).
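      Before domestication, the internal sites that need to be removed first have to be located. The following is a minimal sketch of such a scan, assuming the PaqCI recognition sequence CACCTGC and checking both strands; the function names and the example sequence are hypothetical and are not part of the toolkit.

```python
PAQCI_SITE = "CACCTGC"  # PaqCI recognition sequence (cuts outside the site)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_paqci_sites(seq):
    """Return (0-based position, strand) for every PaqCI site in seq.

    '+' means the site reads CACCTGC on the given strand,
    '-' means it lies on the opposite strand (appears as GCAGGTG).
    """
    seq = seq.upper()
    rc_site = reverse_complement(PAQCI_SITE)
    hits = []
    for i in range(len(seq) - len(PAQCI_SITE) + 1):
        window = seq[i:i + len(PAQCI_SITE)]
        if window == PAQCI_SITE:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    example = "ATGCACCTGCAAATTTGCAGGTGTAA"  # hypothetical insert with two sites
    for position, strand in find_paqci_sites(example):
        print(f"PaqCI site at position {position} on the {strand} strand")
```

      Each reported position marks a site that would be removed, for example by a silent substitution, before the fragment is used in a PaqCI Golden Gate reaction.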

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base pair overhang sequence in their final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. This approach can also be expanded by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target based on these 4bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that have broadly tested off-target effects of all potential 4 bp overhang sequences have already given an effective overview of interactions between each of these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer that we used to predict the overhangs that would have minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4 bp overhangs. In all cases, we selected overhangs that would have minimal off-target efficiency, with each of the overhangs showing 1% or less off-target ligation with any of the other overhangs chosen. We have added new text, lines 119-124, that further clarifies our selection of these ends.

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we’ve been curious as well. Our cloning of 6 distinct fragments in Figure 5, and a new 5-fragment cloning added in revision (Figure 6), suggest that 5-6 fragments can be readily assembled. In the course of revisions we also attempted to generate a larger product of 11 fragments, which ultimately failed. It is unclear whether this failure is due to the constructs chosen, the potential size of the plasmid, or a failure of the technique/enzymes themselves. Published descriptions of PaqCI Golden Gate cloning approaches have found that PaqCI can assemble at least 32 fragments and can produce large sequences (e.g. Sikkema et al., 2023, who assemble the ~40 kbp T7 genome from 12, 24 and 32 distinct fragments using a PaqCI Golden Gate reaction). We therefore suspect that our issues with the 11-fragment assembly are likely due to complications with the specific group of constructs that were combined; however, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To recognize this, we have added text (lines 490-493) to the discussion stating that we have only combined 6 constructs, which we think likely encompasses many of the applications that may be needed for this system, while recognizing that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely, we have added new text (lines 457-470) to our discussion describing the potential side effects on protein function. Potential pitfalls to keep in mind include whether the N- or C-termini of proteins can be modified, and the possibility of affecting or creating ectopic transcription factor binding sites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript is robust and well-supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept - cloning three fragments from entry vectors-which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As we described in our replies to your public points above, we have now added new Figure 6 and new Supplementary Figure 6 addressing the cloning of PCR fragments, short fragments as well as a mechanism of domestication. We have also included the mpeg1.1 promoter within the toolkit. In addition, your point on the repetition of assays is fair; in our new Figure 6, we instead used wild-type and dominant-negative Rac2 expression and failure of macrophage recruitment to the tail wound.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT. It is interesting and potentially efficient, but I have a few concerns:

      (1) The author claimed that the ImPaqT system is more efficient than other existing systems. The authors should provide such data to support their claim.

      Our argument wouldn’t be that the ImPaqT system is strictly speaking more efficient, but rather that the combination of minimal added sequence, the ability to expand or contract the fragments used, and, in our new Figure 6, the ability to directly utilize PCR products and dsDNA fragments, while retaining the ability to combinatorially build constructs from a suite of existing sequences is the main point of the method. We now explicitly state that Golden Gate cloning isn’t more efficient than existing techniques in the text (lines 534-537), but rather the particular strength of the method is the flexibility and minimal added sequence.

      (2) The ImPaqT is theoretically less prone to have off-target effects than existing systems, the authors should provide such data to validate their claim.

      Good point, we have now searched the zebrafish genome for PaqCI sites as well as for BsaI and BsmBI which are the 6-base cutters most commonly used for Golden Gate cloning. We found that PaqCI cuts every ~17 kb in the zebrafish genome while BsaI and BsmBI cut every ~9 kb or ~13 kb respectively, further supporting that PaqCI sites are rarer in the genome and should generally require domestication less often. We have now added new text describing this in lines 129-132.
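
      The site-frequency comparison above can be illustrated with a minimal Python sketch that counts recognition sites on both strands of a sequence. The recognition sequences are the standard ones for these enzymes (PaqCI: CACCTGC; BsaI: GGTCTC; BsmBI: CGTCTC). The function names and the toy sequence are hypothetical, and this is not the search that was run for the manuscript.

```python
import re

# Recognition sequences (5'->3'). Cut positions are not needed for counting.
SITES = {"PaqCI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(sequence, site):
    """Count occurrences of a recognition site on either strand."""
    sequence = sequence.upper()
    total = 0
    # A set avoids double counting if a site equals its reverse complement.
    for motif in {site, reverse_complement(site)}:
        # The lookahead pattern counts overlapping matches as well.
        total += len(re.findall(f"(?={motif})", sequence))
    return total


def mean_spacing_kb(sequence, site):
    """Average distance between sites in kb (infinite if no site is found)."""
    n = count_sites(sequence, site)
    return float("inf") if n == 0 else len(sequence) / n / 1000


# Hypothetical toy sequence: one PaqCI site per strand and one BsaI site.
toy = "AAACACCTGCAAAGCAGGTGAAAGGTCTCAAA"
counts = {name: count_sites(toy, site) for name, site in SITES.items()}
```

      For a real genome, the same functions would be applied per chromosome and the counts summed before dividing the total length by the total number of sites.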

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in their discussion part.

      Yes, this should absolutely be expanded. As we said in response to your public comments above, we have now added new text describing potential pitfalls of this method for promoter function or gene expression.

      (4) The authors are suggested to provide a balanced discussion about the expandable usage of this system beyond the immune system.

      We agree, this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that in principle, many of the components we’ve derived should be useful in non-immune systems, but we also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system which can be combined with these already developed tools.

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of many (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBS, then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.
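
      The additivity argument can be made concrete with a small Python sketch. The three-position log-odds matrix below is entirely hypothetical and is not taken from RegulonDB. It only shows that, under any PWM, the effect of a double mutant equals the sum of the single-mutant effects, so the model cannot represent epistasis.

```python
import math


def pwm_score(seq, pwm):
    """Additive PWM score: sum of independent per-position terms."""
    return sum(pwm[i][base] for i, base in enumerate(seq))


# Hypothetical 3-position log-odds matrix (illustrative numbers only).
pwm = [
    {"A": 1.2, "C": -0.5, "G": -0.8, "T": -1.0},
    {"A": -0.7, "C": 0.9, "G": -0.3, "T": -0.6},
    {"A": -1.1, "C": -0.4, "G": 1.5, "T": -0.2},
]

wild_type = "ACG"
single_1 = "TCG"  # mutation at position 0
single_2 = "ACT"  # mutation at position 2
double = "TCT"    # both mutations combined

effect_1 = pwm_score(single_1, pwm) - pwm_score(wild_type, pwm)
effect_2 = pwm_score(single_2, pwm) - pwm_score(wild_type, pwm)
effect_12 = pwm_score(double, pwm) - pwm_score(wild_type, pwm)

# Deviation of the double mutant from additivity; zero for any PWM,
# up to floating-point rounding.
epistasis = effect_12 - (effect_1 + effect_2)
```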

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”
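
      For readers unfamiliar with the term, one common way to classify epistasis for a pair of mutations is sketched below in Python. It assumes four measured phenotypes (wild type, each single mutant, and the double mutant). The function name and the tolerance parameter are hypothetical, and the authors' exact criteria (Supplementary Methods 7.5) may differ, for example in how measurement noise is handled.

```python
def classify_epistasis(f00, f10, f01, f11, tol=0.0):
    """Classify pairwise epistasis from four phenotype values.

    f00: wild type, f10: mutation a only, f01: mutation b only,
    f11: both mutations.
    """
    effect_a_alone = f10 - f00    # effect of a on the wild-type background
    effect_a_with_b = f11 - f01   # effect of a when b is already present
    effect_b_alone = f01 - f00
    effect_b_with_a = f11 - f10

    # A mutation shows sign epistasis if its effect changes sign
    # depending on the presence of the other mutation.
    a_flips = effect_a_alone * effect_a_with_b < 0
    b_flips = effect_b_alone * effect_b_with_a < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    # No sign change: any remaining deviation from additivity is magnitude epistasis.
    if abs(f11 - f10 - f01 + f00) > tol:
        return "magnitude"
    return "none"
```

      The fraction of mutation pairs classified as "sign" or "reciprocal sign" is the kind of quantity the quoted 65% refers to, under the assumptions stated above.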

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are at approximately 20bps twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
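
      The weighted average described above can be written as a short Python sketch. The per-bin fluorescence values and read counts are hypothetical, and any additional normalization used in the manuscript (for example, scaling read counts by the number of cells sorted into each bin) is omitted.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-count-weighted mean fluorescence of one sequence across sorting bins."""
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one fluorescence value is required per bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("sequence has no reads in any bin")
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


# Hypothetical example: three bins with representative fluorescence 1, 2 and 3,
# and a sequence whose reads fall mostly into the brightest bin.
example = regulation_strength([10, 30, 60], [1.0, 2.0, 3.0])
```

      Because the read-count distribution varies continuously, the resulting estimate can take values between the bin fluorescence values even though sorting itself is binned.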

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
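
      A permutation null model of this kind can be sketched in Python as follows. The sketch assumes that genotypes are equal-length DNA strings, that neighbors differ by one nucleotide, and that a peak is a genotype whose value is strictly higher than that of every measured neighbor. All function names are hypothetical and the manuscript's implementation may differ.

```python
import random

BASES = "ACGT"


def neighbors(genotype, landscape):
    """Yield the measured single-mutant neighbors of a genotype."""
    for i, base in enumerate(genotype):
        for alt in BASES:
            if alt != base:
                mutant = genotype[:i] + alt + genotype[i + 1:]
                if mutant in landscape:
                    yield mutant


def count_peaks(landscape):
    """Count genotypes with a higher value than all measured neighbors.

    Note: a genotype without any measured neighbor counts as a peak here.
    """
    return sum(
        all(value > landscape[n] for n in neighbors(g, landscape))
        for g, value in landscape.items()
    )


def permuted_peak_counts(landscape, n_permutations=100, seed=0):
    """Peak counts after shuffling values over the same genotype network."""
    rng = random.Random(seed)
    genotypes = list(landscape)
    values = list(landscape.values())
    counts = []
    for _ in range(n_permutations):
        rng.shuffle(values)  # topology is preserved, values are reassigned
        counts.append(count_peaks(dict(zip(genotypes, values))))
    return counts


# Hypothetical toy landscape on two positions with two alleles each.
toy = {"AA": 1.0, "AC": 2.0, "CA": 3.0, "CC": 4.0}
observed = count_peaks(toy)
null_counts = permuted_peak_counts(toy, n_permutations=50)
```

      Comparing the observed count with the distribution of null counts indicates how close the measured landscape is to the ruggedness expected for an uncorrelated landscape on the same network.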

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes), indicating that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
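
      Relative connectivity as defined above can be computed with a few lines of Python. The sketch assumes that every position of the genotype string is variable and can take any of four nucleotides, so that a genotype of length L has 3L possible single-mutant neighbors. The function name and toy data are hypothetical.

```python
BASES = "ACGT"


def relative_connectivity(genotype, measured):
    """Fraction of the 3*L possible single-mutant neighbors that were measured."""
    possible = 3 * len(genotype)
    found = sum(
        genotype[:i] + alt + genotype[i + 1:] in measured
        for i, base in enumerate(genotype)
        for alt in BASES
        if alt != base
    )
    return found / possible


# Hypothetical set of genotypes with reliable measurements.
measured = {"AA", "AC", "CA"}
conn_aa = relative_connectivity("AA", measured)
conn_ac = relative_connectivity("AC", measured)
```

      Correlating these values with regulation strength across all genotypes gives the kind of coefficient reported above.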

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).
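
      One simple way to make a comparison between neighboring genotypes uncertainty-aware is sketched below in Python. It is an assumption for illustration, not the procedure from the Supplementary Methods: a neighbor is treated as higher only if the difference exceeds a chosen multiple of the combined replicate standard deviation. The function name and the threshold parameter z are hypothetical.

```python
import math


def is_reliably_higher(value_a, sd_a, value_b, sd_b, z=1.0):
    """True if a exceeds b by more than z combined standard deviations."""
    combined_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return (value_a - value_b) > z * combined_sd


# Hypothetical values: a clear difference and one within the noise.
clear = is_reliably_higher(2.0, 0.1, 1.0, 0.1)
within_noise = is_reliably_higher(1.05, 0.1, 1.0, 0.1)
```

      In an adaptive walk, such a test would replace the plain "greater than" comparison when deciding which mutational steps count as uphill.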

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.
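
      A minimal sketch of such an adaptive walk is given below. It is an illustration only, not the simulation code used in the study. It assumes Kimura's fixation probability for a haploid population of size N, a selection coefficient defined as the relative difference in regulation strength, and mutations proposed uniformly among sampled neighbors. All function names and the toy landscape are hypothetical.

```python
import math
import random

def fixation_probability(s, N):
    """Kimura-style fixation probability (assumed haploid form):
    (1 - exp(-2s)) / (1 - exp(-2Ns)), with the neutral limit 1/N."""
    if s == 0:
        return 1.0 / N
    x = -2.0 * N * s
    if x > 700:  # strongly deleterious: avoid overflow, probability is ~0
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(x)

def adaptive_walk(start, strength, adjacency, N, steps, rng):
    """Propose a random neighbor at each step and accept it with its
    fixation probability. `strength` values must be positive."""
    current = start
    for _ in range(steps):
        candidate = rng.choice(adjacency[current])
        # Assumed selection coefficient: relative change in regulation strength.
        s = (strength[candidate] - strength[current]) / strength[current]
        if rng.random() < fixation_probability(s, N):
            current = candidate
    return current

# Toy three-genotype chain (hypothetical values, arbitrary units).
toy_strength = {"a": 1.0, "b": 2.0, "c": 3.0}
toy_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
end = adaptive_walk("a", toy_strength, toy_adjacency, N=1000, steps=200,
                    rng=random.Random(0))
print(end)
```

      In a large population, deleterious steps fix with negligible probability, so the walk in this toy example climbs to the strongest genotype and stays there.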

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30)”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.
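
      The shuffling step of such a null model can be sketched as follows. This is a hedged illustration, not the code behind Supplementary Methods 9. It assumes a landscape stored as a dictionary from genotype strings to regulation strengths, and neighbors defined as sampled genotypes one point mutation apart. The helper names and the toy additive landscape are hypothetical.

```python
import random

def neighbors(genotype, landscape, alphabet="ACGT"):
    """Sampled genotypes that differ from `genotype` at exactly one position."""
    found = []
    for i, base in enumerate(genotype):
        for b in alphabet:
            if b != base:
                mutant = genotype[:i] + b + genotype[i + 1:]
                if mutant in landscape:
                    found.append(mutant)
    return found

def shuffle_landscape(landscape, rng):
    """Permute strengths across genotypes. The genotype set, and therefore
    the neighbor network, is left unchanged."""
    genotypes = list(landscape)
    values = list(landscape.values())
    rng.shuffle(values)
    return dict(zip(genotypes, values))

def peaks(landscape):
    """Genotypes stronger than every sampled one-mutant neighbor.
    (A genotype with no sampled neighbors counts as a peak here.)"""
    return [g for g, s in landscape.items()
            if all(s > landscape[n] for n in neighbors(g, landscape))]

# Toy additive landscape over all 2-mers (hypothetical per-base scores).
score = {"A": 0, "C": 1, "G": 2, "T": 3}
toy = {a + b: score[a] + score[b] for a in "ACGT" for b in "ACGT"}
null = shuffle_landscape(toy, random.Random(0))

print(len(peaks(toy)), "peak(s) in the structured toy landscape")
print(len(peaks(null)), "peak(s) after shuffling")
```

      Because only the values are permuted, any difference in peak number or peak accessibility between the two landscapes reflects the loss of genotype–phenotype correlation rather than a change in sampling or network structure.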

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2<sup>8</sup>” to “4<sup>8</sup>” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).
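
      As a rough illustration of a read-weighted average of this kind (not the exact procedure in the Methods), the sketch below assumes that each sorting bin has a single representative fluorescence value and that each TFBS variant has a read count per bin. The function name and the example numbers are hypothetical.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-weighted mean fluorescence across sorting bins for one variant."""
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("need one read count per fluorescence bin")
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    weighted = sum(r * f for r, f in zip(read_counts, bin_fluorescence))
    return weighted / total

# Hypothetical variant: reads per bin, and representative bin fluorescence
# values in arbitrary units.
reads = [10, 30, 50, 10]
bins = [1.0, 2.0, 3.0, 4.0]
print(regulation_strength(reads, bins))
```

      Under these assumptions, a variant whose reads concentrate in brighter bins receives a higher value, which is why the derived quantity is not simply a rescaling of the pooled fluorescence histogram.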

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. Since scav-5 and scav-6 are paralogs of scav-4, we analysed their functions in lipid accumulation using scav-5(ok1606) deletion mutants and scav-6 knockout alleles generated in this study through CRISPR/Cas9-mediated gene editing (Figure 4B). We found that when fed with JUb74, both scav-5(-) and scav-6(-) mutants had moderately reduced LD sizes, but not to the extent of scav-4(-) mutants (Figure 4E). Previous promoter reporter studies showed that scav-5 and scav-6 were expressed in the intestine.34 We constructed translational reporters for both genes and found weak or no signals for SCAV-5::TagRFP possibly due to low protein levels. The SCAV-6::TagRFP fusion protein was expressed in the intestine and was localized to the apical membrane (Figure 4C). From the fluorescent intensity, the scav-6 expression appeared to be weaker than the scav-4 expression. Moreover, scav-4(-) scav-6(-) double mutants had the same LD diameter as scav-4(-) single mutants (Figure 4F). The above results suggested that SCAV-4 may play a more significant role than the other two paralogs in intestinal lipid uptake.

      I'm surprised that the scav-5 and scav-6 paralogs were both able to reduce the large LD phenotype to the same extent as scav-4 (there doesn't appear to be significant difference between the mutants). To me this suggests either they each contribute a third of the BCFA uptake, or that they operate together to internalize BCFAs. The scav-4;scav-6 double mutant suggests the first idea isn't correct as you don't see a stronger effect there. Do you think it's possible these transporters are working as a complex? I would be interested to see if you can rescue each of these mutants with scav-4 expression, or if rescue requires all receptors to be present.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern on the potential confound between posterior probability and time in neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effect of Pt and delta Pt after controlling for intertemporal prior in GLM-2. Second, we compared the effect of Pt and delta Pt between GLM-1 (without intertemporal prior) and GLM-2 (with intertemporal prior) and showed the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to this point under Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern on the potential confound between posterior probability and time in neuroimaging results. First, we presented whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity—which we found to correlate with Pt—with and without including intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results on Pt and delta Pt in the supplement in addition to the Tables of Activation so that the activated brain regions can be clearly seen through these images.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
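
      For concreteness, the normative Bayesian posterior for a task of this kind can be sketched as below. This is an illustration under stated assumptions, not the model code from the study. It assumes the sequence starts in the red regime, a shift to blue can occur with probability q before each draw and is irreversible, and each urn yields its majority color with probability p_major. The function names and parameter values are hypothetical, and `intertemporal_prior` reflects one reading of that term (the probability that a shift has occurred by draw t, ignoring the balls).

```python
def posterior_shift(signals, q, p_major):
    """Return P(regime has shifted to blue | balls seen so far) after each draw.
    `signals` is a string of 'R' and 'B'."""
    p = 0.0          # probability of being in the blue regime before any draw
    history = []
    for ball in signals:
        # A shift may occur before this draw (absorbing once it has happened).
        prior = p + (1.0 - p) * q
        like_blue = p_major if ball == "B" else 1.0 - p_major
        like_red = 1.0 - like_blue
        p = prior * like_blue / (prior * like_blue + (1.0 - prior) * like_red)
        history.append(p)
    return history

def intertemporal_prior(t, q):
    """Probability that a shift has occurred by draw t, ignoring the signals."""
    return 1.0 - (1.0 - q) ** t

# Hypothetical block: 5% shift probability per draw, 75% majority color.
for t, p in enumerate(posterior_shift("RRBRBBBBBB", q=0.05, p_major=0.75), 1):
    print(t, round(p, 3), round(intertemporal_prior(t, 0.05), 3))
```

      With uninformative signals (p_major = 0.5) the posterior reduces to the intertemporal prior, which rises with draw number regardless of the balls shown.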

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system-neglect was likely not due to response mode (having to enter probability estimates, making binary predictions, etc.).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for this comment and for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was not so much about alternative ways of modeling their results as about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision making literature as possible accounts of phenomena like over-confidence. They addressed what might be crudely called “regression/attraction to the mean” in two ways. First, they looked at median responses as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreactions. Second, they also generated sequences that matched particular posterior probabilities (so that over- and underreaction cannot be explained by regression to the mean) and still found under- and overreactions.

      We also wish to point out that in the judgment and decision making literature, starting from Edwards (1968), there is a long history of using a normative Bayesian model as the starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from the normative Bayesian model.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what is more important is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters is due to the fact that people focus primarily on the signals and secondarily on the system parameters that generate the signals. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results on intertemporal prior on vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 to 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. 
      By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further ensure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as described on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
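      The three-level coding described in the quoted passage can be sketched as below. This is a minimal illustration, not the authors' analysis code: the function name and the key-to-hand mapping (digits 1-5 on the left hand, digits 6-9 and 0 on the right) are assumptions chosen only because they reproduce the three examples given ("23", "75", "90").

```python
# Hypothetical key-to-hand mapping (an assumption, not stated in the text):
# digits 1-5 are pressed with the left hand, digits 6-9 and 0 with the right.
LEFT_HAND_DIGITS = set("12345")


def action_handedness(estimate: str) -> int:
    """Return -1 (both presses left), 0 (one left, one right), or 1 (both right)."""
    if len(estimate) != 2 or not estimate.isdigit():
        raise ValueError("expected a two-digit string such as '23'")
    # Count how many of the two key presses fall on the right hand.
    right_presses = sum(digit not in LEFT_HAND_DIGITS for digit in estimate)
    return right_presses - 1  # 0 right presses -> -1, 1 -> 0, 2 -> 1


# The three examples from the passage.
codes = [action_handedness(e) for e in ("23", "75", "90")]
```

      Coding the regressor as -1/0/1 makes it a single parametric modulator, so any activity that scales with the hand used is absorbed by it rather than by the probability estimate.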

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows the results of Pt and delta Pt from GLM-2. We also compared the effects of Pt and delta Pt between GLM-1 and GLM-2 and found that they did not differ. GLM-1 and GLM-2 differed in whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, they were included in addition to Pt and delta Pt.

      The reason we kept the results from GLM-1 (Figure 3) was primarily that we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were, respectively, probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      Here is the section in the main text where we discuss the new Figure 4 (pages 19-22):

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, where q is the transition probability and t = 1, …, 10 is the period (Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4): Fig. 4A was meant to depict the results in slices identical to those shown in Fig. 3B for results based on GLM-1. For slice-by-slice results, see Fig. S7 in SI for results based on GLM-1 and Fig. S9 for GLM-2. For Tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis on vmPFC and ventral striatum (Fig. 4BC; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effect of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2. 
      For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC; t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.82, p < .01 in vmPFC; t(29) = −3.06, p < .01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.69, p = .01 in vmPFC; t(29) = −2.50, p = .02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p = 0.94 in vmPFC; t(58) = −0.14, p = 0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.12, p < .01 in vmPFC; t(29) = −4.14, p < .01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.92, p < .01 in vmPFC; t(29) = −3.59, p < .01 in ventral striatum). For the intertemporal prior, activity in both vmPFC and ventral striatum did not correlate significantly with the intertemporal prior (one-sample t-test, t(29) = −0.07, p = 0.95 in vmPFC; t(29) = −0.53, p = 0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find that vmPFC and ventral striatum represented the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 − P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI). 
      Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
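
      To make the two log-odds quantities in this passage concrete, here is a minimal sketch. It assumes that the prior probability of a shift having occurred by period t is 1 − (1 − q)^t, i.e. a constant per-period transition probability q with no shift back. That form is an assumption made for illustration and is not a quotation of Eq. 1 in the Methods. The function names and the value q = 0.05 are likewise illustrative.

```python
import math


def intertemporal_prior(q: float, t: int) -> float:
    """Log prior odds that a regime shift has occurred by period t (assumed form)."""
    # Assumption: probability of at least one shift in t periods with hazard q.
    p_shift = 1.0 - (1.0 - q) ** t
    return math.log(p_shift / (1.0 - p_shift))


def log_odds(p: float) -> float:
    """ln(p / (1 - p)), the transform applied to P_t in the second analysis."""
    return math.log(p / (1.0 - p))


# q = 0.05 is an illustrative value, not taken from the text.
priors = [intertemporal_prior(0.05, t) for t in range(1, 11)]
```

      Under this assumed form the prior rises monotonically with t whatever signals are shown, which is the temporal trend that including the regressor in GLM-2 is meant to separate from P<sub>t</sub>.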

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effect of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent Pt effect, the clusters in orange represent delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. Open a social media interface (not the one you’ve been working with) and choose a view (e.g., a list of posts, an individual post, an author page etc.). First identify as many pieces of information as you can see on the screen (without doing anything). For each piece of information: What data types might be used to represent that data on a computer? How is this data a simplification of reality? That is, what does it not capture? Who does it work best for, and who does it not work well for? Did the user(s) directly provide that data, or was it collected automatically by the social media site?

      TikTok only shows the number of likes as an integer data type, meaning it tells me how many people liked a video, but it does not show different emotions like Facebook, where users can react with various feelings. So we cannot really tell whether people truly enjoyed the video or just saved or liked it to share with others. It does not clearly reflect viewers’ real feelings, including mine. Another example is text data such as usernames and profile pictures which are based on users’ personal preferences and do not necessarily reflect who they are in real life. This is why there are many fake accounts on social media, created for different purposes. Sometimes when scrolling on TikTok, I wonder why I see unfamiliar videos that I have never searched for or talked about. I think this happens because the platform collects data from my followers, and if they like certain types of videos, similar content may also appear on my feed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors investigated the effect of nutritional stress (HSD and HFD) on cardiac function by assessing multiple parameters on adult flies. They next identified the adaptive transcriptomic changes in the heart in response to these nutritional stresses and screened for their roles under ND, HSD and HFD. They identified fit gene, encoding a satiety gene, expressed by cardiomyocytes and pericardial cells.

      I think the characterisation is thorough; however, the conclusion is not well supported by the evidence. My main concern is that in many graphs, the difference between control and experiment is subtle, and, secondly, the authors showed some conflicting results (e.g. one RNAi showed a reduction of one parameter, however, the other independent RNAi did not). In this case, I believe the authors shouldn't conclude that the RNAi is functionally required, since the RNAis are meant to confirm each other.

      First, we thank the reviewer for her/his constructive comments and suggestions. We have obtained new results, presented in the revised version of the manuscript, which consistently support our conclusions and improve the study.

      High-Sugar and High-Fat Diets modified cardiac performance

      They assessed how HSD and HFD affect Adult fly heart performance. Instead of performing 3 weeks of dietary manipulation as has been done before by other groups, they put adult flies on HSD for 7 days and HFD for only 3 days.

      We would like to clarify the nutritional challenge used. Cardiac function of flies was assessed at 10 days after emergence. Flies were kept either on ND or HSD during these 10 days (ND and HSD conditions), or on ND for 7 days and then transferred to HFD for 3 days (HFD condition). Thus, all the females spent 10 days on a diet before being imaged or before heart/brain dissection.

      They found: HSD increases HP and SI, and reduces AI. The difference is too small and not consistent between different control lines. Also, when the difference is this small, p value does not tell much!

      They probably intentionally induced a milder effect so that they could assess adaptive transcriptomic changes to this nutritional stress. In Fig. 1D, SI is increased under HSD with control-KK. In Fig. S1C, SI is not changed under HSD with control-GD and control-GFP. Instead, DI is increased, which is also opposite to what they showed in Fig. 1C. HFD increased ESD, EDD, SV, FS and CO (hypertrophy). This is not true with control-GD and control-GFP lines though! Comments: They have assessed many parameters in live animals with many different control lines, which is thorough. However, it is hard to draw any conclusions based on these conflicting results. Are these effects KK-line specific?

      Overall, we agree with the reviewer that the results for the control lines presented in the first version of the manuscript were difficult to understand due to the inconsistency of the phenotypes. In this revised version, we present new results in Figure 1 and S1 regarding the effect of 10 days of HSD and 3 days of HFD exposure vs ND.

      Between 105 and 187 flies were imaged for the 3 control conditions, in the 3 diets concomitantly, to increase the power of our analysis. As mentioned in the main text (page 3, lines 30-35; page 4, lines 1-5), both diets deteriorate cardiac function, with HFD leading to consistent phenotypes on heart diameters and rhythm and HSD to milder effects. Indeed, the 3 control lines were uniformly affected by HFD after 3 days of exposure, whereas 10 days on HSD was not sufficient to produce a significant effect, despite consistent trends on several phenotypes (EDD, ESD, DI, AI and CO). These results reveal a different sensitivity of cardiac performance to sugar and fat.

      As described in the text, we were nevertheless confident that our approach would be good to investigate the early molecular dysregulations induced by sugar. This was the purpose of our analysis, presented in the follow-up of the manuscript.

      Regarding the small differences measured in the phenotypes in HSD and HFD compared to ND, we would like to point out that the values presented are normalized to control. Normalization is done for every independent experiment, performed on different dates, and permits the graphical representation of pooled values. Statistical analysis is performed accordingly using the non-parametric Kruskal-Wallis test. Values are presented with the X axis crossing the Y axis at 0; this graphical representation also contributes to flattening the differences, and the p-values indicate their significance.
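
      The per-experiment normalization described above can be illustrated with a short sketch. It assumes that "normalized to control" means dividing every value by the mean of the control group measured in the same experiment. The text does not say whether the mean or the median is used, and the function name, group labels and numbers below are all made up for illustration.

```python
from statistics import mean


def normalize_to_control(experiments):
    """Divide each value by the mean of the control group from the same experiment."""
    pooled = {}
    for groups in experiments:  # one dict per independent experiment
        control_mean = mean(groups["control"])
        for name, values in groups.items():
            # Pool the normalized values of each group across experiments.
            pooled.setdefault(name, []).extend(v / control_mean for v in values)
    return pooled


# Made-up numbers for two experiments run on different dates.
experiments = [
    {"control": [0.40, 0.60], "HSD": [0.55, 0.65]},
    {"control": [0.90, 1.10], "HSD": [1.20, 1.30]},
]
pooled = normalize_to_control(experiments)
```

      After this step the pooled control values are centred on 1 in every experiment, so date-to-date differences in baseline no longer inflate the spread. The pooled groups could then be passed to a Kruskal-Wallis test, for example `scipy.stats.kruskal`.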

      Analysis of the fly cardiac transcriptome upon nutritional stress

      RNA seq to detect differentially expressed genes under HSD and HFD vs ND. Most DE genes are downregulated, which prompts them to assess how the downregulation of these genes adapts the animals to this nutritional stress.

      High Sugar Diet downregulated 1c-metabolism and Leloir galactose pathways.

      In this revised manuscript, we first present RT-qPCR validating the downregulation of Gnmt, Sardh and Galk expression in the hearts of 10-day-old HSD-fed females compared to ND-fed ones (Figure S3A).

      We apologize for the confusing explanations in the first version of the manuscript. We show new results in Figure 3 and S3 on the cardiac function of both Gnmt and Sardh: following the reviewer’s suggestion, both genes were knocked down in the heart in ND, and Gnmt was overexpressed in HSD. No available tools allowed us to test Sardh overexpression in HSD, and we could not obtain any for Galk.

      GNMT is downregulated under HSD and HFD.

      In ND, GNMT knockdown increased ESD, EDD and CO. Sardh knockdown did the same? However, Sardh knockdown did not affect ESD significantly.

      We reanalyzed our initial data and added new data, comparing knockdown or overexpression only to the corresponding controls from concomitant experiments. Results are now shown in Figure 3C-E; S3C-H. Knocking down Gnmt in the heart increased HP, EDD, ESD and CO. Sardh knockdown in ND resulted in milder phenotypes but induced significant hypertrophy, as Gnmt knockdown does. In both cases, FS was not impacted.

      Both genes have previously been shown to be beneficial to muscular function in a time-restricted feeding context (Livelo et al., 2023, Nat.Comm.), illustrating that, even though the two enzymes are involved in opposite reactions, they have the same effect on organ/tissue function, as they did for heart diameters. The corresponding results and discussion text was updated accordingly (pages 5, 11).

      The conclusion here is: GNMT knockdown induces hypertrophy, similar to the effect of HFD.

      In HSD, further knockdown of GNMT reduced (rescued) HP, suggesting downregulation of GNMT under HSD is adaptive. Should overexpress GNMT under HSD to see if this manipulation further increases HP, to claim GNMT downregulation is an adaptive change to high sugar stress.

      We thank the reviewer for her/his suggestion. We have now used UAS-GnmtWT (from FlyORF) to assess the role of Gnmt on cardiac function in HSD.

      As shown in Figure 3C-E and S3C,F, overexpressing Gnmt in the heart in HSD was sufficient to rescue some sugar-induced phenotypes or to induce other dysfunctions, when compared to the corresponding controls evaluated in the same experiments in ND and HSD. Notably, the HP increase and CO decrease are rescued by cardiac Gnmt overexpression in HSD. Interestingly, the cardiac diastolic constriction induced by HSD is associated with increased FS and CO in this genotype on the sugar diet. These new results strengthen the positive effect of Gnmt on cardiac function, improving it in HSD and preventing its deterioration in this diet.

      Of note, Gnmt overexpression in ND did not trigger cardiac dysfunctions (data not shown).

      The results and conclusions have been corrected.

      Interestingly, HSD itself tends to decrease AI, a further knockdown of GNMT further decreases AI. This indicates GNMT downregulation under HSD contributes to AI reduction. Together, GNMT downregulation under HSD prevents HP from going higher, while its downregulation causes AI going down.

      In the manuscript, the authors claim that "Gnmt KD led reduced HP and AI, suggesting that it is able to counteract the effect of HSD observed in control flies on these phenotypes". This is not true according to the logic in Results section 1. As in section 1, the effect of HSD on AI is not significant, so the authors shouldn't say "HS tended to reduce AI".

      Our reanalyses and new results showed no impact of Gnmt on AI, so these Figure panels were removed.

      Why GNMT knockdown reduced FS under ND (Fig. S3C), while increasing FS under HSD (Fig. 3F)? If GNMT knockdown induces hypertrophy, I would expect it to increase FS.

      Gnmt overexpression did not affect cardiac diameters in HSD, but it nevertheless led to an increased contractile efficacy compared to HSD controls (Figure S3F).

      These new results strengthen the positive effect of Gnmt on cardiac function, preventing its deterioration in sugar diet. The text was modified accordingly.

      High Fat Diet modulated CD36-scavenger receptor and Glut8 orthologues

      In this revised manuscript, we present RT-qPCR validating the downregulation of Snmp1 expression and the slight upregulation of nebu in the hearts of 10-day-old HFD-fed females compared to ND-fed ones (Figure S3B).

      HFD: Snmp1 gene is downregulated, however, both overexpression and knockdown of Snmp1 in ND induced some phenotypes.

      Indeed, as mentioned in the revised manuscript (page 6, lines 21-24), in the hearts of females fed ND, both Snmp1 knockdown (Snmp1KK) and overexpression (Snmp1WT) reduced EDD and ESD (Figure 3J; S3J), but FS was increased accordingly only in Snmp1KK.

      As noted in the text, both downregulation and overexpression of Snmp1 led to side-phenotypes (page 6, lines 24-28): Snmp1KK exhibited increased abdominal fat (Figure S3K), and ostial cells appeared clearly malformed in Snmp1WT (Figure 3M). This may explain why the heart shows the same type of functional impairment in both genotypes.

      We now discuss the hypothesis that these similar cardiac dysfunctions may result from Snmp1 being a regulator of organismal or cardiac lipid homeostasis. Indeed, increasing body fat content is deleterious, as is increasing the import of fat into the cardiomyocytes. Ultimately, both affect the health and functioning of cardiac cells.

      HFD: nebu has a role in regulating cardiac function under ND.

      HSD and HFD revealed the secretory function of the heart

      They identified diet-regulated secreted proteins that are required for cardiac dysfunction.

      Cardiac Fit expression impacted Cardiac performance.

      The author used Hand-G4 to knock down Fit using KK and GD lines, KK line showed a reduction in HP (Fig. 5A), but not GD line (Fig. S5D). How did the author conclude that Fit is required for cardiac function? Also, with the positive data, the difference is too subtle.

      We apologize and agree that the contradictory or inconsistent results obtained with the two RNAi lines were confusing.

      For this revised version, we first assessed the effect of the two RNAi lines (KK and GD) on fit expression in dissected hearts. RT-qPCR for the KK line is presented in Figure S5A. The GD line did not show a significant reduction of fit expression when expressed in the heart with Hand>, which can explain the results presented previously (not shown, but data are available). We therefore removed all results obtained with the GD line from this revised version.

      To confirm the KK effects, we used the fit KO allele (fit81) and a truncated version of fit lacking its signal peptide (fitDeltaSP), which has a dominant negative effect; both were previously published and validated (Sun et al. 2017, Nat. Comm.). These two mutants were used to investigate the cardiac function of fit in our analysis. The results presented in Figure 5 and S5 confirm the phenotypes already observed with the KK line when expressed with Hand> in the heart and with Lsp2> in the fat body.

      Our results validate the effect of decreased fit on rhythmicity and contractility, with the reverse effects being consistently observed upon fit overexpression. In conclusion, we are confident in the requirement of Fit in the regulation of cardiac performance.

      These new data are now included in the results section “Cardiac Fit expression impacted Cardiac performance” (pages 8-9).

      **Referee cross-commenting**

      I agree with the experiments proposed by reviewer 2.

      Reviewer #1 (Significance (Required)):

      The study aims to examine the effect of diet on cardiac function.

      The strength is that a lot of characterisations were done.

      The weakness is that the functional data regarding fit could not be validated in two different RNAis, thus the evidence is not strong enough to support the conclusions.

      We would again like to thank the reviewer for her/his remarks and suggestions. She/He highlighted the weaknesses of the first analysis, and this was important and constructive feedback for us. We strengthened our results by increasing sample sizes, reanalyzing data and performing the necessary new experiments, which are now included in this revised version.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Khamvongsa-Charbonnier et al. reported a RNA-seq analysis and RNA interference screening on high-fat and high-sugar-induced cardiomyopathy in Drosophila. The authors uncovered novel genes in 1C-metabolism, galactose metabolism, CD36-scavenger receptor and glucose transporter, as adaptative factors of cardiac function under high-fat and high-sugar treatment. The authors also identified a satiety hormone, Fit, as a cardiokine to control food intake and , expressed by dilp5 secretion. In summary, this study leverages the powerful genetic model Drosophila to uncover a number of new factors in regulating cardiac function under nutritional stresses and potentially offers new insights into molecular mechanisms underlying diet-related cardiac diseases. I have a few concerns, as listed below.

      First, we would like to thank the reviewer for her/his comments and suggestions, which greatly helped us to improve the take-home messages of our manuscript. Following her/his recommendations and suggestions, we can now present a revised and stronger version of our manuscript.

      1. Quantitative RT-PCR is required to validate the expression patterns of candidate genes identified from the RNAseq analysis.

      RT-qPCR was performed on hearts dissected from 10-day-old females fed ND, HSD or HFD. The validated downregulation of Gnmt, Sardh and Galk is presented in Figure S3A, Snmp1 downregulation and nebu upregulation (a non-significant trend) in Figure S3B, and fit downregulation in Figure S5A.

      The authors state that the dysregulated gene expression patterns reflect acute adaptation to HSD and HFD stresses. Most of the candidate genes in this study were downregulated upon HSD and HFD. However, it is recommended that overexpression of these genes, rather than knockdown, is needed to confirm whether the downregulation of these candidate genes upon stresses is an adaptative response.

      We agree with the reviewer and followed her/his recommendation when tools were accessible for our analysis.

      For example, HSD feeding induces the heart period. Knocking down Gnmt, specifically in the heart, under the HSD feeding changes can reduce the heart period. This evidence is insufficient to suggest the protective role of Gnmt under the HSD diet. Gnmt has already been downregulated under the HSD. Further knockdown of Gnmt, instead of returning the Gnmt expression to normal levels, to protect cardiac contractile performance complicates the model.

      We thank the reviewer for her/his suggestion. We used UAS-GnmtWT (from FlyORF) to perform these experiments.

      As shown in Figure 3C-E and S3C,F, knocking down Gnmt in the heart increased HP, EDD, ESD and CO. In the same Figure panels and in Figure S3F, we show that overexpressing Gnmt with Hand> in HSD was sufficient to rescue some sugar-induced phenotypes or to induce others, when compared to the corresponding controls evaluated in the same experiments in ND and HSD. Gnmt overexpression in ND did not trigger cardiac dysfunctions (data not shown).

      The HP increase and CO decrease are rescued by cardiac Gnmt overexpression in HSD. Interestingly, the cardiac constriction induced by HSD is not rescued by Gnmt overexpression, but the overexpression is enough to increase FS and CO on the sugar diet. These new results strengthen the positive effect of Gnmt on cardiac function, improving it in HSD and preventing its deterioration in this diet.

      Sardh knockdown in ND resulted in milder phenotypes but induced significant hypertrophy, as Gnmt knockdown does. No available tools allowed us to test its overexpression in HSD.

      Nevertheless, as mentioned and discussed in the manuscript (page 5, lines 27-30; page 11, lines 11-14), such a protective role in muscular function and integrity has already been characterized in fly IFM in time-restricted feeding experiments for Gnmt and Sardh (Livelo et al., 2023, Nat.Comm.). Our experiments show that both genes play the same role in cardiac function under nutritional stress. The text was modified accordingly.

      The authors suggest that the effect of nebu on heart contractility is not dependent on diet. However, based on the result from Figure 3O-P, the HFD treatment blocks the effect of nebu knockdown on heart contractility. The authors need to further explain these results and modify their conclusions accordingly.

      We completely agree with the reviewer. We did not correctly analyze these results. We reanalyzed our data, taking into account only the nebu knockdown experiments that were performed in ND and in HFD concomitantly. Results are shown in Figure 3O-P and S3L-N.

      As mentioned in the manuscript (page 7, lines 3-8), nebu knockdown led to identical HP decrease in both diets but its constrictive effect (reduction of heart diameters) in ND is abrogated by fat diet.

      We modified the text accordingly in the results and discussion (page 7, lines 8-11; page 12, lines 7-12).

      It is a bit confusing that knockdown of fit using Hand-Gal4 induced food intake, but knockdown of fit using tin-Gal4 or Dot-Gal4 did not significantly induce food intake (Fig 6A). The author did not provide any explanation of these results. What is even more confusing is that overexpressing fit using Dot-Gal4 decreased food intake, but overexpressing fit using Hand-Gal4 or tin-Gal4 did not significantly decrease food intake (Fig 6B). Why was the strong food intake phenotype not observed using Hand-Gal4 in both experiments? These confusing results lead to a question, which cell type is responsible for the production of cardiokine, Fit?

      We apologize for the misleading results presented in the initial manuscript. We hope that our revised version will clarify Fit function regarding its remote impact.

      Concerning the requirement of Fit function and the cell types that produce Fit, the results we obtained when evaluating cardiac performance strongly suggest that both cardiomyocytes and pericardial cells are important and recapitulate the effect of Hand> (Figure 5A-C; S5G-H).

      In the case of food intake measurements, we now present results from newly performed food intake experiments for Hand>fitWT (Figure 6D). They show a significant reduction of food intake in this condition, corroborating the results obtained with Dot>. We added a clarification of this point in the manuscript (page 10, lines 11-16).

      When testing the role of cardiac Fit in Dilp5 secretion, the authors subjected flies to starvation stress. However, the main focus of the present study is on HSD and HFD. The RNAseq analysis showed that Fit expression was downregulated by both HSD and HFD. Can the authors show that Dilp5 secretion is reduced by both HSD and HFD? Most importantly, can the authors test whether overexpression of cardiac Fit blocks HSD- or HFD-reduced Dilp5 secretion?

      We understand the point raised by the reviewer. First of all, we wanted to correlate the measured impact on food intake, when manipulating fit expression in the heart, with the level of Dilp release, using the approach validated in Sun et al. (2017, Nat. Comm.). For this purpose, we used the same approach and protocol, and the results are shown in Figure 6E-F.

      As mentioned by the reviewer, fit expression is downregulated in both HSD and HFD (which we confirmed by RT-qPCR in Figure S5A). As suggested by the reviewer, we performed Dilp5 immunostaining on CNS from females that were fed HSD or HFD for 10 days. Our results, in Figure 6B (left panels) and the corresponding quantifications in Figure 6C, show that both diets strongly decrease the amount of Dilp5 in the IPCs and that this was not due to altered Dilp2 or Dilp5 expression in the CNS (Figure S6A). In this condition, overexpressing fit, which has a promoting effect on Dilp secretion (Figure 6B, right panels ND), may only have an additive effect. This is shown in Figure 6B-C.

      Reviewer #2 (Significance (Required)):

      In summary, this study leverages the powerful genetic model Drosophila to uncover a number of new factors in regulating cardiac function under nutritional stresses and potentially offers new insights into molecular mechanisms underlying diet-related cardiac diseases.

      We would again like to thank the reviewer for her/his remarks and suggestions. Her/his important and constructive feedback helped us to improve and strengthen our study. Despite the weak points of the first version, she/he gave supportive feedback, and we deeply thank her/him. This revised version has improved results and analysis, thanks to the use of new genetic tools that strengthen the study.

  2. Apr 2026
    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the discussion in Lines 422-425 of the revised manuscript. Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, our previous RNA-seq comparison of Group TW and Group W revealed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024).

      Additionally, we analyzed AMP expression between Group TW and Group W, and the results show that some antimicrobial peptides, such as Lysozyme and C-type lectin, are upregulated. Notably, we did not detect upregulated expression of SWD between Group TW and Group W. We agree with the reviewer that there is a time lag between transcription and translation. Supplementary experimental evidence shows that the expression level of LvHSF1 is strongly induced by WSSV stimulation, and then the expression level of SWD begins to increase. We have added a description in Lines 136-138 in the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. Following it, we measured the transcriptional expression levels of HSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The transcriptional expression level of SWD was set to 1.00 at 0 h. In the early stage of WSSV infection (0-12 h, except 6 h), the expression level of LvHSF1 is strongly induced, and then the expression level of SWD begins to increase. These results show that LvHSF1 induction in response to viral infection, in turn, upregulates SWD and other antibacterial genes. Thank you.

      (4) The data (Figures 3 and 4) show that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels, and viral load at both temperatures is confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead, for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage to the shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperatures, we observed that knockdown of HSF1 did not affect shrimp survival rate (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      As suggested, we injected GST or rHSF1 protein into shrimp. The results showed that recombinant HSF1 protein significantly induced the expression level of SWD (Supplementary Fig. 5C). Further, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C and 5D and added a description in Lines 253-255 and Lines 290-293 in the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the experiments of shrimp. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation seems not to be able to protect the animals. The authors compare the infection at 25 and 32 ℃ but did not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 in the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to see a "KEY role" based mainly on change in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Whether the upregulation of HSF1 is a response to high temperature or WSSV infection? I think it is more likely a response to high temperature. Did the authors see the difference in HSF1 expression in shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. As suggested, we have measured the expression of LvHSF1 after WSSV challenge at high temperature (32 ℃) in revised Figure 2F-2H (Line 122 in the revised manuscript). The results demonstrate that the expression of LvHSF1 is strongly induced by WSSV stimulation at high temperature (32 ℃). Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. As suggested, we have measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃) in revised Figure 2C-2H. The results demonstrate that the expression of LvHSF1 is strongly induced by Poly (I: C) and WSSV stimulation at high temperature (32 ℃). We have added a description in Lines 168-179 in the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly IC and WSSV challenge at 32℃, and compare with the un-challenge group? Why were only low temperature was analyzed?

      Thank you for your suggestion. As suggested, we have measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃) in revised Figure 2C-2H. We have added a description of these results in Lines 168-179 in the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have added the description of temperature used in revised Figures 2C-2E. The expression changes of HSF1 were compared with those of PBS control group at the corresponding time and we modified the comparison method of significance in revised Figures 2C-2E. Thank you.

      (8) Figure 3H: There are two groups (dsGFP+PBS; dsHSF1+PBS) showing with the same symbol (dot line).

      Thank you for your comment. The revised Figure 3H has used different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why different tissues were analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We have repeated the experiment to detect the copy number of WSSV in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. The results showed that LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have added the tissue information to Figure 4D and 4d. Thank you.

      (12) Figure 2A The time for temperature treatment? hours or days?

      Thank you for your comment. The transcriptional expression of LvHSF1 was measured in different tissues of healthy shrimp subjected to low (25 °C) and high (32 °C) temperatures for 12 hours. We have added this information to the legend of Figure 2A in Lines 840-841 in the revised manuscript. Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 in the current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciate your comments. We have modified this description in Lines 282-283 in the current manuscript. Thank you.

      (15) Line 458 "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells. I suggest the author analyze the entry of WSSV to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial mechanism of action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses (such as WSSV) (Wilson et al., 2013).

      We thank the reviewers for their valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity and on the role of HSF1 in regulating antimicrobial effectors (such as SWD). Due to the limitation of the manuscript's length, we will further investigate the mechanisms of the SWD-specific anti-WSSV activity in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Line 435-456 The author discusses the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observation is due to true differences or different methods they used to evaluate the response. If no immune response was promoted in the previous study, what's the possible anti-viral mechanism?

      We appreciate your comments. Firstly, the shrimp in the two studies differ in their adaptability to temperature. The optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, and the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. Secondly, the environmental factors differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperatures (32 °C) have been shown to inhibit the replication of WSSV and reduce mortality in WSSV-infected shrimp. Thirdly, the two studies tested different immune indicators. Ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog. In this study, our findings revealed the molecular mechanism through which the HSF-AMPs axis mediates host resistance to viruses induced by febrile temperature. Taken together, HSF1 can benefit either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we revealed that heat shock proteins play a significant role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between Group TW and Group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and heat response. Therefore, special attention has been paid to heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added the description in Lines 136-138 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors draw models in Figure 9, the established activation mechanism of HSF1 is via trimerization by the release of HSP90, which binds to misfolded proteins under stress conditions, such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1 and also must provide the basis of upregulation of HSF1 by mRNA increase via citing papers in the Introduction.

      We appreciate your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During a stress response, such as to high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation that recognizes regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that the expression of HSF1 is remarkably induced by stress, such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that the expression of LvHSF1 was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For RNA seq analysis in both in Figures 1 and 5, they need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous RNA-seq comparison of Group TW and Group W revealed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024). We have added the description in Lines 136-138 in the revised manuscript.

      For Figure 5, we have added the heat shock protein genes among the downregulated DEGs, identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃), to Supplementary table 2. The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, they did experiments by focusing on the changes by HSF1 knockdown at 32 ℃. However, the logical flow should be focusing on genes whose expression was increased by 32 ℃ compared with 25 ℃ (in figure 1), among them they need to characterize HSF1 target genes. Here as mentioned above, classical HSP genes must be included in addition to those AMP genes.

      Thank you for your suggestion. As suggested, we have added the heat shock protein genes among the downregulated DEGs, identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃), to Supplementary table 2. The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciated your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate proteins for immunity-effector molecules, including SWD, CrustinⅠ, C-type lectin, Anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020); C-type lectin (CTL1) can bind to the VP28, VP26, VP24, VP19, and VP14, thereby inhibiting the infection of WSSV (Zhao et al., 2009); Anti-lipopolysaccharide factor (ALF3) performs its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017); Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). However, the detailed regulatory mechanism of SWD against WSSV was unclear, and particular attention was paid to the SWD. We have added the description in Lines 215-220 in the revised manuscript. Thank you for your valuable comments and the logic of the manuscript has been improved.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei, Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciate your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We then wanted to know whether HSF1 regulation of antimicrobial peptides is a conserved defense mechanism induced by elevated temperature in arthropods, so experiments were performed in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropin A, Defensin, Metchnikowin, and Drosomycin) play a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropin A and Defensin was remarkably induced by DmHSF, with Attacin A the most strongly induced. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added this description in Lines 328-330 and Lines 361-364 of the revised manuscript. Thank you for your valuable comments, which have improved the logic of the manuscript.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) From Figure 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, what they showed was just showing that nSWD plays anti-WSSV function downstream of HSF1. The authors should show additional data for dsControl+rnSWD.

      Thank you for your suggestion. As suggested, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group at 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it will be great if the authors perform Alphafold3 prediction analysis (Abramson et al PMID: 38718835).

      Thank you for your suggestion. As suggested, we performed AlphaFold3 prediction analysis on SWD and the WSSV proteins VP24 and VP26. The predicted template modeling (pTM) score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. The AlphaFold3 prediction results suggest a possible interaction between SWD and WSSV. Notably, our manuscript demonstrated that rSWD could interact with VP24 and VP26 by pulldown assays and confocal analysis.

      Author response image 3.

      AlphaFold3 prediction analysis of SWD&VP24 (pTM = 0.64)

      Author response image 4.

      AlphaFold3 prediction analysis of SWD&VP26 (pTM = 0.53)

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila" because conventionally Drosophila implies fruit fly as an organism. We don't say cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) Figure numbers can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some can go into supplementary figures.

      Thank you for your suggestion. We have moved the poly(I:C) data to Supplementary Figure 2 in the revised manuscript. However, because we have added some experimental data to Figures 1, 2, 3, and 4, we did not combine Figures 1 and 2 or Figures 3 and 4. Thank you.

      (3) One of the best-understood roles of HSF1 in physiology other than heat shock response is longevity, in particular with C. elegans. The authors need to mention this in the Discussion by citing the following recent review paper (Lee PMID: 36380728).

      Thank you for your suggestion. We have supplemented the description of HSF1 regulating longevity and aging of organisms and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels. The uppercase letters represent the main image of the Figure, and the small letter panels are the corresponding supplementary instructions in the revised manuscript. Thank you.

      (5) In the introduction part, I recommend changing the references for HSFs and HSR with recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, it is not intuitive to understand the name groups W and TW.

      We appreciate your comments. We have added a description of Group W and Group TW in the revised Figure 1. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add some kinds of sequence comparisons of SWD and nSWD for readers to understand the homology.

      We appreciate your comments. We have added a multiple sequence alignment of SWD proteins from shrimp species in the revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD with "newly identified" is strange as it will not be new anymore as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have added the full name of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it will be better to transfer poly I:C data to supplementary figures.

      Thank you for your comments. We have moved the poly(I:C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label for pGL3-nSWD-M12 is confusing. M1 and M2 are OK. Please change M12 with M1/2 or another one.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 to pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect](L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility of other physiological mechanisms, such as photosynthetic assimilation, contributing to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests much of the remaining variation is attributable to individual-level differences rather than missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed significantly to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explaining 23.4% of variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternate sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with and without tracked changes, as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the timing of when trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point of two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. Therefore, the solstice is referenced in the hypothesis because it forms a daylength barrier. In this framework, the compensatory point cannot occur earlier than the solstice because day lengths are still increasing (L517-519).
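      As a purely illustrative sketch of the balance described above (not part of the manuscript or its analysis), the compensatory point can be written as the first day on or after the solstice on which the net effect, ESD plus LST, reaches zero. The function names, the effect shapes and every number below are hypothetical assumptions chosen only to make the logic concrete.

```python
# Illustrative sketch only: effect shapes and all numbers are assumptions,
# not values or models taken from the manuscript.
SOLSTICE_DOY = 172  # June 21 as day of year in a non-leap year


def compensatory_point(esd_effect, lst_effect, last_doy=300):
    """Return the first day of year, not earlier than the solstice, on which
    the net effect (advancing ESD effect + delaying LST effect) is >= 0.

    esd_effect, lst_effect: hypothetical callables mapping day of year to an
    effect on autumn phenology in days (negative = advance, positive = delay).
    Returns None if the effects never balance before last_doy.
    """
    for doy in range(SOLSTICE_DOY, last_doy + 1):
        # The LST effect is treated as inactive until day length starts to decline.
        lst = lst_effect(doy) if doy > SOLSTICE_DOY else 0.0
        if esd_effect(doy) + lst >= 0:
            return doy
    return None


def example_lst(doy):
    """Hypothetical delaying effect that grows by 0.5 days per day after the solstice."""
    return 0.5 * (doy - SOLSTICE_DOY)


# Two hypothetical ESD magnitudes give two different compensatory days,
# neither of which can fall before the solstice.
weak_esd_day = compensatory_point(lambda doy: -10.0, example_lst)
strong_esd_day = compensatory_point(lambda doy: -20.0, example_lst)
```

      Under these assumed shapes the balance day shifts with the size of the ESD effect, while starting the search at the solstice encodes the daylength barrier.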

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe of which I have worked in. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow cell division and mitotic rates; these rates then rapidly and non-linearly approach 0 as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases signal-to-noise and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have utilised a broad range of environmental conditions, from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments including a constant 9°C temperature and a 24-hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing is intended to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      And the referenced response to Reviewer one:

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can often happen MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, budset occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R<sup>2</sup> = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.
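
      The two covariate strategies described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the variable names (`group`, `Q`, `gamble`), the simulated effect sizes, and the helper functions `ols` and `pca_scores` are all assumptions introduced here.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares with an intercept; returns coefficients and t-values."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def pca_scores(Q):
    """Project z-scored questionnaire columns onto orthogonal principal components."""
    Z = (Q - Q.mean(axis=0)) / Q.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # share of variance per component
    return Z @ Vt.T, explained

# Synthetic data (hypothetical): a shared severity factor drives four questionnaires.
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n).astype(float)  # 1 = S+, 0 = S-
severity = rng.normal(size=n) + 0.8 * group
Q = np.column_stack([severity + 0.3 * rng.normal(size=n) for _ in range(4)])
gamble = 0.5 + 0.10 * group + 0.02 * severity + 0.05 * rng.normal(size=n)

# Strategy 1: group plus raw questionnaire scores as covariates.
beta_raw, t_raw = ols(gamble, np.column_stack([group, Q]))

# Strategy 2: group plus orthogonal PCA components as covariates.
scores, explained = pca_scores(Q)
beta_pca, t_pca = ols(gamble, np.column_stack([group, scores]))

print("group effect (raw covariates):", round(beta_raw[1], 4))
print("group effect (PCA covariates):", round(beta_pca[1], 4))
print("variance explained per component:", np.round(explained, 3))
```

      In this sketch all components are retained, so both regressions span the same covariate space and return the same group coefficient; the orthogonal components mainly remove collinearity among the covariates themselves.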

      Moreover, as Reviewer 3 pointed out, “absence of evidence” is not “evidence of absence”. Although we median-split patients by their scores on general symptoms (e.g., depression and anxiety-related questionnaires) and verified no significant differences in these severities (Figure S11), we additionally computed Bayesian statistics for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
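
      For reference, the Bayes factor described above can be written out explicitly. This is the standard definition rather than anything specific to the authors' analysis; the symbol D for the observed data is introduced here.

```latex
\[
\mathrm{BF}_{01} = \frac{p(D \mid M_0)}{p(D \mid M_1)} = \frac{1}{\mathrm{BF}_{10}},
\qquad
\frac{p(M_0 \mid D)}{p(M_1 \mid D)} = \mathrm{BF}_{01} \times \frac{p(M_0)}{p(M_1)}
\]
```

      With equal prior odds, BF<sub>01</sub> equals the posterior odds in favor of M<sub>0</sub>, which is why values above 1 are read as support for the absence of a group difference.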

      Beyond these specific findings, this work highlights the broader utility of combining computational modelling with mood measurements to better understand behavioral effects, showing how both mood and choice data can be used to better comprehend a psychiatric issue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size for the S<sup>-</sup> group (n=25) is modest, and that the prior studies we cited also acknowledged limited power. Our intention was to point out that our sample size is comparable to that of prior work. In the revision, we have therefore acknowledged the limited power of our study in the limitation section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) has a limited statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We also agree that the clinically relevant question is whether suicidality can be predicted from task-derived behavior and parameters. We therefore used risky behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, can predict suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
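
      Pooling all p-values into a single FDR correction can be sketched as below. The response does not name the specific procedure, so the Benjamini-Hochberg step-up method used here is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return (adjusted p-values, reject flags) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values are monotone non-decreasing in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [q <= alpha for q in adjusted]

# Hypothetical p-values pooled across all families of tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
adjusted, reject = benjamini_hochberg(pvals)
for p, q, r in zip(pvals, adjusted, reject):
    print(f"p = {p:.3f}  adjusted = {q:.4f}  significant = {r}")
```

      Because the adjustment depends on the total number of tests, correcting across more than 150 p-values is more conservative for any single test than correcting within a small family of primary hypotheses.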

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on the cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. A group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, therefore answers our research questions directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
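
      The indirect effect a×b and its confidence interval can be illustrated with a small sketch. The response does not state how the interval was obtained, so the percentile bootstrap here is an assumption, as are the synthetic data and the function names `coefs`, `indirect_effect`, and `bootstrap_ci`.

```python
import numpy as np

def coefs(y, X):
    """OLS coefficients (intercept first) of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def indirect_effect(x, m, y):
    a = coefs(m, x)[1]                         # path a: group -> mediator
    b = coefs(y, np.column_stack([m, x]))[1]   # path b: mediator -> outcome, given group
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, n)            # resample participants with replacement
        draws[k] = indirect_effect(x[idx], m[idx], y[idx])
    return np.percentile(draws, [2.5, 97.5])

# Synthetic data (hypothetical): group raises the approach parameter,
# which in turn raises the gambling rate.
rng = np.random.default_rng(1)
n = 300
x = rng.integers(0, 2, n).astype(float)                 # 1 = S+, 0 = merged controls
m = 1.0 * x + rng.normal(size=n)                        # mediator (approach parameter)
y = 0.6 * m + 0.1 * x + 0.5 * rng.normal(size=n)        # outcome (gambling rate)

ab = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(f"a*b = {ab:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

      An interval that excludes zero is the usual criterion for a significant indirect effect; with cross-sectional data, as here, it supports a statistical rather than a causal mediation claim.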

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 general participants (500 females; age: 20.90±2.41) (46). One item in BDI involves the measurement of STB. In item 9 of BDI, participants chose one option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we identified S<sup>+</sup> group as choosing Option 2, 3, or 4, while participants selecting Option 1 were categorized as S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis with the assumption of the mediating effect of approach motivation of suicidality on the risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replicated result may also reflect sample-specific or unstable effects, which needs to be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is indeed conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion, and convex curves associated with risk seeking (54,56). By contrast, the approach or avoidance bias applies regardless of the values at stake. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
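
      The distinction between risk attitude and a value-insensitive approach bias can be made concrete with a small sketch. The functional form below (power utility, softmax choice rule, and a bias that rescales the choice probability) is an assumption modelled on approach-avoidance models from prior work with this kind of task; the response does not spell out the authors' equations, and all parameter names and values are hypothetical.

```python
import math

def p_gamble_gain(v_certain, v_win, alpha, mu, beta_gain):
    """Probability of choosing a 50/50 gamble (win v_win or 0) over a sure gain.

    alpha     : utility curvature (risk attitude); alpha < 1 is risk averse
    mu        : inverse temperature of the softmax
    beta_gain : value-insensitive approach (> 0) or avoidance (< 0) bias
    """
    u_certain = v_certain ** alpha
    u_gamble = 0.5 * v_win ** alpha
    p_value = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if beta_gain >= 0:
        # Approach bias: probability mass is shifted towards gambling.
        return (1.0 - beta_gain) * p_value + beta_gain
    # Avoidance bias: the value-based probability is scaled down.
    return (1.0 + beta_gain) * p_value

# Two hypothetical agents with identical risk attitude but different approach bias.
neutral = p_gamble_gain(v_certain=1.0, v_win=2.0, alpha=1.0, mu=5.0, beta_gain=0.0)
approach = p_gamble_gain(v_certain=1.0, v_win=2.0, alpha=1.0, mu=5.0, beta_gain=0.2)
avoid = p_gamble_gain(v_certain=1.0, v_win=2.0, alpha=1.0, mu=5.0, beta_gain=-0.2)
print(neutral, approach, avoid)
```

      In this sketch the two options have equal utility, so the value-based choice probability is at chance; any difference in gambling between the agents comes from the bias term alone, not from utility curvature.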

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal (i.e., same data as the scatter plots in S3) are high, but that the off-diagonals are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
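
      The cross-correlation check requested by the reviewer can be sketched as follows. This is an illustration under stated assumptions: the "fitted" values are simulated as true values plus noise rather than obtained by refitting a model, and `recovery_matrix` is a helper name introduced here.

```python
import numpy as np

def recovery_matrix(true_params, fitted_params):
    """Pearson correlations: rows = generating parameters, columns = recovered ones."""
    k = true_params.shape[1]
    full = np.corrcoef(true_params, fitted_params, rowvar=False)  # (2k, 2k) matrix
    return full[:k, k:]                                           # true-vs-fitted block

# Synthetic stand-in for a recovery exercise with 5 choice parameters.
rng = np.random.default_rng(0)
n_sim, k = 200, 5
true_params = rng.normal(size=(n_sim, k))
fitted_params = true_params + 0.3 * rng.normal(size=(n_sim, k))   # imperfect recovery

R = recovery_matrix(true_params, fitted_params)
diag = np.diag(R)
off_diag = R[~np.eye(k, dtype=bool)]
print("min diagonal r:", round(diag.min(), 3))
print("max |off-diagonal r|:", round(np.abs(off_diag).max(), 3))
```

      High diagonal values show that each parameter is recovered; low off-diagonal values show that parameters do not trade off against one another during fitting.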

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group difference analysis and the correlation analysis with suicidal ideation separately. We first performed the group difference analysis to test our hypothesis of STB effects. We then conducted the correlational analysis to further specify our findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the lack of a block design in our experiment, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rate. Results showed a trend towards a lower gambling rate in higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, without consideration of mood in choice modeling. The mood bias parameters in neither cM2 nor cM3 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”

      (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. Our results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend in the online sample, it was significant in our clinical cohort when depression-related questionnaires were included as covariates (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend here, it was significant in our clinical cohort when depression-related questionnaires were included as covariates (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To provide evidence on specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.

      As pointed out, absence of evidence is not evidence of absence. Although we median-split patients by their scores on general symptoms (e.g., depression- and anxiety-related questionnaires) and found no significant differences between the high- and low-severity subgroups (Figure S11), we additionally computed Bayes factors for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is the Bayes factor comparing the null model (M<sub>0</sub>) with the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference; BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression did not drive our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
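To make the reading of BF<sub>01</sub> concrete, here is a minimal sketch of one common way to approximate it, using the BIC approximation BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> - BIC<sub>0</sub>)/2). This is an illustration only: the response does not state how the Bayes factors in Table S7 were computed, and the function names and data below are hypothetical.

```python
import math

def gaussian_bic(residuals, n_params):
    """BIC of a Gaussian model, up to a constant shared by models fit to the same data."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

def bf01_group_difference(group_a, group_b):
    """Approximate BF01: 'no group difference' (M0) versus 'different group means' (M1).

    Uses the BIC approximation BF01 ~= exp((BIC_1 - BIC_0) / 2); this is an
    assumption made for illustration, not the authors' stated method.
    """
    pooled = group_a + group_b
    grand_mean = sum(pooled) / len(pooled)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    # M0: one shared mean (plus variance); M1: one mean per group (plus variance).
    residuals_m0 = [x - grand_mean for x in pooled]
    residuals_m1 = [x - mean_a for x in group_a] + [x - mean_b for x in group_b]
    bic0 = gaussian_bic(residuals_m0, n_params=2)
    bic1 = gaussian_bic(residuals_m1, n_params=3)
    return math.exp((bic1 - bic0) / 2.0)

# Hypothetical gambling rates for a low- and a high-severity subgroup.
low_severity = [0.40, 0.50, 0.60, 0.50, 0.45, 0.55]
high_severity = [0.50, 0.45, 0.55, 0.60, 0.40, 0.50]
bf01 = bf01_group_difference(low_severity, high_severity)
print(f"BF01 = {bf01:.2f}")  # values above 1 favor the null model M0
```

Unlike p > 0.05, a BF<sub>01</sub> clearly above 1 quantifies positive support for the null model, which is the distinction the reviewer raised.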

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract the main components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. To further control for anxiety and depression, we ran a linear regression with these components as covariates, which showed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using the PCA components as covariates revealed that the group effect on the approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using the PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”
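The PCA step described in this response (extracting orthogonal components from highly correlated questionnaires and reporting the variance each explains) can be sketched as follows. This is a hedged, standard-library illustration on synthetic data, not the authors' pipeline: the four "questionnaires", the sample size of 83, and all function names are assumptions made for the example.

```python
import math
import random

def matmul(x, y):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def covariance(rows):
    """Sample covariance matrix; each row is one participant, each column one score."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def jacobi_eigenvalues(sym, sweeps=30):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(sym)
    a = [row[:] for row in sym]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                diff = a[q][q] - a[p][p]
                sign = 1.0 if diff >= 0 else -1.0
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][q], abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(n)] for i in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(rot_t, matmul(a, rot))
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_variance_ratio(rows):
    """Share of total variance captured by each principal component."""
    eigenvalues = jacobi_eigenvalues(covariance(rows))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Synthetic data: four questionnaire scores driven by one shared severity factor.
random.seed(0)
scores = []
for _ in range(83):
    severity = random.gauss(0, 1)
    scores.append([severity + 0.3 * random.gauss(0, 1) for _ in range(4)])

ratios = explained_variance_ratio(scores)
print([round(r, 3) for r in ratios])
```

When the questionnaires are this strongly correlated, the first component absorbs most of the variance, which is why entering the components (rather than the raw, collinear scores) as covariates gives a better-conditioned regression.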

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. We examined whether mood responses influence subsequent gambling choices and how to model this influence. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate in higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we then separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without a mood component in choice. This may be due to the absence of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Pages 3-4.

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias are both continuous.

      We also attempted to integrate mood into choice modelling. See Response 2 for Reviewer 3 for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured on the timescale of a laboratory session, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of this study, and it is indeed possible that the parameter estimates are somewhat noisy. We have therefore toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, and it is indeed possible that the parameter estimates are somewhat noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information gathered from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were assigned to the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were assigned to the control group (S<sup>-</sup>). Note that the medical records were based on clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and inquired about suicide-related thoughts and behaviors from both the patients and their families. Therefore, our grouping procedure is arguably comparable to an assessment with the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>gain</sub> is 0 in gain trials and V<sub>loss</sub> is 0 in loss trials.”

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given the task design of random feedback for gambling option. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1 + exp(-mu(U_gamble + beta_gain - U_certain))). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered the traditional bias parameter (cM4), rather than approach/avoidance parameters. We limited the bias to the range of [-100, 100], which was in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”
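The two bias formulations contrasted in point (9) can be sketched side by side. The additive form is the reviewer's formula. The bounded form is an assumption: it follows the description of the bias as a minimum or maximum gamble probability, in the style of the approach-avoidance models of Rutledge et al. (2015), since the manuscript's own equations are not reproduced in this response. All function and parameter names are hypothetical.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def p_gamble_bounded(u_gamble, u_certain, mu, beta):
    """Value-insensitive bias as a floor (beta >= 0) or ceiling (beta < 0) on p(gamble).

    Assumed form: beta lies in [-1, 1]; a positive beta guarantees gambling with
    probability at least beta, whatever the utility difference.
    """
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")
    p_value = logistic(mu * (u_gamble - u_certain))
    if beta >= 0:
        return (1.0 - beta) * p_value + beta
    return (1.0 + beta) * p_value

def p_gamble_additive(u_gamble, u_certain, mu, beta):
    """Reviewer's alternative: bias added to the gamble utility (reward-equivalent units)."""
    return logistic(mu * (u_gamble + beta - u_certain))

# With equal utilities, both forms shift choice toward gambling when beta > 0.
print(p_gamble_bounded(0.0, 0.0, mu=1.0, beta=0.4))
print(p_gamble_additive(0.0, 0.0, mu=1.0, beta=0.4))

# With a very unattractive gamble, only the bounded form keeps a floor on p(gamble).
print(p_gamble_bounded(-100.0, 0.0, mu=1.0, beta=0.4))
print(p_gamble_additive(-100.0, 0.0, mu=1.0, beta=0.4))
```

The practical difference is that an additive bias can always be overwhelmed by a large enough value difference, whereas a bounded bias cannot, which is what makes it value-insensitive.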

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Because likelihood ratio tests apply only to nested model pairs, they could not be used to compare all models in our model space, so we retained BIC.
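The difference between the two criteria discussed in point (12) can be illustrated for a single nested pair that differs by one parameter (as when mM3 is obtained from mM1 by setting beta_EV = beta_RPE). The sketch below is illustrative only: the log-likelihoods, parameter counts, and number of observations are hypothetical, not values from the study.

```python
import math

def likelihood_ratio_test_1df(loglik_restricted, loglik_full):
    """Likelihood ratio test for nested models differing by exactly one parameter.

    Returns the test statistic and its p-value under a chi-square distribution
    with 1 degree of freedom, for which the survival function is erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik

# Hypothetical fits: the full model improves the log-likelihood by 2.5 units.
loglik_restricted, loglik_full, n_obs = -520.0, -517.5, 200
stat, p_value = likelihood_ratio_test_1df(loglik_restricted, loglik_full)
bic_restricted = bic(loglik_restricted, n_params=3, n_obs=n_obs)
bic_full = bic(loglik_full, n_params=4, n_obs=n_obs)

print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
print(f"BIC restricted = {bic_restricted:.2f}, BIC full = {bic_full:.2f}")
```

In this hypothetical case the likelihood ratio test favors the extra parameter at the 0.05 level while BIC still prefers the restricted model, which is the conservativeness the reviewer describes. The test is only defined for nested pairs, so it cannot rank a model space that also contains non-nested models.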

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had already been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data proportional to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) provided evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence may offer a developmental window for early intervention in suicidality. It is therefore possible that the current winning model is specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model was specific to adolescents. Given that Rutledge et al. (2017) supported the “CR-EV-RPE model” in adults with depression, our study with an adolescent population may suggest a developmental change in mood sensitivities.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. We consider this approach appropriate because, in our analyses, the winning model is selected at the population level rather than for individual subjects.

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the winning choice model and the winning mood model (choice: rs > 0.91, ps < 0.001; intraclass correlation coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass correlation coefficients > 0.86). Moreover, we computed cross-correlations between all generating and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
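The population-level model recovery procedure described above (simulate a dataset under each model, fit all models, pick the winner by group-level BIC, and tabulate the proportions) can be sketched with two deliberately simple toy models. This is not the authors' code: the "fixed" and "free" Bernoulli choice models, the generating probability of 0.75, and the sample sizes are assumptions chosen only to keep the example short.

```python
import math
import random

def loglik_bernoulli(choices, p):
    """Log-likelihood of binary choices under gamble probability p."""
    k, n = sum(choices), len(choices)
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def group_bic(dataset, model):
    """Sum of per-subject BICs: 'fixed' has no free parameter, 'free' has one."""
    total = 0.0
    for choices in dataset:
        n = len(choices)
        if model == "fixed":
            loglik, n_params = loglik_bernoulli(choices, 0.5), 0
        else:
            loglik, n_params = loglik_bernoulli(choices, sum(choices) / n), 1
        total += n_params * math.log(n) - 2.0 * loglik
    return total

def simulate(model, n_subjects, n_trials, rng):
    """Generate one population-level dataset under the given toy model."""
    p = 0.5 if model == "fixed" else 0.75
    return [[1 if rng.random() < p else 0 for _ in range(n_trials)]
            for _ in range(n_subjects)]

def confusion_matrix(models, n_sims=100, n_subjects=20, n_trials=50, seed=0):
    """Entry [i][j]: share of simulations generated by model i and won by model j."""
    rng = random.Random(seed)
    matrix = []
    for generating in models:
        wins = {m: 0 for m in models}
        for _ in range(n_sims):
            data = simulate(generating, n_subjects, n_trials, rng)
            best = min(models, key=lambda m: group_bic(data, m))
            wins[best] += 1
        matrix.append([wins[m] / n_sims for m in models])
    return matrix

MODELS = ["fixed", "free"]
matrix = confusion_matrix(MODELS)
for name, row in zip(MODELS, matrix):
    print(name, row)
```

Because the winner is chosen once per simulated population rather than once per simulated subject, evidence is pooled across subjects, and near-perfect recovery is easier to reach than in a subject-level confusion matrix.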

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB is initially caused by low mood experience. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Some official organizations, e.g., National Institute of Mental Health, have also listed mood problems as warning signals(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are defined at the computational level: they refer to a decision bias in favor of the highest gain (approach) and a decision bias against the lowest loss (avoidance), above and beyond the value difference between options. We have cited these papers in the manuscript and have further clarified the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery expected value.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions regarding mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., leading to rapid changes like surprise then fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify that point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale in the laboratory setting, represents the accumulation of the impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g. from surprise to fear)(31-33,39).”
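
      The operational definition above (mood as a reflection of the recent history of safe and gamble outcomes) can be sketched as a recency-weighted sum of certain rewards (CR), gamble expected values (EV) and reward prediction errors (RPE), in the spirit of the cited momentary-mood literature. The weights, the decay parameter `gamma` and the function name are hypothetical; this is not the fitted model from the manuscript.

```python
def momentary_mood(cr, ev, rpe, w0=0.0, w_cr=1.0, w_ev=1.0, w_rpe=1.0, gamma=0.7):
    """Predicted mood after each trial.

    cr, ev, rpe: per-trial certain reward, gamble expected value and reward
    prediction error (use 0 on trials where a term does not apply).
    gamma: decay in [0, 1]; smaller values forget past trials faster.
    """
    moods = []
    s_cr = s_ev = s_rpe = 0.0
    for c, e, r in zip(cr, ev, rpe):
        # Running exponentially decayed sums: s_t = gamma * s_(t-1) + x_t.
        s_cr = gamma * s_cr + c
        s_ev = gamma * s_ev + e
        s_rpe = gamma * s_rpe + r
        moods.append(w0 + w_cr * s_cr + w_ev * s_ev + w_rpe * s_rpe)
    return moods
```

      Under this form, a lower `w_cr` corresponds to weaker mood sensitivity to certain rewards, which is the kind of group difference the mood analyses are designed to detect.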

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected version of inclusion criteria can be seen on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point below:

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To identify whether mood modeling was based on objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”
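
      For readers unfamiliar with exceedance probabilities, the following sketch illustrates what the quantity means at the family level. It assumes that a random-effects group analysis has already produced Dirichlet parameters over model frequencies, and it estimates by Monte Carlo sampling how often each family is the most frequent one. It is not a reimplementation of VBA_groupBMC (a MATLAB function); the function name, arguments and example numbers are hypothetical.

```python
import random


def family_exceedance(alpha, families, n_samples=20000, seed=0):
    """Monte Carlo exceedance probability for each model family.

    alpha: Dirichlet parameters of the posterior over model frequencies
           (assumed to be given; not estimated here).
    families: list of lists of model indices, one list per family.
    """
    rng = random.Random(seed)
    # Dirichlet aggregation property: a family's frequency follows a
    # Dirichlet whose parameter is the sum of its members' parameters.
    fam_alpha = [sum(alpha[i] for i in fam) for fam in families]
    wins = [0] * len(families)
    for _ in range(n_samples):
        # Independent gamma draws, once normalised, form a Dirichlet sample;
        # normalising does not change which family has the largest value.
        draws = [rng.gammavariate(a, 1.0) for a in fam_alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]
```

      With two families of 8 models each, a call such as `family_exceedance(alpha, [list(range(8)), list(range(8, 16))])` returns one probability per family; a value near 1 for the first family corresponds to the kind of result reported above.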

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting; we were dealing with a large turnover in the lab due to trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions made, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal system: In the revised manuscript, we included new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing lines in differentiated BE(2)-C and ReN VM–derived differentiated neurons (Figure 5A-D, Figure S4 A-C). To ensure this was neuronal expression, we developed a new Tet-On3G system construct where the Tet-On3G transactivating protein is driven by the SYN1 promoter to ensure neuron-specific inducible expression for these experiments.

      (2) Define the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We included new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study in which this work will be expanded (Xie et al., 2025). We selected this paradigm because TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 Loss-of-Function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is autoregulated and we highlight two potential therapeutic applications: i) TDP-43 Knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneous depletion of pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. This design mitigates the risk of supraphysiologic overexpression, a known liability in conventional replacement approaches, by restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control. ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss-of-function (LOF) by restoring downstream splicing. We are evaluating this work in a follow-up study (Xie et al., 2025). In these ongoing studies, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored subsets of cryptic exon events (24/28 events evaluated). These findings suggest CUTS as a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expanded on this concept in the discussion section of this revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-Reg, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified mechanism of TDP-43 5FL causing strong loss of function: The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). Expression of RNA-binding-deficient TDP-43 variants forms nuclear “anisomes” (Yu et al., 2021), which evidence suggests sequester endogenous TDP-43 protein into insoluble structures. We expanded on this in the Results section of this revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figures 2D and 2G were reincorporated as reference panels in Figure S2A–B, while new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system in detecting TDP-43 loss of function induced by stress conditions. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment detectable by CUTS and supported by endogenous cryptic exon analysis via RT-PCR (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the sides of each image for clarity. Sup Figure 2A was moved to Sup Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct, the transactivator protein is under the control of a neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV- and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we removed this promoter to generate a new version of the doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is driven by the hSYN1 promoter, so that CUTS is expressed in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuronal-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      Aggregation-independent correction: use of CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis, rather than the amount of siRNA administered or even just adding a plot alongside the current plots would enable readers to quickly evaluate LOF response levels concerning the protein. While I understand that the sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents TDP-43 levels (percentage of remaining protein or mRNA) and the y-axis shows the fold change in CUTS signal, with TDP-43 levels measured by (i) protein pixel intensity and (ii) mRNA levels, respectively. These new plots are now included as Supplementary Figures 2C–D, which allow a clearer visualization of the CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, the reason we did not originally present the data in this format was that at low siTDP-43 concentrations, the fold change is minimal and more difficult to quantify by Western blot. Nevertheless, we have now incorporated the revised plots to strengthen the interpretation of the dose–response relationship. Additionally, we have experienced batch effects across siRNA lots. We believe this revised format enhances the clarity of the result.

      (5) p3 line 74: one of the pitfalls cited for using endogenous cryptic exons is that they exhibit variable responses to TDP-43 loss and may be cell type-specific. Has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal cell models, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes referring to 1A using the CUTS cassettes which haven't been introduced yet as an example. I'd suggest reorganising this section. The graph, always in 1A showing readout proportional to GFP should be taken out or highlighted in the figure legend that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays: it's cost-effective, rapid and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models, which will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in imaging panels, we hypothesize that this variability likely reflects known differences in siRNA uptake and transfection efficiency rather than instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP<sup>-/-</sup> cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. Specifically, we have implemented FACS of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. For this, we incorporate TDP-43 knockdown followed by FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing and will provide a more quantitative analysis linking TDP-43 depletion to CUTS activation, addressing the reviewer’s concern regarding heterogeneity in bulk measurements. We plan to include this work in a future study.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the Y-axis, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2D and Figure 2E of the new manuscript.

      Furthermore, we also split Figure 2G into graphs showing the fold change of mRNA for TDP-43 and the CUTS cryptic exon, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2H and Figure 2I of the new manuscript. We have maintained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. In both figures, we are showing mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. In addition, Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original configuration.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Due to the lack of methods to directly induce endogenous TDP-43 aggregation and loss of function, the use of stressors has become a partial solution to address this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We were able to show a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescent imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry signals, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry signal with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as a percentage of GFP-positive cells. All analyses were performed using the Nikon NIS software. This information is included in the Methods of the revised manuscript.
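
      As an illustration only, the two calculations described above (fold change of mean GFP intensity, and percentage of GFP-positive cells among CUTS-expressing cells) can be sketched in a few lines of Python. This is not the Nikon NIS workflow used in the study; the per-cell boolean flags, the dictionary keys, and the function names are hypothetical and stand in for the outcome of the mask-overlap steps.

```python
from statistics import mean


def fold_change(treated_intensities, control_intensities):
    """Mean GFP intensity of the treated group relative to the control group."""
    return mean(treated_intensities) / mean(control_intensities)


def percent_gfp_positive(cells):
    """Percentage of GFP-positive cells among CUTS-expressing cells.

    Each cell is a dict of hypothetical mask-overlap flags:
    'hoechst', 'mcherry', and 'gfp' (True if the cell falls inside that mask).
    """
    # CUTS-expressing cells: mCherry signal overlapping the Hoechst mask.
    cuts_expressing = sum(1 for c in cells if c["mcherry"] and c["hoechst"])
    # GFP-positive cells: GFP signal overlapping the mCherry signal.
    gfp_positive = sum(1 for c in cells if c["gfp"] and c["mcherry"])
    if cuts_expressing == 0:
        raise ValueError("no CUTS-expressing cells in this field")
    return 100.0 * gfp_positive / cuts_expressing


# Made-up example values, for illustration only.
example_cells = [
    {"hoechst": True, "mcherry": True, "gfp": True},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": False, "gfp": False},  # not CUTS-expressing
]
print(percent_gfp_positive(example_cells))   # 25.0
print(fold_change([2.0, 4.0], [1.0, 1.0]))   # 3.0
```

      Raising an error when a field contains no CUTS-expressing cells is a design choice of this sketch; it avoids reporting a percentage for a field in which the denominator is undefined.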

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (ie cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity supports the concept that alternative CE elements, or other TDP-43-regulated RNAs, can be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the recently described TDP-REG system (Wilkins et al., 2024) used designed and AI-generated de novo CE sequences to express reporters or gene payloads, and screened multiple candidates to identify the appropriate RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This is consistent with independent work from other groups showing that alternative CE sequences can be engineered into effective sensors, depending on the paradigm and model system. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and the CUTS sensor sensitivity/specificity. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy, but are optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these induce complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to general whole-embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The imaging would benefit from super-resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show very subtle differences, and in some cases no apparent difference.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the author showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank reviewer 1 for the careful reading of our revised manuscript and the additional constructive criticisms. Since the two reviewers had divergent opinions of our revised manuscript, we think that it might be more productive to request a Version of Record at this point, and have our proposed model debated/tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, PK and Ror during CE. Each has been studied extensively in the prior literature using DMZ-injected embryos, and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in DMZ is due to disruption of CE. In the context of our current manuscript, we primarily performed co-injections in different combinations to differentiate synergistic vs. antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, I; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we did follow the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 did synergize to disrupt CE, and that their synergy could be blocked by Dvl2 co-overexpression; the new data are added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine the CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; it is indeed the direction of our future study, but is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any components can often cause identical CE defects at the tissue/ embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/PK (e.g. Fig. 1; Fig. 2h, I; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable in this case for us to build the first integrated model. We do plan to test our model using lower expression in the future. To acknowledge the limitation of our study, we also added the following sentences in the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but it is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments, and for the suggestion to clarify the description related to Suppl. Fig. 15. We revised the text according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. I have been a vegetarian for more than twenty years, which I once thought exempted me from the violence that accompanies the securing of

      Unfortunately, we are animals. We don't live off the sun's rays and water and simply kill out of competition for non-living resources, we eat other living things. Jains put great effort into not killing living things (don't eat root vegetables for example), but that severely impacts their lives.

      Being vegan I have a couple ways I think about the violence of my life. Mainly, I honestly don't think it has changed MY life much at all to be vegan, yet it has changed the lives of the many animals impacted by eating animal products regularly.

      * From an energy perspective, eating plants takes less lives simply because the animal I may eat had to eat something as well, and energy is lost as it goes through that cycle of eating. This is unchangeable right now.
      * The difficulties with being vegan aren't really because of the lifestyle itself, it's because of greater society. Society allows me to live a vegan lifestyle, in that I can easily get the nutrients I need from the grocery store's options (there is an abundance of food). Society also makes it difficult to be vegan because most available dishes and processed foods use animal products unnecessarily, it is simply the dominant way of living that perpetuates itself. I don't view that inconvenience as important to me, because it is simply a structural problem.
      * The Jain lifestyle at its most extreme kind of consumes one's life. Not being able to take a step without brushing potential bugs out of the way on the ground makes it difficult to merely exist. Perhaps it is the way of living that reduces suffering the most, but at what cost to you? Veganism doesn't require so much change in ways of living, just choices.

    Annotators

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Figure 1D: It would be useful to indicate the number of embryos analyzed for these experiments (n = ?).

      Number of embryos now included in figure legend

      Figure 3B: The control condition for gcl⁻/⁻; ras-RNAi is labeled as "EV". This terminology (presumably "empty vector") is not defined in either the text or the figure legend. In addition, the magenta channel for the Ras-G37 condition appears to be flipped horizontally.

      We replaced "EV" with “-” in the figure and figure legend

      Page 7: The text states that "Ras-C40 activates the PI3K pathway," whereas the figure depicts Ras-C40 as activating the RalA pathway. This discrepancy could be confusing for the reader and should be corrected.

      The diagram has been corrected

      Figures 4 and 5: To facilitate interpretation, it may be helpful to include a schematic of the PI3K complex indicating the different subunits used in the study, along with information (potentially color-coded) about whether each construct primarily acts as an activator or inhibitor of PI3K function.

      Figure 4E and Figure 5E were added

      Figure 4A and 4B: For clarity and consistency with the text, the panels (and corresponding plots) for dp110-WT and dp110-CAAX could be placed before those for dp110-D954A and dp110-ΔRBD.

      Order of constructs was rearranged

      Figure 5C: The term "p60-TCEp3," which appears to correspond to the germ plasm-targeted p60-WT construct, is not defined in either the figure legend or the main text.

      Clarification was added to the text (p.11, line 225)

      Page 12: The reference "(Fig. S1A, Movie 1)" should be corrected to "(Fig. S2A, Movie 1)."

      Corrected

      Page 13: There is a missing word in the sentence "the biosensor appeared to be enrich to...", which should be corrected to "enriched."

      Corrected

      Figure 7A: Although the data presented are interesting and ultimately support the authors' conclusion that Torso regulates PIP3 levels, the results are somewhat counter-intuitive and may be confusing for readers. The authors might consider moving this panel to the Supplementary Figures. In addition, it could be informative to include PIP3 measurements for gcl⁻/⁻ (and possibly gcl⁺/⁻) pole buds in Figure 7B, as PIP3 appears particularly enriched in these conditions compared to wild type.

      We agree that at first the findings in the early embryos were confusing, but we prefer including them in the main figure to demonstrate changes in PIP3 distributions in torso mutants. We now provide a possible explanation for these findings (p13 line 270-). The differences are quite clear in the older embryos and the measurements shown in 7B-D. Pole bud measurements for gcl-/- and gcl+/- are shown in Figure 6E-G.

      Reviewer #2

      Fig. legends to 1C and 1D are swapped.

      Corrected

      Why is csw not necessary for PGC formation? It acts upstream of Ras. This is not discussed.

      We now highlight this point in the text and refer to studies on the sevenless kinase, which suggested a similar position of Csw parallel to or downstream of Ras (page 6 line 107-).

      Fig 3C. Consider changing the order of the ras-variants used: S35, G37, C40 instead of S35, C40, G37.

      We changed the schematic in Figure 3C, which should make the order of Ras variants more intuitive.

      Fig 4A, B: Consider changing the order of the panels. Control, dp110-wt, dp110-CAAX, dp110-D954A, dp110-deltaRBD.

      Order of constructs was rearranged

      Fig S4 is mentioned in the text before S2 and S3. Consider changing the suppl. figure order.

      Order of supplementary figures was rearranged

      Page 12: Fig S1 A does not show PIP2 dynamics. Movie 1 is not available to this reviewer. The authors most likely refer to fig. S2.

      Movie 1 was uploaded and figure calls were corrected

      Page 13, 1st para: Why do the authors use gcl heterozygous embryos to look at PIP3 and PIP2? Particularly so when they report later in the MS that gcl+/- behave differently to wt controls in terms of PIP3 levels (Fig. 7C). By looking at gcl+/+, they might find that now PIP2 levels are different in gcl mutant embryos or that the differences between PIP3 levels in +/+ and -/- are larger than compared with +/-.

      Since gcl+/- embryos form the same number of PGCs as WT but show a statistically significant increase in PI3K activity when comparing membrane to cytoplasm staining intensity, we favor using gcl+/- embryos, as these embryos may represent a more sensitive test for PIP2 and PIP3 levels.

      Pages 15 and 16: revise figure calls in the text.

      Figure calls were revised

      M+M: How were gcl+/- and gcl-/- embryos identified?

      Since all genetic manipulations in this study alter the maternal contribution to the embryo, we use the term ‘mutant’ embryos to refer to the maternal genotype (indicated on page 3 line 33 and more clearly stated in material and methods and reagent table). Embryos derived from mothers of a specific maternal genotype are all identical; thus, we can easily distinguish between embryos derived from homozygous mutant mothers (gcl-/-) and heterozygous mutant mothers (gcl-/+). In the reagents table we include the precise genotype description. “CyO” refers to the balancer chromosome commonly used to identify heterozygotes on the second chromosome. Flies with the CyO balancer have curly wings.

      Reviewer #3

      Figure 1B: The authors describe that embryos with OptoSos still form buds which protruded from the cortex, but PGCs largely fail to cellularize (described in pg. 5). I'm not sure what they meant by "fail to cellularize" as this is not obvious to me when looking at the figure. The authors should describe how they know it's cellularized in the controls and not in the OptoSos or change the wording to "suggesting a failure to cellularize".

      We used the word ‘protruded’ to describe our live observations. PGCs were quantified in fixed embryos immunostained with anti-Vasa antibody to count Vasa-positive cells (Fig. 1C and D). We observe a lack of Vasa-positive PGCs only in the light-activated OptoSos condition.

      Fig. 1B, lines 4-5: at what stage are these embryos? Cycle 9? Cycle 14? Both?

      Nuclear cycles of embryos for each panel are noted on the left side of each panel

      Fig. 4A: add dp110-CAAX results to Results section

      dp110-CAAX results are included in the Results section (p.9. line 177)

      Figure 5C: The hyper-clustered phenotype they describe is hard to visualize in this figure (described in pg. 11). The authors should describe what is meant by "hyper-clustered".

      We agree and re-worded the description of this observation to be clearer, page 11, line 226-.

      Figure 7: When comparing Fig. 7A and 7B torsoHH/WK images, we can see that in Fig. 7A that PIP3 pattern changes such that PIP3 is now at the most posterior end where PGC will eventually form (compared to control that has low PIP3 in this region), but then in Fig. 7B they are looking at the buds and they say PIP3 levels decrease, which does not correspond to Fig. 7A. Are these simply different stages and PIP3 levels change over time (looking at Fig. 7C, PIP3 does not seem to change a lot over time)?

      The figure legend now states more clearly that embryos were of different ages. We also explain in the text the apparent discrepancy in the patterns before and during budding (page13 line 266). The time points in figure 7C span nuclear cycle 10, not earlier (page14 line 274). By measuring membrane to cytoplasmic distribution, a more accurate comparison is possible at this stage.

      p. 5, line 5: "Optosos" is written "OptoSos" elsewhere (suggest using OptoSos throughout)

      Corrected

      Is it possible that inhibition of myosin II recruitment is due to conversion of PIP2 -> PIP3, thus loss of PIP2, or is it that myosin is specifically recruited to regions where PIP2 is high? This seems like a point that should be added to the discussion.

      This point is now discussed on page 20, line 403

      p. 5, line 6: suggest adding a comma after "Ras" for clarity

      Corrected

      p. 5, last line: the genotype is "w^1118" (with ^ indicating a superscript), not "w^-1118", and is italicized (this should be corrected throughout)

      Corrected

      p. 6, line 2: replace "cellularizing" with "cellularization"

      Corrected

      p. 6, lines 11-13: Where is it shown that knockdown of csw, dsor1 and rolled did not restore PGC formation? The data are not present in Fig. 2C (could include in supp fig?)

      We added these data as Supplementary figure 1

      p. 7, line 1: replace "interfere" with "interferes"

      Corrected

      p. 7, last three lines: what is stated here, "Ras-G37 [activates] both the RalA and the PI3K pathways, and Ras-C40 activates the PI3K pathway" is not consistent with what is diagrammed in Fig. 3C, where Ras-C40 is indicated as activating RalA (please correct either the text or the diagram)

      We apologize and corrected the figure

      p. 11, lines 1-2: the Pi3K21B gene and transcript should be italicized (note that Pi3K21B is the official gene name on FlyBase)

      Gene name was italicized

      p. 11, lines 6-10: it might be helpful to explain how the p60 construct was overexpressed (current lines 9-10) before describing the results (current lines 7-8)

      Clarification on p60 construct was added to p.11, line 215-

      p. 12, paragraph 2, line 2: the PIP2 biosensor should be written as "PLCgamma[PH]:mCherry" throughout, not "PLCy[PH]:mCherry"; this should be changed in the figures as well as the text (Symbol font can be used to turn "g" into lower-case "gamma", both in Word and in Illustrator)

      Gamma symbol was added

      It would also be helpful to show the overlap of the PIP2 and PIP3 signals in control vs. gcl mutants at different stages so the relative distribution and intensity of the signals can be better appreciated (consider adding this as a supplementary figure).

      Our data show that PIP2 is not affected by lack of GCL (Fig 6 B-D). We thus do not think that simultaneous imaging of PIP2 and PIP3 in gcl-/- would add to our conclusions. Furthermore, these experiments would require a significant time investment to generate the respective genotypes. Thus, we agree with the reviewer that this experiment is beyond the scope of the paper.

      p. 12, paragraph 2, line 3: it does not appear that the two PIP markers were used "simultaneously" in Fig. 6A; however, this is evident from Fig. S2 and Movie 1 (consider placing callouts to these earlier in the paragraph or moving the description of simultaneous expression and observation of the two markers later in the paragraph to avoid confusion)

      We did simultaneously image PIP2 and PIP3 sensors and have added this as Movie 1 and also in supplementary Figure S4, which are now clearly referred to in the text.

      p. 12, paragraph 2, line 7: replace "Fig. S1A" with "Fig. S2" (this was confusing)

      Figure call was updated

      p. 16: change "Fig. 7G-I" to "Fig. 8G-I"

      Figure call was updated

      p. 20, Deming reference: there appears to be a stray asterisk in the title

      Asterisk was removed from reference

      Fig. 1D: need to explain that the colors in the graph indicate the numbers of PGCs formed (this could also be added as a label across the top of the graph); in addition, the number of embryos examined for each genotype should be included in the legend

      We added a label at the top of the graph, and ‘n’ values were added to the figure legend

      Fig. 2B: spell out where csw, dsor1 and rolled data are shown; also, "n" is not defined; was this the number of embryos per genotype?

      We added these data as Supplemental Figure 1

      Fig. 3B: "EV" should be defined in the legend; is this "empty vector"?

      We are using a “-” to mark controls without transgene

      Fig. 3C: see previous comment re: mistake in the diagram; I believe Ras-C40 was described as activating PI3K, not RalA

      We apologize and have corrected the figure

      Fig. 4B, line 2: was the graph plotted from the data in panel (C) or panel (A)? panel (A) seems more likely, because the data in C is plotted in D; please correct the panel callout

      Figure legend was updated to refer to the correct panel

      Fig. 5C: describe "p60-TCEp3" in the legend

      We added the germplasm-targeting 3’UTR (TCEp3) to the legend; the construct and reference are provided in the Materials and Methods section

      Figure 6: In Fig. 6E-G, the "brightness" of PIP3 at the membrane corresponds to the images even with different views (posterior and orthogonal) and agrees with the graph.

      However, when looking at Fig. 6B, it looks to me that PIP2 is brighter in gcl+/-, but the opposite is true when looking at Fig. 6D (i.e., PIP2 looks brighter in gcl-/-). The authors might want to comment on this.

      We have updated the figure to better reflect our observations.

      Fig. 6A: define "(fire)" here or in the first figure legend where this is used

      We added an inset for the fire lookup table to clearly define the pseudocolor scheme used in the image

      Figure 8 title: "Actin fluorescence is increased in gcl-/- pole buds", but their graph in Fig. 8B comparing actin in gcl+/- to -/- is not significant

      Thank you for catching our mistake; myosin, not actin, is changed

      Fig. 8I: replace "Scarlett" with "Scarlet"

      Corrected

      Fig. 8D-F: Although the plots in panel E agree with the images in panel D, it is unclear why those in panel F are not more concordant. In F, myosin appears enriched at the cortex relative to the cytoplasm in gcl-/- mutants, which is hard to reconcile with the data in D-E.

      We have updated the figure to better reflect our observations.

      Fig. S2A: define the three time points shown here, and clarify that these are shown left to right (if this is indeed the case)

      We removed S2A and updated the movie to replace it

      Fig. S4: change "P60" to "p60" in the figure title

      Corrected

      Movie: The movies showing PIP2 and PIP3 in whole embryos are nice, but it would also be helpful to include merged images of the two channels, so the reader can examine the relative accumulation of the two PIPs over time.

      Merged images panel was added to the movie.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Although Torso is known to antagonize primordial germ cell (PGC) formation, the underlying mechanisms remain unclear. Canonical Torso signalling typically results in activation of Ras. However, the authors show that Ras-mediated suppression of PGC formation is independent of the Raf/MEK/ERK pathway. Instead, they uncover an unexpected role for Torso in activating phosphoinositide 3-kinase (PI3K), which promotes formation of PIP3-enriched posterior membrane domains. The resulting increase in PI3K activity disrupts PGC formation. Furthermore, they show that by promoting Torso degradation, the ubiquitin ligase adaptor Germ Cell-Less (GCL) primes the posterior membrane with reduced PIP3 to facilitate PGC formation. Lastly, the authors suggest a model where the antagonistic relationship between GCL and Torso influences actomyosin contractility that may allow the bud to constrict for proper PGC formation.

      Major comments:

      Figure 1B: The authors describe that embryos with OptoSos still form buds which protruded from the cortex, but PGCs largely fail to cellularize (described in pg. 5). I'm not sure what they meant by "fail to cellularize" as this is not obvious to me when looking at the figure. The authors should describe how they know it's cellularized in the controls and not in the OptoSos or change the wording to "suggesting a failure to cellularize".

      Figure 5C: The hyper-clustered phenotype they describe is hard to visualize in this figure (described in pg. 11). The authors should describe what is meant by "hyper-clustered".

      Figure 6: In Fig. 6E-G, the "brightness" of PIP3 at the membrane corresponds to the images even with different views (posterior and orthogonal) and agrees with the graph. However, when looking at Fig. 6B, it looks to me that PIP2 is brighter in gcl+/-, but the opposite is true when looking at Fig. 6D (i.e., PIP2 looks brighter in gcl-/-). The authors might want to comment on this.

      It would also be helpful to show the overlap of the PIP2 and PIP3 signals in control vs. gcl mutants at different stages so the relative distribution and intensity of the signals can be better appreciated (consider adding this as a supplementary figure).

      Figure 7: When comparing the torsoHH/WK images in Fig. 7A and 7B, we can see in Fig. 7A that the PIP3 pattern changes such that PIP3 is now at the most posterior end where PGCs will eventually form (compared to the control, which has low PIP3 in this region), but in Fig. 7B they are looking at the buds and they say PIP3 levels decrease, which does not correspond to Fig. 7A. Are these simply different stages and PIP3 levels change over time (looking at Fig. 7C, PIP3 does not seem to change a lot over time)?

      Page 15, last paragraph: "If myosin II recruitment is inhibited when PIP3 levels are high" Is it possible that inhibition of myosin II recruitment is due to conversion of PIP2 -> PIP3, thus loss of PIP2, or is it that myosin is specifically recruited to regions where PIP2 is high? This seems like a point that should be added to the discussion.

      Overall, I think their claim that the antagonistic activities of GCL and Torso are crucial for PGC formation is well justified. The combination of optogenetic tools with activation and lof mutants is nicely done. Some clarification regarding the PIP3 and PIP2 levels will be helpful to the reader (see my comments above). The myosin claim is less convincing (see my comment on Fig. 8D-F below).

      Minor comments on the text:

      p. 5, line 5: "Optosos" is written "OptoSos" elsewhere (suggest using OptoSos throughout)

      p. 5, line 6: suggest adding a comma after "Ras" for clarity

      p. 5, last line: the genotype is "w^1118" (with ^ indicating a superscript), not "w^-1118", and is italicized (this should be corrected throughout)

      p. 6, line 2: replace "cellularizing" with "cellularization"

      p. 6, lines 11-13: Where is it shown that knockdown of csw, dsor1 and rolled did not restore PGC formation? The data are not present in Fig. 2C (could include in supp fig?)

      p. 7, line 1: replace "interfere" with "interferes"

      p. 7, last three lines: what is stated here, "Ras-G37 [activates] both the RalA and the PI3K pathways, and Ras-C40 activates the PI3K pathway" is not consistent with what is diagrammed in Fig. 3C, where Ras-C40 is indicated as activating RalA (please correct either the text or the diagram)

      p. 11, lines 1-2: the Pi3K21B gene and transcript should be italicized (note that Pi3K21B is the official gene name on FlyBase)

      p. 11, lines 6-10: it might be helpful to explain how the p60 construct was overexpressed (current lines 9-10) before describing the results (current lines 7-8)

      p. 12, paragraph 2, line 2: the PIP2 biosensor should be written as "PLCgamma[PH]:mCherry" throughout, not "PLCy[PH]:mCherry"; this should be changed in the figures as well as the text (Symbol font can be used to turn "g" into lower-case "gamma", both in Word and in Illustrator)

      p. 12, paragraph 2, line 3: it does not appear that the two PIP markers were used "simultaneously" in Fig. 6A; however, this is evident from Fig. S2 and Movie 1 (consider placing callouts to these earlier in the paragraph or moving the description of simultaneous expression and observation of the two markers later in the paragraph to avoid confusion)

      p. 12, paragraph 2, line 7: replace "Fig. S1A" with "Fig. S2" (this was confusing)

      p. 16: change "Fig. 7G-I" to "Fig. 8G-I"

      p. 20, Deming reference: there appears to be a stray asterisk in the title

      Minor comments on the figures and figure legends:

      Fig. 1B, lines 4-5: at what stage are these embryos? Cycle 9? Cycle 14? Both?

      Fig. 1C: see previous comment about "w^1118" genotype nomenclature

      Fig. 1D: need to explain that the colors in the graph indicate the numbers of PGCs formed (this could also be added as a label across the top of the graph); in addition, the number of embryos examined for each genotype should be included in the legend

      Fig. 2B: spell out where csw, dsor1 and rolled data are shown; also, "n" is not defined; was this the number of embryos per genotype?

      Fig. 3B: "EV" should be defined in the legend; is this "empty vector"?

      Fig. 3C: see previous comment re: mistake in the diagram; I believe Ras-C40 was described as activating PI3K, not RalA

      Fig. 3E: fix "w^1118" as described above

      Fig. 4A: add dp110-CAAX results to Results section

      Fig. 4B, line 2: was the graph plotted from the data in panel (C) or panel (A)? panel (A) seems more likely, because the data in C is plotted in D; please correct the panel callout

      Fig. 5C: describe "p60-TCEp3" in the legend

      Fig. 6A: define "(fire)" here or in the first figure legend where this is used

      Figure 8 title: "Actin fluorescence is increased in gcl-/- pole buds", but their graph in Fig. 8B comparing actin in gcl+/- to -/- is not significant

      Fig. 8D-F: Although the plots in panel E agree with the images in panel D, it is unclear why those in panel F are not more concordant. In F, myosin appears enriched at the cortex relative to the cytoplasm in gcl-/- mutants, which is hard to reconcile with the data in D-E.

      Fig. 8I: replace "Scarlett" with "Scarlet"

      Fig. S2A: define the three time points shown here, and clarify that these are shown left to right (if this is indeed the case)

      Fig. S4: change "P60" to "p60" in the figure title

      Movie: The movies showing PIP2 and PIP3 in whole embryos are nice, but it would also be helpful to include merged images of the two channels, so the reader can examine the relative accumulation of the two PIPs over time.

      Referees cross-commenting

      I agree enthusiastically with the comments of the other reviewers, who often came to the same conclusion I did about the manuscript and the data, including some of the detailed points about the figures, etc.

      Significance

      General assessment:

      The many strengths of this manuscript include elegant genetic and optogenetic approaches using well-designed transgenes.

      The main weakness is the lack of experiments showing simultaneous live imaging of the PIP2 and PIP3 sensors in gcl-/- and other genetic backgrounds, which would help the reader better envision how regulators of this pathway affect phospholipid distribution at the level of whole embryos and prospective pole cells. Note that because of the time required, I do not insist that they do this.

      Advance:

      Study demonstrates for the first time an unexpected role of Torso in PI3K regulation

      Audience:

      germ cell aficionados, developmental biologists, cell biologists, PI3K researchers

      My field of expertise:

      Drosophila, germ cell development, genetics, cell biology, live imaging, phosphoinositides

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of the approach, and we now discuss it in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a marker that allows localization of hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse. As suggested, we have removed the figure that showed the mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to localize this more explicitly; it is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations with learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Therefore, as we did not find any positive correlations, the original interpretation that DG activity primarily reflects task engagement rather than performance level seems the most parsimonious.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, across low to high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (See Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of statistical rigor in the statistical analyses used. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pair-wise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text and interpretations, particularly related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. Thus, we have now revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer that the apparent outliers reflect the inherent sparsity of TRAP labeling in this population. In absolute terms, this corresponds to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, such that the presence or absence of a small number of labeled cells can appear as outliers when expressed as a percentage. We have revised the text to acknowledge this (see Results).

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We also consider the cumulative probability representation, which is included in the figures, to be the most appropriate way to summarize learning progression across animals in this context.

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
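
      The difference between the criterion and the plotted daily averages can be illustrated with a minimal sketch. The function names and session scores below are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
def meets_criterion(days, threshold=70.0):
    """days: list of (session1, session2) percent-correct pairs, one per day.

    Returns True if, on two consecutive days, at least one of the two
    daily sessions reached the threshold.
    """
    passed = [max(s1, s2) >= threshold for s1, s2 in days]
    return any(a and b for a, b in zip(passed, passed[1:]))


def daily_averages(days):
    """Average the two sessions of each day (the value that is plotted)."""
    return [(s1 + s2) / 2 for s1, s2 in days]


# Hypothetical mouse: one session per day reaches 70%, the other does not.
days = [(72.0, 60.0), (70.0, 64.0)]
print(meets_criterion(days))   # True
print(daily_averages(days))    # [66.0, 67.0]
```

      In this example the mouse meets the criterion even though both daily averages fall below 70%.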

      (3) mGC needs to be explicitly defined. Am I assuming any non-birthdated GC is an mGC according to the authors? (which means it is unknown whether they are in fact mature, though likely most of them are).

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for trap data in Figure 1, blade x task TRAP expt in Figure 3 (should be 2-way RM anova here and elsewhere), etc) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.
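
      As a rough illustration of the post hoc step described above, the sketch below computes Dunn's pairwise comparisons on pooled ranks with a Bonferroni adjustment. It is a simplified stand-in, not the analysis code used for the manuscript: it assumes no tied values (so no tie correction is applied), and the group names and numbers are hypothetical.

```python
import math
from itertools import combinations


def rank_pooled(groups):
    """Rank all observations jointly (1 = smallest). Assumes no tied values."""
    pooled = sorted(v for g in groups.values() for v in g)
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}
    return {name: [rank_of[v] for v in g] for name, g in groups.items()}


def dunn_bonferroni(groups):
    """Bonferroni-adjusted two-sided p-values for all pairwise Dunn comparisons."""
    ranks = rank_pooled(groups)
    n_total = sum(len(r) for r in ranks.values())
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        n_a, n_b = len(ranks[a]), len(ranks[b])
        mean_diff = sum(ranks[a]) / n_a - sum(ranks[b]) / n_b
        # Standard error of the mean-rank difference (no tie correction).
        se = math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_a + 1 / n_b))
        z = mean_diff / se
        p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni adjustment
    return adjusted


# Hypothetical cell densities for three task stages (illustrative only).
groups = {
    "shaping": [1.0, 2.0, 3.0, 4.0],
    "S": [5.0, 6.0, 7.0, 8.0],
    "LS": [9.0, 10.0, 11.0, 12.0],
}
for pair, p_adj in dunn_bonferroni(groups).items():
    print(pair, round(p_adj, 4))
```

      With real data containing ties, a tie-corrected variance or an established statistics package would be the safer choice.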

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shap < S < LS) could be due to how long they've been trained on a given stage (e.g. less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected)

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-HM4Di experiment, HM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e. no expression in the deep cell layer, near the sgz). This is problematic because it suggests perhaps abGCs are not inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, as it has been demonstrated that mCitrine is expressed in a Cre-independent manner and is not correlated with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining using an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the =<7w experiment (n=11/group), where the effect just reached statistical significance.

      We are sorry for this confusion which came from an incorrect version. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach across the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidate genes which have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in the supplementary figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, bloodfed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were, incorrectly, setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show very variable baseline expression values.
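
      The revised normalization can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes ΔCt = Ct(target) minus Ct(reference gene) and 100% amplification efficiency (base 2), and the function name and ΔCt values are hypothetical.

```python
from statistics import mean


def relative_expression(delta_ct, control_delta_ct):
    """Fold change of each sample relative to the mean control dCt (2^-ddCt)."""
    reference = mean(control_delta_ct)
    return [2 ** -(d - reference) for d in delta_ct]


# Hypothetical dCt values for three biological replicates per group.
control_dct = [5.0, 5.5, 4.5]
knockdown_dct = [6.0, 6.5, 7.0]

# Controls are normalized to their own group mean, so they scatter around 1
# instead of being fixed at exactly 1 in every replicate.
control_fold = relative_expression(control_dct, control_dct)
knockdown_fold = relative_expression(knockdown_dct, control_dct)
print(control_fold)
print(knockdown_fold)
```

      Because every sample, control or experimental, is compared against the same group-level baseline, the control group retains its replicate-to-replicate spread and can enter statistical tests with nonzero variance.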

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher retinotopic areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study of Molz et al. but I believe, given anatomical variability, the larger n in this study, and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work.

      *Effects of eye-movements

      The authors have carried out the eye-movement analyses I asked of them. Unfortunately, in 4 individuals they couldn't calibrate the eyetracker (it's impressive they managed in 10). I think this means that 4 of 13 (since a different participant was excluded from head motion) individuals weren't included in correlation analyses. Limiting the correlation analysis to individuals with better fixation has obvious issues. I'd recommend redoing (or additionally including) stats using non-parametric measures while classifying these 4 as having fixation instability of 3 (i.e. greater instability than the participant with the worst fixation who was successfully calibrated).
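
      The suggested analysis can be sketched as a rank correlation in which uncalibrated participants are assigned a value worse than any measured one. The code below is only an illustration of that idea: the function names, the instability scores, and the second variable are hypothetical, and any fill value above the observed maximum gives the same ranks.

```python
import math


def fill_worst(values):
    """Replace missing scores (None) with a value above the worst observed one."""
    worst = max(v for v in values if v is not None)
    return [worst + 1.0 if v is None else v for v in values]


def average_ranks(values):
    """Rank values (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: None marks participants whose eye tracker could not be calibrated.
instability = [0.5, 1.2, None, 0.8, None, 2.1]
measure = [0.10, 0.30, 0.55, 0.20, 0.60, 0.40]

filled = fill_worst(instability)
print(average_ranks(filled))
print(spearman(filled, measure))
```

      The uncalibrated participants share the top rank, so they contribute to the correlation without any claim about their exact fixation instability.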

      *Interpreting pRFs

      The paper would be strengthened by a little more explicit clarity about what pRFs represent and how that affects their interpretation of their findings as plasticity vs. non-plasticity (I know the authors are aware of this, but I think it would be helpful for readers who are less experienced in pRFs). In the introduction it would be helpful to point out that pRFs represent the collective response of a large population of neurons, and as a result pRF estimates can vary depending on which population of neurons that stimulus drives.

      For example, imagine for the sake of argument that rods only project to V1 neurons with larger receptive fields. If one measured pRFs in a control observer under photopic vs. scotopic conditions one would see smaller pRFs in the photopic conditions. This wouldn't represent 'plasticity' - it would represent the fact that the firing neurons contributing to the pRF signal are a slightly different population because of a change in the stimulus content. This is of course exactly what you see in 2C. And indeed, the authors make this identical point: "In the non-selective condition, the smaller pRFs in controls are in line with the higher spatial resolution of the cone system, which is not active in the achromat group." But this point would be clearer if more of the conceptual underpinnings were made explicit in the introduction (or at this point in the paper).

      Shifts in which population of neurons drive your pRFs can explain many of the more puzzling results in the paper without detracting from your main conclusions. For example, in 2D, I don't think it's differences in S/N driving your results (pRFs are at least theoretically meant to be robust to S/N). If smaller RFs 'drop out' under low luminance and these smaller RFs also tend to be more central, then one would expect the control results of 1D. And I think a similar argument might even be made for the smaller difference in the rod monochromats.

      It would be possible to make the point of Figure 4B more simply if Figure 4B was replaced by additional Panels in Figure 2 simply showing V3 pRF sizes/eccentricity distributions. That would make the point that you don't see the same expansion in pRF sizes in V3 in a way that is just as clear, and is closer to the data.

      *Interpreting cRFs

      Similarly, I think the paper would be improved with more clarity about the underlying signal in CF modeling. Once again, I appreciate that the authors are familiar with this, but it will help the reader in interpretation. (And I do believe thinking carefully about this may alter your interpretations). CF receptive fields 'find' the region in V1 that best predict the V3 signal in a given voxel. In resting state this likely represents a combination of:

      (1) visually driven signal - correlations that may or may not reflect connectivity but represent the fact that regions that represent the same region of visual space will be active at the same time.

      (2) global bilaterally symmetrical signal consisting of enhanced correlations between iso-eccentric regions (Raemaekers et al., 2014), which may arise from vasculature that symmetrically stems from the posterior cerebral artery (Tong et al., 2013; Tong and Frederick, 2014).

      (3) intrinsic neural fluctuations that are more strongly correlated between connected neurons. These are likely quite weak compared to the other contributions.

      I think if you ignore 2, (which is not likely to differ between rod mono and controls) and model 1 and 3, you might well see shifts in CFs towards the boundary of the scotoma - essentially the CF's location will be biased towards the region of V1 that has stronger correlations - which = the region which has a visual signal.

      I do find convincing the argument that you don't see the same shift in controls in the rod-selective condition. So I think the results of 4A are fine. But a little more clarity about 'what's under the hood' in CF modeling would be nice.

      *Interpreting the relationship between pRFs and cRFs

      So there's something here that confuses me. We are all agreed that V3 pRF sizes are similar across RM and control. V1 pRFs are larger in RM. It feels intuitive that smaller CFs would compensate but I can't make it make sense to myself when I think it through. Each pRF represents a combination of receptive field location scatter and bandwidth. You want to argue that eccentricity mapping looks pretty normal, so there's no reason to think increased RF scatter, and I can believe that (though I do think this assumption should be discussed explicitly).

      So far I think we agree.

      But let's think about what drives a CF during visual stimulation ... Specifically, let's think about 'the pRF of the CF' (the region of visual space represented by the cluster of voxels in the CF). If pRFs for individual voxels in V1 are big, then the pRF for the CF is also going to be large. But we know that pRFs for V3 are normal size. So, the V3 CF will 'find' a smaller number of voxels in V1, in order to try to find the 'correct sized' CF pRF. Note that this explanation is very similar to yours. But it doesn't require ANY 'intrinsic' connectivity. It's really just assuming the whole thing is driven by the visual signal and the CF size is determined by the ratio of the pRF sizes in V3 vs. V1.

      One possible solution would be to regress out the visual stimulus and redo this analysis based on the residuals.
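
      The residual analysis suggested above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `regress_out` and the choice of a per-vertex intercept-plus-slope fit against a stimulus-driven prediction (for example, the fitted pRF model time course) are assumptions made for the example.

```python
import numpy as np

def regress_out(timeseries, stimulus_prediction):
    """Remove the stimulus-predicted component from each vertex time series.

    Both inputs have shape (n_vertices, n_time). For every vertex an
    intercept-plus-slope model is fitted by least squares and the residual
    time series is returned. (Hypothetical helper, for illustration only.)
    """
    ts = np.asarray(timeseries, dtype=float)
    pred = np.asarray(stimulus_prediction, dtype=float)
    ts_c = ts - ts.mean(axis=1, keepdims=True)
    pred_c = pred - pred.mean(axis=1, keepdims=True)
    power = (pred_c ** 2).sum(axis=1, keepdims=True)
    cross = (pred_c * ts_c).sum(axis=1, keepdims=True)
    # Vertices with a flat prediction get slope 0, so only their mean is removed.
    slope = np.divide(cross, power, out=np.zeros_like(cross), where=power > 0)
    return ts_c - slope * pred_c
```

      Connective fields fitted to these residuals would then reflect only the co-fluctuations that the stimulus model does not explain.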

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al. but I believe, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye-movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye-movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: (1) as expected, gaze was less stable in achromats than in controls; (2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; (3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space. 

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data were filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
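
      The post-hoc calibration and the instability measure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the affine map is fitted by ordinary least squares on the five median fixation locations, and the "standard deviation across 1-second windows" is read here as the standard deviation within each window, averaged over windows.

```python
import numpy as np

def fit_affine(recorded, targets):
    """Least-squares 2-D affine map from recorded gaze to screen coordinates.

    recorded, targets: arrays of shape (n_points, 2), with n_points >= 3.
    Returns a (3, 2) matrix A such that [x, y, 1] @ A approximates the target.
    """
    recorded = np.asarray(recorded, dtype=float)
    targets = np.asarray(targets, dtype=float)
    design = np.column_stack([recorded, np.ones(len(recorded))])
    A, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return A

def apply_affine(A, gaze):
    """Map raw gaze samples of shape (n_samples, 2) into screen space."""
    gaze = np.asarray(gaze, dtype=float)
    return np.column_stack([gaze, np.ones(len(gaze))]) @ A

def fixation_instability(x, sampling_rate_hz, window_s=1.0):
    """Mean per-window standard deviation of horizontal gaze position.

    Assumption: SD is taken within each non-overlapping window and then
    averaged; NaNs (e.g. removed blinks) are ignored.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(sampling_rate_hz * window_s))
    n_windows = len(x) // n
    windows = x[: n_windows * n].reshape(n_windows, n)
    return float(np.nanmean(np.nanstd(windows, axis=1)))
```

      With five calibration targets the affine fit is over-determined (ten equations, six parameters), so a single noisy fixation location has limited influence on the mapping.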

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective spearman-r<sub>(8)</sub>=0.07,p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
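
      The fractional degrees of freedom reported above (e.g. 9.08) are what an unequal-variance (Welch) t-test produces. A minimal sketch of that computation, assuming the standard Welch-Satterthwaite approximation and using a hypothetical helper name:

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of the two group means.
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return float(t), float(df)
```

      When one group is much more variable than the other, the degrees of freedom fall towards that group's n - 1, which is consistent with a value near 9 when 10 achromats with highly variable gaze are compared to controls.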

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective spearman-r<sub>(8)</sub>=0.37,p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. (What happens if one plots mean fixation deviation vs. coverage, setting the individuals who could not be calibrated to the highest value of calibrated fixation deviation? Does a relationship then emerge?)

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings; the two are in fact consistent with one another. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations, as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls, where cones in the rod-free zone are stimulated, which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), where only rods are stimulated, we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak or absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model – but we remain cautious about interpreting it as real. This is because first, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the approach of scotoma modelling.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage - the percentage of activated tissue regardless of fitted parameters. However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      “It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod scotoma modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls, where the cones in the rod-free zone are being stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated, so the masked stimulus still matches the retinal activation and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage is equivalent across groups in the rod-selective condition (t(47) = 0.78, p=0.43, BF10=0.31), and controls show higher coverage than achromats in the non-selective condition (Mann U(52)=141, p<0.01; unequal variance, reverted to non-parametric).

      This consistency in pRF properties when modelling the rod scotoma is in line with previous results from scotoma modelling: while Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024) which also found no change in pRF estimates with scotoma modelling.”
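
      The masking step described in this supplement text can be sketched as below. This is an illustration under assumptions that the text does not confirm: the 1.25°x0.9° dimensions are treated as the full width and height of the ellipse, the mask is centred on fixation, the field of view is square, and the function name and array layout are hypothetical.

```python
import numpy as np

def mask_rod_scotoma(aperture, extent_deg, width_deg=1.25, height_deg=0.9):
    """Zero a central ellipse in binary stimulus-aperture frames.

    aperture:   array of shape (n_frames, n_y, n_x), 1 where the stimulus is shown.
    extent_deg: half-width of the (assumed square) field of view in degrees.
    width_deg, height_deg: assumed full width and height of the oval mask.
    """
    aperture = np.asarray(aperture, dtype=float)
    n_y, n_x = aperture.shape[-2:]
    y = np.linspace(-extent_deg, extent_deg, n_y)
    x = np.linspace(-extent_deg, extent_deg, n_x)
    xx, yy = np.meshgrid(x, y)  # both of shape (n_y, n_x)
    inside = (xx / (width_deg / 2)) ** 2 + (yy / (height_deg / 2)) ** 2 <= 1.0
    # Pixels inside the ellipse are treated as unstimulated in every frame.
    return np.where(inside, 0.0, aperture)
```

      Refitting the pRF model with the masked aperture changes the prediction only for voxels whose candidate pRFs overlap the central ellipse, which is why little change is expected when the unmasked fit already received almost no foveal signal.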

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1-2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper; see Figure 2b. This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition, but in fact we see quite a large drop in that area. Therefore, at the moment we do not think there is a strong basis to assume that our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or visual stimuli responses.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually-driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visual-driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1. See results below. In achromats there are no systematic shifts between the stimulation conditions, as expected since both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although the change was not significantly different from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R<sup>2</sup> filtering, error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume they meant that, as in pRF mapping, the slope of activity from deprived to non-deprived cortex will artefactually create a CF model fit with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at the edge of the scotoma. We then used the slope as a covariate in an ANCOVA when comparing the CF sizes across groups in each sampled V1 segment. Accounting for the beta slope of V1 did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective conditions between 4°-6° eccentricity – these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).
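
      The covariate analysis described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, the per-participant slope is taken from a straight-line fit of beta against eccentricity, and the ANCOVA is written as an ordinary least-squares model with one group term and one covariate (assuming the covariate slope is the same in both groups).

```python
import numpy as np

def beta_slope(eccentricity_deg, beta, lo=0.0, hi=3.0):
    """Slope of a straight-line fit of beta against eccentricity within [lo, hi]."""
    ecc = np.asarray(eccentricity_deg, dtype=float)
    beta = np.asarray(beta, dtype=float)
    keep = (ecc >= lo) & (ecc <= hi)
    slope, _intercept = np.polyfit(ecc[keep], beta[keep], deg=1)
    return float(slope)

def adjusted_group_effect(cf_size, is_patient, covariate):
    """Group difference in CF size after adjusting for one covariate.

    Fits cf_size ~ 1 + group + covariate by least squares and returns the
    group coefficient, its t statistic, and the residual degrees of freedom.
    """
    y = np.asarray(cf_size, dtype=float)
    X = np.column_stack([
        np.ones(len(y)),
        np.asarray(is_patient, dtype=float),
        np.asarray(covariate, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = coef[1] / np.sqrt(cov[1, 1])
    return float(coef[1]), float(t), dof
```

      If the group coefficient stays clearly different from zero once the beta slope is in the model, the CF size difference cannot be attributed to that slope alone.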

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were fit with a grid fit. Possible values were [0.5,1,2,3,4,5,7,10]. Therefore, the minimum size is 0.5. Filtering out the smallest connective field sizes does not change the results:
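
      A grid fit over these candidate sizes can be sketched as below. This is a simplified illustration, not the authors' pipeline: it assumes a circular Gaussian connective field on the source surface, a precomputed matrix of cortical distances in the same units as the size grid, candidate centres at every source vertex, and selection by squared correlation. The function name is hypothetical.

```python
import numpy as np

CF_SIZES = (0.5, 1, 2, 3, 4, 5, 7, 10)  # candidate sizes quoted in the response

def fit_connective_field(target_ts, source_ts, distances, sizes=CF_SIZES):
    """Grid-search a Gaussian connective field for one target (e.g. V3) vertex.

    target_ts: (n_time,) time series of the target vertex.
    source_ts: (n_source, n_time) time series of the source (e.g. V1) vertices.
    distances: (n_source, n_source) cortical distances between source vertices.
    Returns (best_centre_index, best_size, best_r2).
    """
    target = np.asarray(target_ts, dtype=float)
    source = np.asarray(source_ts, dtype=float)
    distances = np.asarray(distances, dtype=float)
    target = target - target.mean()
    best = (-1, None, -np.inf)
    for size in sizes:
        # Row i holds the Gaussian weights of a CF centred on source vertex i.
        weights = np.exp(-distances ** 2 / (2.0 * size ** 2))
        predictions = weights @ source
        predictions = predictions - predictions.mean(axis=1, keepdims=True)
        num = predictions @ target
        den = np.sqrt((predictions ** 2).sum(axis=1) * (target @ target))
        r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        # Negatively correlated candidates are not counted as fits.
        r2 = np.where(r > 0, r ** 2, 0.0)
        i = int(np.argmax(r2))
        if r2[i] > best[2]:
            best = (i, float(size), float(r2[i]))
    return best
```

      Because the size is chosen from a fixed grid, the smallest candidate (0.5) acts as the floor on estimated CF size, which is the minimum referred to above.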

      Author response image 3.

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive, and have now added further clarification. The two models (pRF and CF) fundamentally differ in what they measure and how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and visual stimuli - essentially how much of a visual stimulus drives a voxel's response - while V3 CF sizes reflect how V3 samples from the V1 cortical surface, indicating how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: A V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that happens to represent a large area of visual space if those V1 voxels have large pRFs. The aim of Figure 4B is to clarify that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CF sizes. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data supports.
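
      The way the two measures constrain each other can be made concrete with a simple pooling approximation. This sketch is not taken from the manuscript: it assumes Gaussian V1 pRFs, a Gaussian CF, and a locally constant cortical magnification factor, in which case the variances add. All numbers below are hypothetical and chosen only to illustrate the direction of the effect.

```python
import math

def pooled_prf_size(v1_prf_deg, cf_size_mm, magnification_mm_per_deg):
    """Approximate pRF size (deg) of a voxel pooling V1 through a Gaussian CF.

    The CF size is converted from cortical distance to visual angle with the
    magnification factor, then combined with the V1 pRF size in quadrature.
    """
    cf_deg = cf_size_mm / magnification_mm_per_deg
    return math.sqrt(v1_prf_deg ** 2 + cf_deg ** 2)

# Hypothetical values: larger V1 pRFs but a smaller CF in the patient group.
control_v3 = pooled_prf_size(v1_prf_deg=1.0, cf_size_mm=4.0, magnification_mm_per_deg=2.0)
achromat_v3 = pooled_prf_size(v1_prf_deg=1.5, cf_size_mm=3.0, magnification_mm_per_deg=2.0)

v1_ratio = 1.5 / 1.0                    # patient / control pRF size in V1
v3_ratio = achromat_v3 / control_v3     # patient / control pRF size in V3
```

      Under these assumptions a smaller CF can bring the V3 ratio closer to 1 than the V1 ratio, which is the pattern the response describes for Figure 4B. The same arithmetic also fits the reviewer's purely stimulus-driven reading, so it shows consistency between the measures rather than a mechanism.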

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A we split the V3 vertices based on the V1 area they sample from. The 0°-1° segment would therefore be considered as mainly sampling from the “scotoma” region, and the higher the eccentricity, the less “scotoma” it includes. The V3 vertices that have a significantly smaller CF size compared to controls are those sampling from mostly, if not entirely, stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes sampling from a V1 segment clearly inside and clearly outside the scotoma. The top figure shows the CF size distribution of V3 vertices that sample from a V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom figure shows the distribution of V3 vertices that sample from the V1 4°-5° segment, which falls outside the scotoma and shows a significant difference in CF size across the groups. Here in achromats you can see a drop in larger V3 CF sizes sampling from the V1 region, and an increase in smaller ones (note that this further addresses a previous concern that connective field differences across groups are solely driven by very small CFs).

      Author response image 4.

      Following the reviewer’s comment we have added the following statement in the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although, as we explain in response to an earlier point, this does not fully exclude the reviewer's alternative explanation, which we now add to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report an increased cortical thickness in central (eccentricities 0-2 deg) in V1 and the expansion of this effect to V2 (trend) and V3 in a cohort with an average age of adolescents.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We have responded to this in the “Recommendations for the authors” section of this reviewer, as they included a more detailed description of these points there.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive as it clarifies the retinal source of the disease but have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, in the first mention of the group as achromats in the introduction we have now added this term:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this distinction terminology clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency we have now referred to these as V1 segment and sampled V1 segment in the figures when describing the atlas-based definition, and eccentricity for the measured pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus?

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word new]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm2 to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected gaze was less stable in achromats than controls, 2) achromats with more stable gaze did not show more activation in the scotoma projections zone, which we might have observed if fixation instability masks signals in this region 3) Gaze instability was not correlated with pRF size and eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
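The calibration and stability steps quoted above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names (`fit_affine`, `apply_affine`, `fixation_instability`), the synthetic gaze values, and the sample rate are all hypothetical. It also assumes one reading of the instability measure, namely that the standard deviation is computed within each non-overlapping 1-second window and then averaged across windows; the quoted text does not spell this out. Blink and velocity filtering are omitted.

```python
from statistics import mean, stdev


def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(recorded, targets):
    """Least-squares affine map taking recorded gaze points to known screen targets."""
    rows = [(x, y, 1.0) for x, y in recorded]
    # Normal equations (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * t[k] for r, t in zip(rows, targets)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params  # [[a, b, c], [d, e, f]] with x' = a*x + b*y + c, y' = d*x + e*y + f


def apply_affine(params, point):
    """Remap one recorded gaze point into screen space."""
    x, y = point
    return tuple(p[0] * x + p[1] * y + p[2] for p in params)


def fixation_instability(x_positions, sample_rate_hz, window_s=1.0):
    """Mean of per-window standard deviations of horizontal gaze position (assumed reading)."""
    n = int(round(sample_rate_hz * window_s))
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return mean(stdev(w) for w in windows)


# Hypothetical 5-point calibration: fixation plus four targets at 5 deg.
targets = [(0.0, 0.0), (0.0, 5.0), (0.0, -5.0), (-5.0, 0.0), (5.0, 0.0)]
# Synthetic "recorded" medians, distorted by a known scale and offset.
recorded = [(2.0 * x + 1.0, 0.5 * y - 3.0) for x, y in targets]
params = fit_affine(recorded, targets)
remapped = apply_affine(params, recorded[1])

# Synthetic horizontal gaze trace sampled at a made-up 4 Hz.
instability = fixation_instability([0.0, 1.0, 0.0, 1.0] * 3, sample_rate_hz=4)
```

With five calibration targets the affine fit is overdetermined (ten equations, six parameters), which is why a least-squares solution is used here rather than an exact one.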

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
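The non-integer degrees of freedom reported for the group comparison above are what an unequal-variance (Welch) t-test produces. As a minimal sketch of that correction, assuming the standard Welch-Satterthwaite formula and using made-up sample values rather than the study's data (the function name `welch_t` is hypothetical):

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical instability values for two groups (not the study's data).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

When the two groups have equal size and equal variance, as in this toy input, the corrected degrees of freedom equal the pooled value; the more the variances differ, the further the value drops below it.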

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats did not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it was replicated from resting-state data, that would be a good way to show consistency which is independent of measures requiring fixation. 

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and for controls also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Also, notably, CF estimates have been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. On 2020-05-01 09:48:59, user Kasper Kepp wrote:

      This paper on the state-of-the-art Danish blood-donor data finds an IFR = 0.08% for people between 0-69 years of age. The study is very important because the sampling bias from case fatality ratios (the iceberg effect of knowing almost all deaths but only the most symptomatic cases, i.e. missing the dark number) is largely removed.

      By interpolation, the Danish population now has approximately 1.6% infection, corresponding to 100,000 people out of 6 million. The dark number stands at 12-fold the known cases (7-18).

      Some minor sampling biases remain (people who are blood donors need to be healthy and may be socioeconomically skewed) but considering the wide blood donor representativeness in Denmark, I think all Danish researchers will agree that sampling bias must be small.

      The IFR is also fully in line with the most representative data we have from Iceland (14% of population tested, 48000 tests), where the sampling bias is essentially eliminated, which stands at approximately 0.56% (10 deaths / 1799 cases as of May 1) and includes all the high-risk individuals >70 years. https://www.worldometers.in...

      Compared to the Santa Clara study, which carried potential major sampling bias, this issue seems to be now largely removed. Consensus in Denmark is now emerging that the overall whole-population crude mortality of covid-19 is of the order of 0.25-0.6%, in excellent agreement with the Iceland data.

      These two countries have not had their health care systems strained, which also makes them the relevant data for pinpointing the "real" mortality of covid-19 absent overmortality by capacity exhaustion as seen in some other countries.

      Obviously, the fact that the IFR is 0.08% for the 0-69 year old has enormous implications for political decision making in Scandinavia, as it evidences that most of the population can build immunity at much reduced mortality than previously assumed.

    1. On 2020-04-22 16:02:39, user Texas Longhorns wrote:

      The research paper does not indicate how many of those that participated had already been tested for Covid and what those test results were.

      If they over sampled people that had already tested positive and recovered of course you will get a higher rate of positive antibodies. That would not be indicative of the general population.

      There is also the problem of false positives because the test can trigger for the common cold that is also a coronavirus.

      I don't think this research passes muster as any reliable indication of antibodies in the general population and should absolutely not be used as a basis to reopen businesses and large public gatherings.

      Having antibodies to one strain of the virus may not give you any immunity to the more than 8 strains of Covid we know are out there.

      Even if the test results are accurate at 2% that is nothing and you need at least 60% solid immunity to consider any large population to have herd immunity protection.

    1. On 2020-04-10 13:51:26, user steve rubin wrote:

      Does anyone know peak hospitalization during the 2017-2018 flu season? From the cdc summary there were 808,000 total hospitalizations with 61,000 deaths and I remember the flu season was pretty long. I wonder how the curves for new cases, deaths, hospitalizations and icu use looked. I remember stories that the hospitals were crowded but I don't remember stories about people dying because there weren't beds or ventilators.

      I know that it's unpopular to compare coronavirus to the flu. Underestimating a threat is dangerous and could (and maybe did) lead to delay in ramping up testing and beds and ventilators and other necessary medical resources.

      When people were predicting 2,000,000 deaths in the US and then 200,000 deaths I could understand the fears. But now they're predicting 60,000 deaths and it may end up half of that, so I think it's reasonable to make the comparisons.

      Comparing situations to past situations is usually our best way to understand how to react. In terms of how contagious and how lethal this epidemic/pandemic is, it now seems that it and the flu are similarly contagious and that covid is much less lethal. The big difference is that we have a pretty big number of people with significant immunity to the flu while it's likely there was little immunity in our population to covid-19. If we end up with 200,000,000 people becoming infected but with only 60,000 deaths, then covid was a fifth as lethal as the flu for 2017-2018 with 3 in 10,000 infections dying vs 14 in 10,000 infections dying from the flu.

      However the comparison to the flu can lead to some counter-arguments. For example, the cdc uses a multiplier of ~80 for estimated current flu infections vs confirmed flu infections. Applying that to covid-19 means that we have ~500,000 confirmed cases so we would have had 40,000,000 total infections leaving another 160,000,000 to go assuming 60% of the population for herd immunity. Projecting deaths would mean 17,000 + 68,000 for a total of 85,000. We'll soon know what that multiplier is for covid-19 because there are a number of antibody surveys going on in the US and internationally. You can bet that the same thing will happen for the flu next year and we'll have a more accurate estimate of infections and lethality for the flu rather than the current guesstimates.

      A big question is how social distancing will have affected the final number of infections and deaths. It seems so logical that social distancing will curb infections and deaths, but many suggest that it may end up only prolonging the length of the pandemic while not making a significant difference in total final infections and total final deaths. The antibody tests may give us the answer to that as well.

    1. On 2020-05-20 17:49:28, user Christopher Leffler wrote:

      Bottom line, how many people does Dr. Ioannidis think will die in the US from this epidemic? If one reads the paper, he proposes that "even under congested circumstances, like cruise ships, aircraft carriers or homeless shelter, the proportion of people infected does not get to exceed 20-45%."
      Also, he believes that the infection fatality ratio is: "Infection fatality rates ranged from 0.03% to 0.50% and corrected values ranged from 0.02% to 0.40%."
      So, these numbers would give estimates for the United States of:
      Low end: 331,000,000 people * 0.2 * 0.0002 = 13,240.
      High end: 331,000,000 people * 0.45 * 0.004 = 595,800.
      The range is so wide as to provide no useful information. And of course, the pandemic is already at 92,387 deaths in the US, as of May 20, 2020. So we know Ioannidis's low end is simply wrong.
      We have looked at the mortality in different age groups in New York, among residents and transit workers, and on the Diamond Princess:
      https://www.medrxiv.org/con...
      Quite early in the pandemic (early April), we showed that if the US followed the course that Italy and Spain had already experienced, we would see 100,000 dead in the US:
      https://www.researchgate.ne...
      More recently, we showed that if the mortality rates seen in New York MTA / New York State / Diamond Princess were observed nationally, the mortality could be over 600,000, which is the high end for Ioannidis's work also:
      https://www.researchgate.ne...
      So, the bottom line is that the high-end projections from all groups could be quite high indeed. So we will need to be vigilant: wearing masks, protecting the vulnerable, etc. The pandemic is real. To say that it is similar to a typical flu is just plain false. Even Ioannidis's own projections do not rule out that this is far worse than the flu. When is the last year the flu killed 92,000 Americans and was on track to kill potentially hundreds of thousands more?
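The low-end and high-end figures in the comment above follow from one product: population times fraction infected times infection fatality rate. A minimal sketch of that arithmetic, using only the numbers quoted in the comment (the function name `projected_deaths` is just an illustrative label):

```python
US_POPULATION = 331_000_000  # population figure used in the comment


def projected_deaths(population, infected_fraction, ifr):
    """Deaths implied by a given attack rate and infection fatality rate."""
    return population * infected_fraction * ifr


# Low end: 20% infected, corrected IFR of 0.02%.
low_end = round(projected_deaths(US_POPULATION, 0.20, 0.0002))
# High end: 45% infected, corrected IFR of 0.40%.
high_end = round(projected_deaths(US_POPULATION, 0.45, 0.004))

print(f"{low_end:,} to {high_end:,}")  # prints "13,240 to 595,800"
```

The two bounds differ by a factor of 45 because the 2.25-fold spread in the infected fraction multiplies the 20-fold spread in the IFR.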

    1. On 2020-10-07 12:44:20, user Iratxe Puebla wrote:

      Review completed as part of ASAPbio’s #PreprintReviewChallenge

      The study examines the incidence of heart disease deaths in the early pandemic period in the US (30 March to April 26) in areas without large COVID-19 outbreaks. The authors sought to study whether a decline in acute myocardial infarction (AMI) admissions was linked to either a higher mortality rate (which would suggest avoidance of care seeking), or lower mortality (which may suggest fewer triggers for AMI). The authors use data from the CDC’s National Center for Health Statistics and apply inclusion criteria requiring >97% completeness for the data.

      The study includes data from a reliable source and includes controls involving a comparison to incidence of heart disease deaths in the same period in 2019 and 4 weeks earlier in 2020. While the study is observational and can only point to trends and not explain the reported decrease in incidence of heart disease death in several states during the study period, it helps surface this trend and opens lines for further research to evaluate whether the trend will sustain over a longer period and if so, look into the potential factors behind the trend. If the trend were to sustain over time and was found not to be associated with misclassification of death cause, it may provide avenues to identify factors that can reduce triggers for AMI.

      Minor comments
      - The authors indicate ‘The primary analysis captured 747,375,188 person-weeks for the early pandemic period and 101,620,248 person-weeks for the 2019 control period’ the number of person weeks for the control period is considerably lower, can the authors provide some context for this, and whether this may have any influence on the analysis?
      - The abstract indicates ‘The mean incidence rate (per 100,000 person-weeks) for heart disease in states without excess deaths during the early pandemic period was 3.95 (95% CI 3.83 to 4.06) versus 4.19 (95% CI 4.14 to 4.23) during the corresponding period in 2019’, the Results section reads ‘The mean incidence rate (per 100,000 person-weeks) for heart disease in states without excess deaths during the early pandemic period was 3.95 (95% CI 3.83 to 4.06) versus 4.35 (95% CI 4.23 to 4.48)’ it appears they need to be updated to match?

      Questions for the authors
      - Now that we have data from four additional months into the pandemic, are the authors planning an extension to the analysis?
      - For the states where an increase in the incidence of heart disease deaths was observed, the authors mention the possibility of harm due to avoidance of care, misclassification during a period of excess deaths and COVID-19 itself increasing cardiovascular deaths. Do the authors think that capacity at hospitals may have been a factor behind any increase in heart disease deaths? E.g. related to prioritization of COVID-19 admissions vs others.

    1. On 2020-10-26 17:59:08, user Meng-Ju Wu wrote:

      Hi! It is interesting to read the paper in discussion for EVs to differentiate ALS from healthy and diseased groups. And I want to share my thought on the study.

      I think the main contribution of the study is the purification of EVs with nickel-based isolation, which, compared to the conventional methods, makes the analysis of specific EV parameters highly sensitive and reliable. If EVs reliably differentiate ALS patients from healthy and diseased groups, clinical assessment with a blood test will significantly shorten the diagnosis time for ALS so that treatment may be started as early as possible. In addition, if biomarkers are available to detect ALS patients, it means that we can develop treatments specific to ALS using their unique properties. Patients can avoid the costly and lengthy process of ALS diagnosis.

      I have two questions concerning the methods. First, why was the supernatant from human plasma diluted in filtered PBS once but the serum from mice required 10 times for dilution? Second, what were the temperature and humidity conditions for the incubation of activated charged agarose beads in NBI? I think the time window for using the obtained serum would be the limitation of this approach. The content of the EVs might change if the centrifuged plasma samples are not used immediately. Such compositional change may depend on the storage conditions and the degradation rate of each specific protein. It may also vary among species. Therefore, a specific time period for analyzing the plasma should be strictly regulated.

      In general, I think there are no major grammatical or spelling errors. However, the content could be modified to make it more logical and convincing to read. In the introduction, I think it is important to summarize how ALS is diagnosed clinically. If readers are informed that electrophysiologic diagnosis takes more time and effort, they would appreciate the value of a blood test to detect suspected ALS patients in the prodromal state. In the last paragraph of the introduction, it is not reasonable to mention that the study results suggest EVs are good biomarkers. That should be mentioned in the discussion or conclusion section. In the materials section, the time of patient inclusion is missing. For the animal model, the paper should mention why only female mice with SOD1G93A and male mice with TDP-43Q331K were studied. Also, the timing of the study of the two different genes, as well as the number of mice, raise concerns for interpreting the results. I suggest making a visual diagram of the machine learning technique. You did a great job in comparing ultracentrifugation and NBI using EV-like liposomes. As such, I suggest applying the same comparison to the animal model to test the reliability of using the NBI method alone in the paper. The results and the discussion are well-written and consistent with the tables and figures provided.

    1. On 2021-06-23 21:55:50, user David Wiseman PhD wrote:

      Summary:<br /> Regarding the continued and unnecessary confusion related to the Argoaic and Artuli comments.<br /> 1. These are in reality distractions from the central issue that the original NEJM paper remains uncorrected in NEJM as to shipping times. Although a secondary issue, also uncorrected is the "days" nomenclature that is the reason for confusion in the Argoaic and Artuli comments on this forum. Also uncorrected in the original paper are the exposure risk definitions, which we were informed were also incorrect. Together, these issues controvert the conclusions of the original study.<br /> 2. The incorrect nomenclature for "days" in the NEJM paper as well as in a follow up work (Clin Infect Dis, Nicol et al.) inflates the number of "elapsed time" days. This has not been corrected by the original authors. We on the other hand have corrected this by providing the correct information in our preprint.<br /> 3. Dr. Argoaic seems to have been given a wrong and earlier version (10/26) of the data which, although it contains a variable that is supposed to correct the above problem, does not. In fact one cannot come to any conclusion that there is a discrepancy based on this incorrect 10/26 version, unless you have some preconceived notion.<br /> 4. Other post hoc analyses reported in follow up works (including social media) by the original authors looking at time from last exposure, or using a pooled placebo group, although flawed for several reasons, when examined closely, nonetheless support our conclusions that early PEP prophylaxis with HCQ is associated with a reduction of C19.

      Detail:<br /> Any confusion about "days" would disappear once the original authors correct the NEJM June 2020 paper as well as a follow up letter in Dec 2020 Clin Infect Dis (see upper red graph in Nicol et al., pubmed.ncbi.nlm.nih.gov/33274360/). These errors inflate the "DAYS" by 1 day because the nomenclature for describing "days" was incorrect. As far as we know those corrections have not been made in the journals where these errors appear and in a way that can be retrieved in pubmed etc.

      As far as we can tell, anyone who has cited the NEJM paper (NIH guidelines, NEJM editorial, many meta-analyses etc., our protocol in preprint version) also misunderstood the "days" to mean the inflated figure. So the authors need to correct this. As far as we know we are the only ones to do this. After we were informed of this error by the PI (who was unaware of the problem himself) we described this problem very clearly in our preprint, distinguishing between elapsed time and the day on which a study event occurred. For the benefit of those who remain confused, we will endeavor to make it even clearer in a future version. You can read our correspondence log referenced in the preprint to verify that the incorrect "days" nomenclature was unknown to the PI, at least until 10/27 when he informed us about it.

      You are confusing "DAY ON which an event occurred" with "DAYS FROM when an event occurred." For example the original NEJM Table 1 says "1 day, 2 days etc." for "Time from exposure to enrollment". This falsely inflates the number of elapsed time days by 1, and as the authors informed us (documented in our preprint), this really means DAY ON which enrollment occurred, with Day 1 = day of exposure, so you need to subtract 1 from the days to get elapsed time FROM exposure. The same error is repeated in Nicol et al. (note: we discuss other unrelated issues relating to time estimates in our preprint).

      To confuse matters further, the problem is not even corrected in the dataset linked (datestamp 10/26/20) in the Argoaic comment. In column FS there is a variable "exposure_days_to_drugstart." This appears to indicate elapsed time (ie DAYS FROM) when it actually means the "DAY ON" nomenclature. We were only informed of the nomenclature error on 10/27/20 and later provided with a new version of the dataset on 10/30 where an additional variable "Exposure_to_DrugStart" (column GR) was provided that corrects this error by subtracting 1 from all the values.
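      The relationship between the two nomenclatures can be illustrated with a minimal sketch. The function name `elapsed_days_from_exposure` is hypothetical and is not part of the study dataset; the sketch only assumes, as described above, that Day 1 is the day of exposure.

```python
def elapsed_days_from_exposure(day_on: int) -> int:
    """Convert a numbered study day ("DAY ON") to elapsed time ("DAYS FROM").

    Assumption: Day 1 is the day of exposure, so elapsed time is one less
    than the numbered day.
    """
    if day_on < 1:
        raise ValueError("numbered days start at 1 (the day of exposure)")
    return day_on - 1


# Numbered days 2-4 correspond to 1-3 days of elapsed time.
numbered_days = [2, 3, 4]
elapsed = [elapsed_days_from_exposure(d) for d in numbered_days]
print(elapsed)
```

      Under this assumption, a value reported as "2 days" in the "DAY ON" convention represents one day of elapsed time.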

      Why the Argoaic comment does not link to the correct 10/30 version is unclear, but in this incorrect 10/26 version, the values for the new variable "Exposure_to_DrugStart" (column GR) are IDENTICAL to those in the "exposure_days_to_drugstart" (column FS) variable (they should be smaller by 1). Accordingly, unless Drs. Argoaic and Artuli had a preconceived notion (without checking the data) that some alteration had occurred, it is impossible to draw such a conclusion (albeit one that is incorrect for other reasons) from this incorrect 10/26 dataset. A number of colleagues have downloaded the 10/26 dataset from the link provided in the Argoaic comment, and have verified this problem.

      So in addition to the original data set released in August 2020, as well as the three revisions (9/9, 10/6 and 10/30) we describe in our preprint there is this incorrect 10/26 version. I don't know how many people this affects but it would be appropriate for them to be notified that the version they have may be an incorrect one. An announcement on the dataset signup page covidpep.umn.edu/data would also be in order (nothing there today).

      Regarding the possibly higher placebo rate of C19 on numbered day 4 (18.9%). This is matched by a commensurate change in its respective treatment arm, yielding RR=0.624 similar to that for numbered days 2 (0.578) and 3 (0.624), justifying pooling. We don't know if the 18.9% represents normal variation or has biological meaning.

      Although they used enrollment time data (completely irrelevant to considering whether or not early prophylaxis is beneficial), the original authors (Nicol et al.) in a post hoc analysis, used a pooled placebo cohort to compare daily event rates (red bar graph). This would mitigate possible effects of an outlying value in the placebo cohort. We applied this same pooled placebo method to the data that correctly takes into account shipping times. This method is still limited because it may obscure a poorly understood relationship between time and development of Covid-19. Although at best this would be considered a sensitivity analysis, we did it to answer the Artuli question. This approach yields the same trends as our primary analysis. Using 1-3 days elapsed time of intervention lag (numbered days 2-4) for Early prophylaxis, there is a 33% reduction trend in Covid-19 associated with HCQ (RR 0.67 p=0.12). Taking only 1-2 days elapsed time intervention lag, we obtain a 43% reduction trend (RR 0.57 p=0.09). This analysis appears to reveal a strong regression line (p=0.033) of Covid-19 reduction and intervention lag.
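      As a reading aid, the percentage reductions quoted above follow from the relative risks by simple arithmetic (reduction = 1 - RR). The sketch below only restates that conversion; the helper name `percent_reduction` is hypothetical, and no patient-level data are reproduced.

```python
def percent_reduction(relative_risk: float) -> float:
    """Relative risk reduction, in percent, implied by a relative risk."""
    return (1.0 - relative_risk) * 100.0


# Relative risks quoted in the paragraph above.
for rr in (0.67, 0.57):
    print(f"RR {rr:.2f} -> {percent_reduction(rr):.0f}% reduction")
```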

      We also looked at the post hoc analysis provided by the original authors (Nicol et al.) that used “Days from Last Exposure to Study Drug Start,” a variable not previously described in the publication, protocol or dataset, so we have no way of verifying it from the raw data. As in a similar PEP study (Barnabas et al. Ann Int Med) this variable has limited (or no) value, as we are trying to treat as quickly as possible from highest risk exposure, not an event (ie Last Exposure) that occurs at an undefined time later. (Even the use of highest risk exposure has some limitation, which the authors pointed out to us and which we discuss in our preprint.) Further, the Nicol analysis used a modified ITT cohort, rather than the originally reported ITT cohort. With these limitations, pooling data for days 1-3 (it is unclear which "days" nomenclature is used) and comparing with the pooled placebo cohort yields a trend reduction in C19 associated with HCQ after last exposure, from 15.2% to 11.2% (RR 0.74, p=0.179).

      Taken together, these "sensitivity" analyses inspired by the original authors' methodology suggest that the association we report is not an artifact of subgroup analysis. It could be said that any conclusions made by the sort of analyses conducted by Nicol are equally prone to the "subgroup artifact" problem. (Also note that in our paper, the demographics for placebo and treatment arms in the early cohort match well.)

      Mention has been made elsewhere of two other PEP studies (Mitja, Barnabas) which concluded no effect of HCQ. It is important to note that the doses used in these studies were much lower than those used in the Boulware et al. NEJM study. Further, according to the PK modelling of the Boulware group (Al-Kofahi et al.) these doses would not have been expected to be efficacious (the Barnabas study used no substantial loading dose). So citing the Mitja and Barnabas studies to support claims of HCQ inefficacy in the Boulware et al. paper is unjustified. On the contrary, taken together, the three studies suggest a dose-response effect. We discuss this in detail in our preprint.

      Lastly, it is important to note that since the original NEJM study was terminated early, the entire original analysis can be thought of as a subgroup analysis, with all of the attendant problems referenced by the original authors (and us). There is certainly a great deal of underpowering and propensity to Type 2 errors, among the issues inherent in a pragmatic study design. The study was not powered as an equivalence study and so no definitive statement can be made that HCQ is not efficacious. Along with the still uncorrected (in the original journal) issues of shipping times, "days" nomenclature and exposure risk definitions, there are certainly many efficacy signals that oppugn the original study conclusions, and controvert the statement made in a UMN press release (covidpep.umn.edu/updates) that the study provided a "conclusive" answer as to the efficacy of HCQ.

      _________________<br /> Please note that despite our offer to Dr. Argoaic to contact us directly to walk through the data to try to identify any issues, we have not been contacted. That offer is still extended to anyone who remains confused. We have also attempted to locate both Drs. Argoaic and Artuli to try to clear up their confusion, but these names do not exist in the mainstream literature (i.e. pubmed, medrxiv), nor do they appear to have any kind of internet footprint.

      With regard to Table 1 of our preprint, the reason why there are no patients for “Day 1” is that there were no patients who received drug the same day as their high-risk exposure. This is consistent with the PI’s comment on 8/25/20 (p10 of email log), made at a time when he thought that there was a “Day zero”: “Exposure time was a calculated variable based date of screening survey vs. data of high risk exposure. Same day would be zero. (Based on test turnaround time, I don’t think anyone was zero days).”

      We notice an obvious typo in the heading for the second column of our Table 1, which says “To”. But it should say “nPos”, to match the 5th column (and other tables). It is patently absurd that there should be a category of “1 to 0” days or “7 to 5” days etc. “From” makes no sense either and these typos have absolutely no effect on the analysis, interpretation or conclusions. This will be corrected in a later version.

    1. On 2025-11-30 17:00:32, user Cyril Burke wrote:

      RESPONSE TO REVIEWER #2<br /> June 27, 2022<br /> Reviewer #2: Thank-you for the opportunity to review this work which highlights the importance of monitoring serum creatinine over time and how this can be a useful tool in detecting possible CKD. This is an important topic as the use of sCr on its own is certainly under-utilized and changes are often missed because they don’t fall into a predefined category.<br /> Thank you for considering our manuscript and for your detailed comments.

      MAJOR CONCERNS

      A. “Choi- rates of ESRD in Black and White Veterans” doesn’t fit with the rest of the paper including the title; the introduction and conclusion also don’t adequately address this portion of the paper. It feels disjointed from the main point of discussion which is the use of sCr in screening “pre-CKD”. This section and discussion should be removed and possibly considered for another type of publication.<br /> We have attempted to clarify this inclusion. This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions.

      Black Americans suffer three times the kidney failure of White Americans. Other minority groups also have excessive rates of kidney disease. However, analysis suggests that Veterans Administration interventions can bring that ratio close to one; similar interventions might also reduce the risk to parity for Hispanic, Asian, Native American, and other patients. Within-individual referencing should allow better monitoring of all patients and help to reveal the circumstances and novel kidney toxins that lead to progressive kidney decline. The ability to identify a healthy elderly cohort with essentially normal kidneys would help to calibrate expectations for all. Better modeling of GFR should help everyone, too.

      Over eight decades, anthropologists have had little scholarly success in diminishing the inappropriate use of ‘race’. Keeping these parts together may be no more successful, but we feel compelled to try.

      B. Cases 1 - 3, (lines 93 – 122): where are these cases from? There is no mention of ethics to publish these patient results, which appears to be a clear ethics violation. If so, these cases should be removed and patient consent and ethical approval obtained to publish them.<br /> The authors describe the reasons for not obtaining an ethics waiver for this secondary data analysis. Despite this, the relative ease of obtaining an ethics waiver for secondary data analysis usually means that this is done regardless.<br /> We take patient privacy seriously and have completely de-identified the Case data, as required by Privacy Act regulations. We understand that no authorization or waiver was necessary. We discussed the issues with an IRB representative, reviewed the relevant regulations, and confirmed no need for formal review of a secondary analysis of already publicly available IRB-approved data or of completely de-identified clinical data collected in the course of a treating relationship.

      IRBs have a critical role to play, but many (including ours) are overworked. We understand the impulse authors feel to gain IRB approval even when the regulations clearly do not require it. As we discuss in the revision, there is a more significant matter that IRBs could help to resolve if they have the resources to do so. For all of these reasons, and even though we, too, felt the urge to obtain IRB approval, we resisted adding “just a little more” to their work.

      C. The message of the article and data representation is unclear: do the authors wish to show that sCr is superior to eGFR in this “pre-CKD” stage, should both be used together? Do the authors wish to convey that a “creatinine blind range” does not exist? Or is the aim to demonstrate that continuous variables should not be interpreted in a categorical manner?<br /> Our interest is detection and prevention of progression of early kidney injury at GFRs above 60 mL/min – a range in which eGFR is especially unreliable. We have advanced the best argument we can to detect changes in sCr while kidney injury is still limited and perhaps reversible. If experience reveals that some avoidable exposure(s) begins the decline, then clinicians might alert patients and thereby reduce kidney disease. How best to use longitudinal sCr remains to be determined from experience. However, our message is that early changes in sCr can provide early warning of a decline in glomerular filtration. We are confident that clinicians can learn to separate other factors that may alter sCr, as we do for many other tests.

      MINOR CONCERNS<br /> ABSTRACT<br /> A. Vague. Doesn’t give a clear picture of the study<br /> We have tried to clarify the title and abstract and are open to further suggestions.

      INTRODUCTION<br /> B. 51 – 57: needs to state that these stats are from e.g. the US. The authors should consider adding international statistics to complement those from the US.<br /> We have updated the statistics on death rates from kidney disease to include US and global data.

      C. 68: reference KDIGO guidelines, state year<br /> We now reference the KDIGO 2012 guidelines.

      D. 75 – 77: is this reference of the New York Times the most appropriate?<br /> We have expanded this section with peer-reviewed, scholarly references. However, we found Hodge’s summary of the issue succinct and hence potentially more persuasive for some than decades of scholarly references that have had limited or no effect in the clinic.

      E. 82: within-individual variation not changes (this is repetition of the point made in lines 425 – 427, but should match the language)<br /> We have matched the language.

      F. 82 – 84: reference? If this is a question it should be presented as such<br /> We have attempted to clarify this statement.

      G. 84: “normal GFR above 60” = guidelines (including KDIGO) do not refer to 60 as normal GFR, 60 – 89 is mildly decreased. (see line 126)<br /> We agree and have corrected the language.

      H. 93: avoid the use of emotive words such as apparently (also in line 428)<br /> We wanted to emphasize appearance without proof and have made these changes.

      I. 94: “Not meeting KDIGO guidelines”: KDIGO 2.1.3 includes a drop in category (including those with GFR >90). This would appear to include some of the cases listed. Additionally, albuminuria should have been measured for case 2 and 3.<br /> We have clarified that cases may or may not fit KDIGO categories, though that question will frequently arise in evaluating sCr changes. Where available, we have added urine protein and/or albumin results to the Cases.

      J. 97: “progressive loss of nephrons equivalent to one kidney”: this is based on a single creatinine measurement.<br /> Since the original submission, we discovered for this Case (now Patient 3) early serum creatinine results and notes indicating a six-month period off thiazide diuretic. This data clarified the baseline and showed a remarkable effect of thiazide diuretic on sCr. We have added follow-up sCr results and details of thiazide use to the ASC chart.

      K. 93 – 122: Could any of these shifts be explained by changes in creatinine methodology or standardization of assays, especially over 15 – 20 years (major differences between assays existed before standardization and arguably still exist with certain methods).<br /> It would be useful to see a comparison between serial sCr and eGFR measurements on the same figure. There appears to be significant (possibly more pronounced) changes when eGFR is used. As line 87 mentions changes in eGFR may be as useful (and in some situations more useful) than changes in sCr alone.

      It would be helpful to have a chronology from each local laboratory with the date of every change in creatinine assay or standardization. However, any single shift draws attention but does not necessarily indicate significant change in glomerular filtration. After one or several incremental increases, over at least three months, the sCr pattern may meet the reference change value (RCV) that signals significant change. In the future, from age 20 or so, a patient’s medical record should retain the full range of the longitudinal sCr for true baseline comparison.
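      For readers unfamiliar with the reference change value, a minimal sketch of the commonly used formula, RCV = sqrt(2) x Z x sqrt(CVa^2 + CVi^2), is given here. The coefficients of variation (3% analytical, 5% within-individual) and the sCr values are illustrative assumptions only and are not taken from the manuscript; the function names are hypothetical.

```python
import math


def reference_change_value(cv_analytical: float, cv_within: float, z: float = 1.96) -> float:
    """Return the reference change value (RCV) in percent.

    cv_analytical and cv_within are coefficients of variation in percent;
    z = 1.96 corresponds to a two-sided 95% probability.
    """
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


def exceeds_rcv(baseline: float, current: float, rcv_percent: float) -> bool:
    """True if the percent change from baseline is larger than the RCV."""
    change_percent = abs(current - baseline) / baseline * 100.0
    return change_percent > rcv_percent


# Illustrative (assumed) values, not data from the manuscript.
rcv = reference_change_value(cv_analytical=3.0, cv_within=5.0)
print(f"RCV = {rcv:.1f}%")
print(exceeds_rcv(baseline=0.80, current=0.95, rcv_percent=rcv))
```

      With these assumed inputs, a rise from 0.80 to 0.95 mg/dL (about 19%) would exceed the RCV, whereas a rise to 0.88 mg/dL (10%) would not.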

      As noted in the revised manuscript, Rule et al showed that there is measurable nephrosclerosis even in the youngest kidney donors, suggesting that some injuries (perhaps exposure to dietary toxins) may begin in childhood and that early preventive counseling may be worthwhile. Experience will show whether this can slow progression to CKD. As we note, quoting Delanaye, sCr accounts for virtually 100% of the variability in eGFR equations based on sCr (eGFRcr), and these equations add their own uncertainties, so no, we do not believe that eGFR is more useful than sCr when GFR is above 60 mL/min and possibly much lower as well.

      We have added eGFR results to the ASC charts (in blue), though availability was somewhat limited.

      L. 127 – 142: should there be separate charts for males and females, the differences in creatinine between males and females needs to be discussed somewhere in the paper.

      We do not think there should be separate charts for men and women based on size. The role of sex in eGFR equations is mainly based on the presumption that the average woman has less muscle mass than the average man. Clinicians care for individuals, not averages, and this sweeping generalization that increases agreement of the average of a population introduces unacceptable inaccuracy to individual care. Within-individual comparison eliminates the need for assumptions on relative size or muscle mass. Major changes in an individual’s muscle mass will usually be evident to the clinician who can adjust for them.

      However, reports suggest significant influence of sex hormones on renal function, including effects of estrogen and estrogen receptors, such as reducing kidney fibrosis, increasing lupus nephritis, and increasing CKD after bilateral oophorectomy. The mechanism of these effects and how they might be incorporated into eGFR estimating equations is unclear, but the effort may benefit from a more individualized approach with focus on a measurand rather than matching population-based averages of a quantity value (calculated from measurands).

      M. Similarly, is this suitable for all ages?<br /> We think so. Another sweeping generalization based on age merely introduces another inaccuracy which complicates the task of clinicians caring for individuals. Older persons have varying health, athleticism, muscle mass, dietary preferences, etc. Rule et al reported that biopsies of about 10% of older kidney donors had no nephrosclerosis. Within-individual comparison eliminates the need for assumptions on relative muscle mass or inevitable senescent decline in nephron number. We substitute the assumption that any change in an individual’s muscle mass will be evident and can be accounted for. A seemingly ubiquitous risk factor, or factors, starts injuring kidneys at a young age, which we may yet identify.

      N. 162 – 163: rephrase<br /> Done.

      METHODS<br /> O. 185 – 193: aim belongs in the introduction, can be adjusted to complement paragraph 178 – 182.<br /> Reorganized and rewritten.

      P. 196 – 205: reference sources

      References provided.

      Q. 224 – 247: not in keeping with the rest of the article or title and conclusion

      We have revised and restructured this section.

      RESULTS<br /> R. If eGFR is treated as a continuous variable does inverted sCr still have higher accuracy?<br /> We believe so. Serum creatinine is a measurand and reflects the total sum of physiologic processes, known and unknown. In contrast, eGFR equations yield a quantity value, calculated from a measurand and dependent on the assumptions and approximations incorporated by their authors. The eGFR equations are thus necessarily less accurate than the measurands they are derived from, in this case, sCr. In a hyperbolic relationship, as the independent variable drops below one and approaches zero, the effect is to amplify the inaccuracy of the independent variable in the dependent variable. By avoiding the mathematical inverting, the data suggest that direct use of sCr is far more practical for pre-CKD.
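      The point about the hyperbolic relationship can be sketched numerically. The example below assumes the simplest reciprocal form, y = 1/x, rather than any specific eGFR equation, and uses first-order error propagation (dy = dx / x^2) for absolute uncertainty only; the function name and the value 0.05 are illustrative assumptions.

```python
def inverse_uncertainty(x: float, dx: float) -> float:
    """First-order absolute uncertainty in y = 1/x for an uncertainty dx in x."""
    return dx / (x * x)


# The same absolute uncertainty in x (assumed 0.05) grows in 1/x as x falls below one.
for x in (2.0, 1.0, 0.5):
    print(f"x = {x:.1f}: uncertainty in 1/x = {inverse_uncertainty(x, 0.05):.4f}")
```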

      S. As mentioned, the section on ESRD in black and white veterans doesn’t fit in with the rest of the article.<br /> We have revised, reorganized, and rewritten. We also outlined our rationale above.

      DISCUSSION<br /> T. As mentioned, section 4.1 doesn’t fit in with the rest of the article. As the authors note the correlation between illiteracy and CKD is likely not causal.<br /> See above.

      U. 387: erroneous creatinine blind range. The data presented does not show this is erroneous there is still a relative blind range. A distinction must be made between a population level “blind range” and an individual patient’s serial results. The data and figure 4 in particular demonstrate the lack of predictive ability of sCr above 40ml/min compared to below 40ml/min at a population level. For an individual patient this “blind range” is more relative, and a change in sCr even within the normal range may be predictive. (Note: the terminology “blind range” is problematic).<br /> We agree. On reading closer, Shemesh et al call attention to “subtle changes” in serum creatinine even though they had access only to the uncompensated Jaffe assay, so their recommendation to monitor sCr is even more forceful, today, due to more accurate and standardized creatinine assays. We have attempted to clarify this in the manuscript.

      V. 399 – 400: “rose slowly at first and then more rapidly as mGFR decreased below 60” this refers to a relative blind range. Whether these slow initial changes can be distinguished from analytical and intra-individual variation is the question that needs to be answered before we can say a “blind-range” doesn’t exist for an individual patient.

      We appreciate this observation. We believe longitudinal sCr is worth adopting to gain insights into individual sCr patterns, which may reveal early changes in GFR, among other influences on sCr. This is a low-cost, potentially high-impact population health measure, and there seems little risk in trying it because many clinicians already use components of the process.

      W. 425 - 432: sCr is indeed very useful when baseline measurements are available. eGFR remains useful when baseline sCr is not available or when large intervals between measurements are found.<br /> As Delanaye et al noted, virtually 100% of the variability in longitudinal eGFR is due to sCr, so we understand that the errors in eGFR can be (and usually are) greater than but cannot be less than those in sCr.

      X. 425: low analytical variation- if enzymatic methods are used<br /> Lee et al suggest that even the compensated Jaffe method provides some accuracy and reproducibility, which may allow longitudinal tracking of sCr even where more modern assays are as yet unavailable.

      Y. 428: avoid the use of “apparently”<br /> Done.

      Z. 430: reference 56 compares sCr and sCysC with creatinine clearance NOT with mGFR, this does not prove that mGFR has greater physiologic variability. Creatinine clearance is known to be highly variable (partially due to two sources of variability in the measurements of creatinine: serum and urine).<br /> The creatinine clearance is another form of mGFR, and our understanding of it begins with the units: if the clearance or removal of creatinine were being measured, the units should be umoles/minute, but they are mL/min. “Clearance” is an old concept coined by physiologists to describe many substances, such as urea, glucose, amino acids, and other metabolites. Since creatinine is mostly not reabsorbed and is only slightly secreted in the tubules, the “creatinine clearance” became a measure of GFR. The ratio of urine Creatinine to serum Creatinine is simply a factor for how much the original glomerular filtrate then gets concentrated (typically about 100-fold) by the kidney. Since the assumption is that the timed urine was once the rate of glomerular filtrate production, the creatinine clearance is a measure of the GFR.
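      The units argument can be made concrete with the standard clearance calculation, in which the urine-to-serum creatinine ratio (a dimensionless concentration factor) multiplies the urine flow rate in mL/min. This is a minimal sketch; the function name and the example values are hypothetical and chosen only to give round numbers.

```python
def creatinine_clearance(urine_cr: float, serum_cr: float,
                         urine_volume_ml: float, collection_minutes: float) -> float:
    """Creatinine clearance in mL/min from a timed urine collection.

    urine_cr and serum_cr must be in the same concentration units,
    so their ratio is dimensionless.
    """
    concentration_factor = urine_cr / serum_cr            # how much the filtrate was concentrated
    urine_flow_ml_per_min = urine_volume_ml / collection_minutes
    return concentration_factor * urine_flow_ml_per_min


# Assumed example: 100-fold concentration, 1440 mL collected over 24 hours (1440 min).
print(creatinine_clearance(urine_cr=100.0, serum_cr=1.0,
                           urine_volume_ml=1440.0, collection_minutes=1440.0))
```

      In this assumed example the urine flow is 1 mL/min and the filtrate was concentrated 100-fold, so the result is 100 mL/min, a volume rate rather than an amount of creatinine per minute.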

      Creatinine clearance has some inaccuracies based on tubular secretion, but also has some advantages: blood concentrations are essentially constant during urine collection, no need for exogenous administration, and reliable measurements in serum and urine. The methods that we often call mGFR also have problems, including unverifiable assumptions about distributions, dilutional effects, and others we cite in the text. None of these are direct measures of GFR. Due to changes in remaining nephrons, even true GFR itself is not strictly proportional to the lost number of functional nephrons, which seems the ultimate measure of CKD that Rule et al estimated from biopsy material.

      AA. The limitations of sCr for screening should also be discussed: differences in performance and acceptability between enzymatic and Jaffe methods (still widely used in certain parts of the world), the effect of standardizing creatinine assays (an important initiative but one that could also produce shifts in results around the time of standardization- see cases), low InIx means that once-off values are exceedingly difficult to interpret, is a single raised creatinine value predictive (or should there be evidence of chronicity): similarly are there effects from protein rich meals, etc (The influence of a cooked-meat meal on estimated glomerular filtration rate. Annals of Clinical Biochemistry. 2007;44(1):35-42. doi:10.1258/000456307779595995)<br /> We have added discussion of additional references on reproducibility of sCr assays and discuss dietary meat and, in Part Three, possible dietary kidney toxins.

      CONCLUSION<br /> BB. The discussion recommends using SCr above eGFR while the conclusion recommends the NKF-ASN eGFR for use in pre-CKD and ASC charts. While the use of both together in a complementary fashion is understandable- this needs to be congruent with the discussion, aims and results.<br /> We have rewritten this section. We would welcome any further recommendations.

      Cyril O. Burke III, MD, FACP

    2. On 2025-11-30 16:56:07, user Cyril Burke wrote:

      RESPONSE TO REVIEWER #1

      June 27, 2022<br /> Re: Longitudinal changes in creatinine signal early decline in glomerular filtration rate without consideration of age, sex, ‘race’, and nationality

      We greatly appreciate that the reviewers were thorough, fair, and helpful in their comments.

      Comments to the Author

      Reviewer #1: Burke et al submit a somewhat unusual paper, devoted to a topic of potential major clinical relevance, and as yet understudied.

      General comments

      1. The thesis of the authors, that using the baseline serum creatinine of a given patient would potentially improve the earlier diagnosis of kidney disease, even in the normal range, is in line with the experience of this reviewer, who always retrieves, whatever the difficulty of reaching that goal, past results of blood tests, and uses them as a way to date the onset of kidney disease, sometimes with important prognostic implications.

      Your experience adds support to the literature suggesting that historical sCr levels provide a context for sCr changes. These benefits might encourage investments in digital data exchanges so that electronic health records (EHRs) can ease collection and presentation of sCr results from multiple commercial and hospital laboratories.

      2. Yet, the authors do not provide data strongly supporting their thesis. For instance, when looking at case 2 [now Patient 3], should the last point (the most recent one) be omitted, there would be very little evidence supporting progressive early kidney disease.

      We advocate prospective monitoring of longitudinal sCr as a proxy for glomerular filtration rate (GFR). The cases were meant to show that charting the data and simple follow-up over several visits and months can allow general clinicians to differentiate CKD from other explanations for increased sCr. The four case histories represent patients in a non-nephrology medical practice with borderline eGFR that raised the possibility of CKD. In each of these cases, retrospective collection of sCr values suggested varied explanations for the elevated sCr, and we expect many cases will represent sCr influences other than CKD, not necessarily warranting nephrology referral. Armed with this tool and using it prospectively, physicians, nurse practitioners, and physician assistants (PCPs) might identify and manage the 90% of patients with currently unrecognized CKD.

      3. The claim that the statistics fit the data better when all points are used (page 9,11) should not come as a surprise. Using thresholds instead of the full range of values has long been known to be more powerful for statistical analysis. But fitting the data does not equal to a high positive predictive value!

      We agree that this is counterintuitive, so we thought this was an important point to discuss. Research methods that get translated into clinical settings rely on assumptions that are not always familiar to healthcare workers. Whatever the merits of thresholding conventions, understanding their mathematical underpinnings can inform a more nuanced interpretation of lab results. The revision includes our initial, intuitive assessment of the data and the interpretation of the residuals – from a mathematics perspective. Lack of awareness about residuals can easily lead to improper interpretation of thresholded lab data. The use of statistics is not intended to document superiority of fit but rather to demonstrate how simplifications with practical clinical value may gloss over clinically relevant information in some cases. The inclusion of additional charts seeks to take it away from abstracted statistics and toward more intuitive clinical concerns. We favor early diagnosis of kidney injury through investigation of nonspecific changes in longitudinal sCr. This method seems usable and may be manageable by PCPs using a time frame of several visits over several months to separate false positives, which may be influenced by chance attributable to the mathematical properties of lab data.

      4. A key question is whether in a real-world context, the earlier diagnosis of kidney disease would be possible, without too much background noise from intercurrent illness (functional), drugs (NSAIDS, etc.). In other words, would the specificity (or PPV) of the suspicion of early kidney disease be reasonable enough to catch the attention of clinicians

      We think so. We believe longitudinal serum creatinine (sCr) will encourage dialogue between patients and clinicians, raising awareness of the importance of avoiding kidney injuries that often happen out of sight and out of mind until, for far too many, culminating in urgent dialysis. In the same way that patients now ask for their blood pressure, we anticipate patients tracking their own sCr and kidney risks. Decades after introduction of the mercury sphygmomanometer, PCPs learned how to manage blood pressure to improve health. We believe longitudinal sCr can soon be a widely used tool because the concepts are old, there is a broad literature supporting this approach, and the value can be enhanced by more frequent testing of sCr. This is what PCPs do – sort the random cough, costochondritis, or stress response from nascent pneumonia, angina, and hypertension. PCPs already worry about the kidneys. They may welcome a tool to accompany the chest radiograph, electrocardiogram, and sphygmomanometer.

      Of interest, the decision analysis by den Hartog et al found markedly more false-positive diagnoses of CKD with eGFR than with serum creatinine alone.

      5. Even though there has been improvement in the standardization of measurement of serum creatinine (IDMS), the comparability of results measured by different labs remains suboptimal, at least in the experience of this reviewer, and medical shopping is not uncommon, making the availability of all previous results in the same graph a logistical challenge.

      We share this concern, which laboratorians have wrestled with for many years and which will not be solved soon. However, we propose utilizing the maximum serum creatinine (sCr-max) to smooth the variability of these inputs (as well as the variability from patient diet and hydration). One laboratory will be the highest, and when patients use multiple laboratories, one laboratory may more often define the sCr-max. As patients learn the rationale for using the same lab, we believe most (not all) will voluntarily use one or perhaps two labs (as they mostly do when repeating longitudinal MRI studies, for example). The sCr-max reduces the effect of variability between laboratories, allowing clinical insights even without future improvements in sCr assays.

      Australia, Canada, and the United Kingdom have stricter sCr analytical performance goals than the United States, which could improve its sCr comparability by matching their standards.
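      The sCr-max idea in the response to comment 5 can be illustrated with a minimal sketch. It assumes that sCr-max means the highest sCr value recorded so far for a patient, which is one reading of the response and may differ from the definition used in the manuscript. The function names and the sCr values (mg/dL) are hypothetical and serve only as illustration.

```python
from itertools import accumulate


def running_scr_max(values):
    """Running maximum of a patient's sCr series (assumed meaning of sCr-max)."""
    return list(accumulate(values, max))


def new_maxima(values):
    """Indices (after the first result) where sCr exceeds every earlier value."""
    flagged = []
    highest = values[0]
    for index, value in enumerate(values[1:], start=1):
        if value > highest:
            flagged.append(index)
            highest = value
    return flagged


# Hypothetical longitudinal sCr results in mg/dL, possibly from several labs.
scr_series = [0.82, 0.90, 0.85, 0.88, 0.97, 0.93]
scr_max = running_scr_max(scr_series)
rising_points = new_maxima(scr_series)
```

      Because the running maximum only moves when a result exceeds all earlier ones, ordinary up-and-down scatter between laboratories does not change it; only a new high does, and that is the event a clinician would then follow up over several visits.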

      Specific comments

      1. The authors should mention that the USPSTF decided a month ago to revisit the question of screening for kidney disease in high-risk groups (page …)

      One reference stated that this initiative has not been announced publicly but is “under active consideration” by USPSTF because “…for a screening to help people live longer, healthier lives, clinicians must be able to treat the condition once it is found. The existence of effective treatments is one of many important factors that the Task Force considers.” This perspective is surprising because it ignores the potential of effective prevention by avoiding NSAIDs, hypotension, dehydration, and nephrotoxic medical treatments (e.g., aminoglycosides). We, too, look forward to updated findings from USPSTF.

      2. Even though ESRD has a legal meaning in the USA, not very relevant to the topic of this paper about early kidney disease, the authors should stick to the nomenclature proposed by a recent KDIGO consensus conference (see Levey et al. Nature Reviews in Nephrology). In particular, use kidney failure instead of ESRD/ESKD. When the topic is glomerular filtration, use that wording instead of kidney function (page…)

      We have adopted this terminology and would welcome any further recommendations.

      3. The authors allude to the concepts of prediabetes and prehypertension. But this reviewer points to the fact that the levels used to define those entities are currently “generic”, rather than based on previous values in an individual subject. Please discuss.

      We understand that the normal population ranges for serum glucose and blood pressure are narrower, with less interindividual variation, so population reference ranges work well for monitoring diabetes mellitus and hypertension. Unfortunately, this is not true for serum creatinine, though within-individual reference of longitudinal sCr appears to facilitate diagnosis of pre-CKD.

      4. The authors repeatedly mention in the discussion section evidence that even small increases in serum creatinine have prognostic significance. This has indeed been known for decades but is a different topic: AKI. Admittedly, there is growing evidence that AKI and CKD are linked. But that the stability of a biological parameter is prognostically best is anything but surprising: the same is true for body weight, mood, blood pressure etc.

      We agree that AKI and CKD appear to be merging and this may become clearer from more frequent sampling and charting of longitudinal sCr. What has been missing is graphical representation of the data to allow quick assessment for CKD in long-term trends, and this may soon be obtainable from EHRs and IT departments, which should end the practice of deleting historical data of value to longitudinal analysis.

      [See next comment for Response to Reviewer #2.]

    3. On 2025-11-30 23:44:45, user Cyril Burke wrote:

      [Note: This is the second of several rounds of review of an earlier version of our combined manuscript, aiming to reduce ‘racial’ disparity in kidney disease. The comments were kindly offered by nephrologists, through a medical journal, and we remain grateful to them for the time and care they gave to improve our manuscript.

      We removed identifying features and included our responses, at the end of this comment. The changing title and line numbers refer to earlier versions.]

      August 3, 2022<br /> Dear Dr. Burke III,

      REDACTED.

      Reviewer #1: Cyril O Burke III et al submit a revised version of their intriguing, unusual paper.

      Overall, the paper remains extremely lengthy (the total, including clean and track versions and reply to reviewers is close to 200 pages!!), whereas it contains relatively little original data.

      The authors speculate and comment a lot (and most of these speculations/comments will hardly be understandable by the expected audience, primary care physicians), and this will in addition distract the reader from the main key message (which is right in the opinion of this reviewer (see first round of review) and warrants more attention and studies).

      The race part is irrelevant for the key point (race does not change over time, and thus is not relevant when looking at longitudinal serum creatinine or eGFR) and should be deleted in the opinion of this reviewer. In this respect, I completely agree with the comment of reviewer 2 in the first round.

      I can not resist quoting here the reply of the authors to reviewer 2. “This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions.”

      My reply to their reply: nobody would read the current paper, even partially. Shorten, shorten, shorten please and focus on the key message.

      Reviewer #2: Thank-you, once again, for the opportunity to review this lengthy “thesis-style” manuscript which discusses some important often over-looked topics. The under-use of serial creatinine measurements and over-reliance on often erroneous eGFR measurements is an important point which is easily missed by healthcare workers with potentially serious consequences. Likewise, the misuse of racial constructs in medicine (and elsewhere) is an important point.

      I am satisfied with this re-submission and the changes which have been made to the original manuscript.

      Minor points:<br /> 431: “creatinine inhibits several membrane transporters”. = Cimetidine

      502: “Because mGFRs have population variation as wide as sCr, with much greater physiologic variability compared to the relatively stable sCr and serum cystatin C”<br /> As mentioned previously the cited article compares the variability of sCr and cystatin C with CrCl, I agree with the authors that CrCl is a form of mGFR, however, probably one of the poorer forms and not what a reader will think of when mGFR is mentioned. In our current age of medicine when we talk about mGFR CrCl is seldom included, studies reviewing methods of mGFR will seldom include CrCl, however CrCl may be compared to one of the mGFR methods. Likewise, if a patient is sent for a mGFR, a CrCl will not be performed. In our current age of medicine mGFR refers to methods such as the clearance of iohexol, iothalamate, Cr-EDTA, inulin, DTPA, etc; the authors themselves mention this (line 539 – 540). I fully agree with the authors that mGFR is FAR from perfect and has many inaccuracies and imprecisions (which are often overlooked)- these are well published, some of which are cited in this manuscript. If the authors wish to use the current study as a source they should state the findings in a way that cannot be misinterpreted. For example: “CrCl has much greater physiologic variability than sCr and cystatin C …” – in this case the reader can determine for themselves whether they would use CrCl as a surrogate for mGFR. Alternatively, adjust the statement and use another source which has shown the variability that exists with what we currently refer to as mGFR method.

      670 – 719: As the authors specifically discuss age it would be prudent to briefly mention the short-comings, or considerations for interpretation, of serial creatinine measurements at a very young age which generally rise until late adolescence when steady muscle mass is achieved. Also note changes in creatinine and GFR from birth till 2 – 3 years.

      783 – 784: Consider re-wording, the grammar makes this sentence difficult to read

      959 – 968: Note, editing has not been accepted (tracked changes still shown)

      1116 - 1121: “Using the opioid crisis as an example…. in, for example, the opioid crisis” – same sentence

      RESPONSE TO REVIEWERS:<br /> September 17, 2022<br /> Longitudinal creatinine, not ‘race’, signals pre-chronic kidney disease and decline in glomerular filtration rate

      We again greatly appreciate the reviewers for offering detailed comments and guidance, which we have endeavored to incorporate as best we could.

      Comments to the Author<br /> Reviewer #1: Cyril O Burke III et al submit a revised version of their intriguing, unusual paper.<br /> 1. Overall, the paper remains extremely lengthy (the total, including clean and track versions and reply to reviewers is close to 200 pages!!), whereas it contains relatively little original data.<br /> The authors speculate and comment a lot (and most of these speculations/comments will hardly be understandable by the expected audience, primary care physicians), and this will in addition distract the reader from the main key message (which is right in the opinion of this reviewer (see first round of review) and warrants more attention and studies).<br /> The race part is irrelevant for the key point (race does not change over time, and thus is not relevant when looking at longitudinal serum creatinine or eGFR) and should be deleted in the opinion of this reviewer. In this respect, I completely agree with the comment of reviewer 2 in the first round.<br /> I can not resist quoting here the reply of the authors to reviewer 2.<br /> "This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions."<br /> My reply to their reply: nobody would read the current paper, even partially. Shorten, shorten, shorten please, and focus on the key message.<br /> We fundamentally agree and have worked to shorten the text; to clarify our understanding that ‘race’ may change with time, location, and self-identification; and to add a Table of Contents to make the Parts more accessible to interested readers. 
      We comment a lot because, in highly racialized societies like the US [1,2], it can be difficult to see beyond ‘race’ without explicit speculation about other possible explanations for difference, which, we understand, may or may not pan out under investigation. One hope is that all clinicians will pursue explanations other than ‘race’, but this seems unlikely. Busy medical researchers have little time to develop expertise outside their area of interest, which may explain why ‘Commentary’ and ‘Perspective’ articles have failed to inspire an ethical ban on the misuse of ‘race’ in medical research, journals, clinics, and elsewhere [3]. We do not know whether a suite of articles can meaningfully contribute to ending misuse of ‘race’, where so many scholarly articles have failed, but after perceiving little change over four decades, trying something completely different seemed (almost) rational.

      1. Nunez-Smith M, Curry LA, Bigby J, Berg D, Krumholz HM, Bradley EH. Impact of race on the professional lives of physicians of African descent. Ann Intern Med. 2007 Jan 2;146(1):45-51. doi: 10.7326/0003-4819-146-1-200701020-00008. PMID: 17200221.

      2. Betancourt JR, Reid AE. Black physicians' experience with race: should we be surprised? Ann Intern Med. 2007 Jan 2;146(1):68-9. doi: 10.7326/0003-4819-146-1-200701020-00013. PMID: 17200226.

      3. McFarling UL. Troubling podcast puts JAMA, the ‘voice of medicine,’ under fire for its mishandling of race. Stat News. 2021 April 6 [Cited 2022 August 31]. Available from: https://www.statnews.com/2021/04/06/podcast-puts-jama-under-fire-for-mishandling-of-race/ <br /> Reviewer #2: Thank-you, once again, for the opportunity to review this lengthy “thesis-style” manuscript which discusses some important often over-looked topics. The under-use of serial creatinine measurements and over-reliance on often erroneous eGFR measurements is an important point which is easily missed by healthcare workers with potentially serious consequences. Likewise, the misuse of racial constructs in medicine (and elsewhere) is an important point.<br /> Thank you for again giving time for helpful criticism and comments on our manuscript.

      A. I am satisfied with this re-submission and the changes which have been made to the original manuscript.<br /> Minor points:<br /> B. 431: “creatinine inhibits several membrane transporters”. = Cimetidine<br /> Corrected.

      C. 502: “Because mGFRs have population variation as wide as sCr, with much greater physiologic variability compared to the relatively stable sCr and serum cystatin C”<br /> As mentioned previously the cited article compares the variability of sCr and cystatin C with CrCl, I agree with the authors that CrCl is a form of mGFR, however, probably one of the poorer forms and not what a reader will think of when mGFR is mentioned. In our current age of medicine when we talk about mGFR CrCl is seldom included, studies reviewing methods of mGFR will seldom include CrCl, however CrCl may be compared to one of the mGFR methods. Likewise, if a patient is sent for a mGFR, a CrCl will not be performed. In our current age of medicine mGFR refers to methods such as the clearance of iohexol, iothalamate, Cr-EDTA, inulin, DTPA, etc; the authors themselves mention this (line 539 – 540). I fully agree with the authors that mGFR is FAR from perfect and has many inaccuracies and imprecisions (which are often overlooked)- these are well published, some of which are cited in this manuscript. If the authors wish to use the current study as a source they should state the findings in a way that cannot be misinterpreted. For example: “CrCl has much greater physiologic variability than sCr and cystatin C …” – in this case the reader can determine for themselves whether they would use CrCl as a surrogate for mGFR. Alternatively, adjust the statement and use another source which has shown the variability that exists with what we currently refer to as mGFR method.<br /> We appreciate this comment and have both added another reference and added to the text an argument for reconsidering creatinine clearance. Many hospitals and some countries lack the resources for advanced mGFR filtration markers, which are only used for research or for screening related to kidney transplants. 
      However, most laboratories have the tools for ‘quick-creatinine clearance’ (quick-CrCl), which may be an acceptable alternative to the classic mGFRs. If confirmed, a simple and affordable quick-CrCl might allow hospitals and laboratories worldwide an alternative measurement requiring fewer assumptions for another aspect of glomerular filtration.

      D. 670 – 719: As the authors specifically discuss age it would be prudent to briefly mention the short-comings, or considerations for interpretation, of serial creatinine measurements at a very young age which generally rise until late adolescence when steady muscle mass is achieved. Also note changes in creatinine and GFR from birth till 2 – 3 years.<br /> We have added a brief discussion of the diagnosis of CKD in infants, children, and adolescents.

      E. 783 – 784: Consider re-wording, the grammar makes this sentence difficult to read<br /> Done.

      F. 959 – 968: Note, editing has not been accepted (tracked changes still shown).<br /> Done.

      G. 1116 - 1121: “Using the opioid crisis as an example…. in, for example, the opioid crisis” – same sentence.<br /> Rewritten.

      We thank you.

    1. On 2021-12-25 08:38:40, user Eslam Maher wrote:

      The authors investigate whether Machine Learning (ML) algorithms fare better compared to traditional Cox models in big data. They selected Glioblastoma and gliosarcoma from SEER as the basis of their data set. There are two main points that are worth considering here, (1) statistical, and (2) clinical.

      (1) a- Glioblastomas are relatively rare; readers should therefore bear in mind that the hypothesis studied here may not be relevant to their own work, which is usually mono-institutional or multi-institutional. Unlike the huge SEER database, we never actually have such numbers at hand to analyze in survival models.

      There is no doubt that Cox would outperform ML models in smaller samples. ML is gaining a popularity in the medical community that is hugely inflated and unnecessary.

      b- Unlike ML approaches, the performance of Cox models is heavily dependent on its assumptions. This includes the proportionality of hazards between levels of a given variable, an assumption the authors do not seem to have investigated before running the model.
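      One informal way to look at the proportional hazards assumption is sketched below. This is not the authors' method, and it is a simpler check than the Schoenfeld-residual tests usually run in statistical software: it builds Kaplan-Meier curves for two levels of a variable and compares log(-log S(t)) between them. If hazards are proportional, the gap between the two transformed curves should stay roughly constant over time. All function names are hypothetical and the code uses only the Python standard library.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Value of the step function S(t) for a curve from kaplan_meier."""
    s = 1.0
    for step_time, step_survival in curve:
        if step_time > t:
            break
        s = step_survival
    return s


def cloglog_gaps(curve_a, curve_b, grid):
    """log(-log S_a(t)) - log(-log S_b(t)) at each grid time where both are defined."""
    gaps = []
    for t in grid:
        s_a = survival_at(curve_a, t)
        s_b = survival_at(curve_b, t)
        if 0.0 < s_a < 1.0 and 0.0 < s_b < 1.0:
            gaps.append(math.log(-math.log(s_a)) - math.log(-math.log(s_b)))
    return gaps
```

      A gap that drifts steadily or changes sign across the grid would suggest non-proportional hazards for that variable; a roughly flat gap is consistent with the assumption, though it does not prove it.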

      Another assumption is how the model was selected in the first place. The authors say they ran univariable Cox models to decide which variables would be used in the final model. It is unclear whether a "significant" variable is considered as such at 5% alpha. Regardless of the alpha level, automated stepwise methods are notorious: they are very popular among physicians but not among professional statisticians and epidemiologists. Stepwise methods do not allow modelers to think about the model at hand. Plus, some causal variables may not be statistically significant, while some nuisance variables may be coincidentally significant due to high N. Automated regression using p-values is a bad idea because it also ignores multiplicity problems.
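      The multiplicity point can be made concrete with a small calculation. The sketch below assumes the univariable tests are independent, which real covariates rarely are, so the numbers are only indicative; the function names and the choice of 20 candidate variables are hypothetical and are not taken from the preprint.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / n_tests


# Screening 20 pure-noise variables at alpha = 0.05 (hypothetical count).
chance_of_spurious_hit = familywise_error(0.05, 20)
stricter_threshold = bonferroni_alpha(0.05, 20)
```

      Under these assumptions, screening 20 unrelated variables at 5% alpha gives a better-than-even chance that at least one enters the model by coincidence, which is why p-value-driven selection needs either a correction or, better, a clinically reasoned model.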

      (2) a- 22.6% of the cases included had no surgery, how then were they diagnosed as glioblastomas if no tissue samples were available? It is unclear if surgeries comprised craniotomies and biopsies or the former alone.

      b- All glioblastomas and gliosarcomas are grade IV tumors, however, for some reason, grade is a variable included in the models with levels of grade I, II, III, and IV!

      c- Reference categories in the authors' models were selected alphabetically rather than clinically. For Site, there are 14 levels using ICD-O classifications. Such classifications are not meant for clinical correlations. For example, all lobar sites (frontal, parietal, occipital etc.) are part of the cerebrum. There are only 2 cases available for cauda equina glioblastomas, which it is nonsensical to include as a separate level in the model (doing so puts more constraints on the model's degrees of freedom while also resulting in unstable ratios).

      d- Finally, the median survival for glioblastoma patients as noted by the authors was eight months. Looking for model accuracy at 120 months is just insane.

      This would have been a neat paper had the authors run a proper Cox model rather than a straw man, and designed their study with a neuro-oncologist. Even then, please note that this preprint is concerned with the performance of these models IN BIG DATA only, so do not extrapolate to the data you are routinely working with.

    1. On 2020-03-20 20:57:29, user Sylvie Vullioud wrote:

      Could authors provide information to dissipate high risks of bias:

      1. Manuscript was first published on mediterranee-infection.com website, not on medRxiv. On the manuscript on the website on mediterranee-infection.com, I can read 'In Press 17 March 2020 – DOI : 10.1016/j.ijantimicag.2020.105949'. It means that the manuscript was already accepted by International Journal of Antimicrobial Agents at the time when it was deposited on medRxiv on 20.03.2020.

      -> Pre-print on medRxiv is not a real pre-print to collect feed-back for manuscript improvement, which is what pre-prints were originally designed for. Moreover, medRxiv states: 'All preprints posted to medRxiv are accompanied by a prominent statement that the content has not been certified by peer review'.

      -> There is an obvious potential conflict of interest, because last author Raoult is editor of the article collection COVID-19 Therapeutic and Prevention in International Journal of Antimicrobial Agents.

      -> International Journal of Antimicrobial Agents is run by Elsevier, suggesting 'If accepted for publication, we encourage authors to link from the preprint to their formal publication via its Digital Object Identifier (DOI)'.

      1. Discussion on the controversy of main cited Chinese paper, ref 8 ?

      2. According to paper, allocation of patients group was random but treated group is 51.2 years average and control group 37.3 years?

      3. Article describes 3 conditions of patients: asymptomatic, low and high symptoms. Why?

      4. Care to patients, biological and physiological sampling and analyses, and statistical analyses were not blinded. Why?

      5. I think that no placebo was used. Why?

      6. 6 patients out of a total of 42 were excluded from the study: three patients were transferred to intensive care unit, 1 stopped because of nausea, 1 died. One left hospital. <br /> It is written: 'study results presented here are therefore those of 36 patients (20 hydroxychloroquine-treated patients and 16 control patients)'. Why were dead, intensive care, and nausea patients not included in statistical treatment? <br /> -> This may be a selection bias? <br /> -> What about unwanted very worrying effects of the treatment?

      7. 'The protocol, appendices and any other relevant documentation were submitted to the French National Agency for Drug Safety (ANSM) (2020-000890-25) and to the French Ethic Committee (CPP Ile de France) (20.02.28.99113) for reviewing and approved on 5th and 6th March, 2020, respectively'. Pre-print was posted on 20.03.2020. Time points on day 14 on patients.<br /> -> So recruitment and study started before approval of ANSM and French Ethic Committee? How is it possible?

      8. How is it plausible that numerous authors (18!) participated equally to the work? Is it possible to add their respective contributions?
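      The timing concern in question 7 can be checked with simple date arithmetic. The sketch below takes the dates quoted above (approvals on 5 and 6 March 2020, 'In Press' on 17 March 2020, medRxiv posting on 20 March 2020) and assumes, as this comment does, that day-14 time points had been completed before those dates. The function name is hypothetical.

```python
from datetime import date, timedelta


def latest_possible_enrollment(report_date, follow_up_days):
    """Latest enrollment date that still allows full follow-up by report_date."""
    return report_date - timedelta(days=follow_up_days)


approval_ansm = date(2020, 3, 5)
approval_cpp = date(2020, 3, 6)

# Latest enrollment compatible with 14 days of follow-up by each date.
by_medrxiv_posting = latest_possible_enrollment(date(2020, 3, 20), 14)
by_in_press_notice = latest_possible_enrollment(date(2020, 3, 17), 14)

enrolled_before_ansm_approval = by_in_press_notice < approval_ansm
```

      Under that assumption, any patient with a completed day-14 time point in a manuscript marked 'In Press' on 17 March would have been enrolled no later than 3 March, before either approval date.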

      Thank you in advance for considering my questions. <br /> Regards, <br /> Sylvie Vullioud

    1. On 2021-09-04 19:09:42, user Ben Veal wrote:

      As a qualified statistician who's been doing this stuff for over 20 years, and has worked on several medical studies, I think I ought to add my voice to the crowd.<br /> There may be a few things that aren't fully accounted for, such as the false positive rate for PCR tests, or unbalanced populations due to deaths of highly vulnerable members of the pre-infected group, but they should not alter the conclusions much. As mentioned by others, the false positive rate for PCR tests would have the effect of biasing the risk ratio downwards, not upwards, so we should expect the effect to be even stronger than reported.

      As for the potential drop-out issue due to deaths of highly vulnerable people among the pre-infected group; this would only be a problem if there are some unaccounted for cofactors causing that high vulnerability. If this is the case then we can approximately correct for the imbalance by estimating the number of deaths in the pre-infected group based on the known infected mortality rate. <br /> I have done that calculation (see link below), and get a lower bound estimate for the 95% confidence interval of [4.3,11.23] which is still significant.<br /> However, it could make a big difference to the risk of hospitalization (again assuming there are some important cofactors unaccounted for).<br /> https://www.facebook.com/ec...

      Another criticism I have read in these comments is that they should have used a conditional model (https://en.wikipedia.org/wiki/Conditional_logistic_regression) to account for the matching. Actually a conditional model is used when there is unequal distribution of the treatment groups (pre-infected & vaccinated) within each stratum (age, gender, socio-economic status & geographic region), and you are unable to use covariates to control for this. But the matching that they did ensures that this isn't the case. Furthermore they control for all but one of the strata (geographic region) with covariates.

      So, overall I trust the overall conclusion; natural immunity from pre-infection is better than vaccination, but not as good as natural immunity + vaccination.

      This does not mean governments should put a halt to their vaccination programs since that's obviously going to result in more deaths among the vulnerable, but perhaps it might be wise to reduce the vaccination rate among the less vulnerable people (i.e. young healthy people) so that they can build up natural immunity and be better prepared to fend off new variants from spreading through the population. In fact it ought to now be possible to estimate the optimal proportions of vaccinated & unvaccinated that would result in the lowest risk of contagion spread, given that we can expect to see this virus reappearing every year.

    2. On 2021-09-14 13:39:06, user Henri van Werkhoven wrote:

      Dear colleagues,

      We read this manuscript with interest; it fueled a lively discussion during our journal club of the department of infectious diseases epidemiology at the University Medical Center Utrecht. The authors address a relevant research question. If there is a substantial difference in the risk of SARS-CoV-2 infections between previously infected and vaccinated individuals – as suggested – this may have consequences for social distancing, testing recommendations, and for projections of the impact of vaccination on future COVID-19 trends. However, we have several concerns regarding generalizability, selection bias, information bias, and confounding that we would like to address. We focus our discussion on model 1: the comparison of the fully vaccinated non-infected group (group 1) to the infected non-vaccinated group (group 2).

      In regard to generalizability:<br /> - Due to the matching process, only 4% of the available data is used (i.e. for model 1 only 32430/736559) and as a consequence the study population is somewhat younger (with expectedly less comorbidity) than the source population (i.e. vaccinated individuals, infected individuals). Therefore, the study population may not be representative of this source population, which severely limits the external validity of results for all vaccinated/infected people.<br /> - Naturally, subjects who died due to previous SARS-CoV-2 infection were not included in the study. Yet, without information on morbidity and mortality and contribution to the spread of SARS-CoV-2 from the primary infection, the results of the study are not informative for the question whether people without previous SARS-CoV-2 infection should be vaccinated or await natural infection. <br /> - All three study groups – vaccinated or infected at baseline (28th of February) – were established upon future information (no infection, no additional vaccination after June 1, 2021), which severely limits the use of the results for today’s decision making.

      In regard to selection bias:<br /> - People with a SARS-CoV-2 infection between February 28, 2021 and June 1, 2021, or those who received a first (infected group) or third vaccine (vaccinated group) between February 28, 2021 and August 14, 2021 were excluded from this study. Thus the study population of group 2 consists of previously infected people that do not take the opportunity to receive a booster vaccine, which may well be the less vulnerable people with a lower baseline risk of getting infected/hospitalized. This would bias the estimate in favor of the infected group.<br /> - Similarly, though at a smaller scale, people who died from COVID were not included in the analysis. This decreases the vulnerability of the infected group for secondary infections and/or hospitalization. This too would bias the estimate in favor of the infected group.

      In regard to information bias:<br /> - A difference in willingness to test between the vaccinated and previously infected group can result in biased estimates. Vaccinated people may be more on guard in regard to COVID-19 symptoms (especially if they adhere less to regulations because they are vaccinated) and will be tested more frequently. This can bias the estimate, again in favor of the infected group. However, this form of bias should not have affected the outcome hospitalization due to COVID-19, for which differences had the same direction. Yet, the number of those endpoints was low, limiting statistical power.

      In regard to confounding:<br /> - The authors acknowledge absence of information about health behavior, such as social distancing and masking. If the vaccinated group would adhere less to these preventive measures due to a sense of safety, this would also bias the estimates in favor of the infected group.<br /> - A potentially important aspect is the young average age (36 years) of the study population. As they were all fully vaccinated before February 28th, we thought that a large proportion may have been health care workers, who have a higher chance of exposure to SARS-CoV-2, and thus infection after vaccination. This would also bias the estimate in favor of the infected group.

      We have scrutinized the paper in search of the fatal flaw; the one major methodological limitation that could explain the extreme effect in favor of the infected group, as reported. We conclude that it is not there, as we don’t think that any of the above biases can explain all of the effect. However, we did find several weaknesses that each have the potential to yield a modest bias, all in the same direction. Five modest biases may yield a large effect estimate. We therefore consider that the question of whether natural immunity provides better protection than full vaccination with Pfizer/BioNTech’s COVID vaccine remains unanswered.
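
      To illustrate how several modest biases could compound, here is a minimal sketch. It assumes, purely for illustration, that each bias multiplies the estimated rate ratio by the same hypothetical factor and that the biases act independently; neither the factor of 1.3 nor the independence comes from the paper.

```python
import math

def compounded_bias(factors):
    """Return the combined multiplicative bias of independent bias factors."""
    return math.prod(factors)

# Hypothetical values: five biases, each inflating the rate ratio by 30%.
hypothetical_factors = [1.3] * 5
combined = compounded_bias(hypothetical_factors)
print(round(combined, 2))
```

      Under these assumed numbers the combined inflation is far larger than any single factor, which is the point being made; the real magnitudes are unknown.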

      The authors (Annemarijn de Boer, Valentijn Schweitzer, Marc Bonten and Henri van Werkhoven, all at University Medical Center Utrecht) acknowledge all other journal club participants for their time dedicated to discussing the paper.

    1. On 2021-12-13 22:59:33, user Just Because I can wrote:

      Greetings RI team from Utah! I must begin with niceties; "Go BRUNO"! My son graduated this past May 2021 from Brown. I am a speech and language pathologist with over 30 years of hospital, private and public school setting experiences. Over the past nine years, I have professionally focused on children ages 3-5 within the public preschool and private therapeutic settings. I service students and their parents with the most intensive and restrictive learning environments within our District due to cognitive, behavioral and communicative delays. I can't help but weigh in now, as I previously shared this article with my peers in August as I braced for the impact of the 2021 school year.

      Given your single assessment tool (professionally, I do not base strong decisions on a single evaluative instrument, even one as widely accepted as the Mullen), I've found your results to be intriguing and, frankly, just as we anticipated.

      To compare with RI: our school district closed schools for Remote Learning for only 3 months in the Spring of 2019 and returned to in-person instruction with hybrid options in 2020. Of a caseload of 65 students, I had 3 that were online/virtual. In 2021, our District returned to essentially all in-person student learning.

      My informal observations this school year in Utah have been as follows:

      1. Increase in new referrals and eligible "older" 4+ year old children scoring as remarkably delayed in communication (Standard scores <50 given a typical range of 85-115) with no previous history of EI or preschool interventions. Our TIER 3, most restrictive preschool program has a marked influx of new referrals (e.g., total students in May was 24 and has currently risen to 36, with 8 new referrals in Jan.)
      2. Many declined or rarely attended virtual Early Intervention supports, skipped medical wellness visits including dentistry during the pandemic.
      3. Increase in parent report of primary concerns with behavioral components.
      4. Given the current timeframe, we are NOT seeing marked progress with an influx in discharges (no longer eligible due to more typical standard scores). We are seeing progress and we have continued to see progress through the pandemic (which at times surprised me) but the levels of improvement are not as remarkable or typical as years past.
      5. Typical communication, fine/gross motor and even cognitive delays are still present, but the comorbidity of exceptional delays in social/pragmatic and ultimately behavioral skills makes measured learning, and ultimately IEP progress, proceed at a slower rate. Social/pragmatic delays are interfering with overall progress.
      6. Parent involvement, participation, enthusiasm and grit appear markedly depressed. Educational teams walk a fine line between empathy, compassion and expecting parents and care givers to step in and "do hard things" in difficult times. The teams are using external motivators such as pizza cards to motivate parents to attempt, complete and turn in 2x monthly parent based home practice pages.
      7. Increased rate of meeting attendance with Virtual options.

      Where do we go from here? I agree, measuring student outcomes is critical, but supporting the parents (in any evidence-based manner) is, to me, a crucial element. I thought that once the kids were exposed to typical learning situations, and with repetition, our inflated numbers would flatten in a year and they would bounce back into typical ranges, but it's the apathetic, tired, depressed parents who are lacking resilience and grit currently. I do think another component that would be most valuable and continues to need funding is Preschool for All (or most).

      Thank you to any cohort, parent, professional person interested in this dialogue, for reading my insights.

    1. On 2020-04-24 00:57:17, user Philip Davies wrote:

      Well, well well ...

      This pre-print would make a good script for an episode of Columbo.

      The retrospective analysis, as presented, leads the reader to just one conclusion in a bazaar of many possible conclusions.

      I am even starting to have sympathy with D. Raoult and his team. I note his hot-tempered response to this paper, where he lists two enormous factors that should be considered when wrestling with the data: the fact that the HCQ and HCQ & AZ cohorts were a sicker crowd (he lists lymphopenia) and that the sickest of the non-HCQ ventilated patients were then given HCQ (plus AZ in most cases) in a desperate last bid, only for most to die.

      Raoult's point is certainly valid.

      We must remember that for most of the study period the use of HCQ was "ex-license" on a compassionate basis only. This means only the sickest patients got it. Remember also that this is a retrospective analysis, therefore observational. It was not run as a therapeutic trial. On the other hand, the use of AZ was already accepted (hence 30% of the non-HCQ cohort got it anyway).... although do be aware that by this time there had been quite a lot of focus on potentially dangerous QT lengthening when HCQ and AZ were used together in very sick patients.

      The HCQ cohort was, across all key determinants, the weakest and sickest group (it had the poorest prospects looking at age, ethnicity, smoking status, congestive heart failure, peripheral vascular disease, cerebrovascular disease (strokes), dementia, COPD, and diabetes (with and without complications))! ... and indeed, the HCQ and HCQ & AZ cohorts did have 100% more lymphopenia than the non-HCQ group.

      BUT, the big asymmetric issues become obvious when we look at the pre- and post- ventilator numbers.

      In terms of patients discharged without needing ventilation, the "victorious" non-HCQ group performs worse than the 2 treated groups. This is despite having a better prognostic baseline. But the results for this group change dramatically (for the better) when we look at the outcomes of ventilation. 25 ventilated patients came from this group.... but 19 of these 25 patients were then started on HCQ or HCQ & AZ after ventilation was started. It is screamingly obvious that these would be the sickest patients in that group: they were given such compassionate drugs in extremis. So having ejected 19 of 25 ventilated patients into the other cohorts, the non-HCQ group only had 3 deaths from its remaining 6 ventilated patients.

      The numbers of ventilated patients in the other cohorts (HCQ and HCQ & AZ) were thus substantially inflated with these new super-sick patients, who mostly died.
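
      The effect of this reassignment on the ventilated-patient figures can be sketched with simple arithmetic. The counts 25, 19, 6 and 3 are those quoted above; the number of deaths among the 19 reassigned patients is not given here, so `deaths_reassigned=13` is a hypothetical stand-in for "mostly died", and the function name is made up for illustration.

```python
def ventilated_mortality(deaths_kept, n_kept, deaths_reassigned, n_reassigned):
    """Mortality among ventilated patients: as analysed vs. by group at start of ventilation."""
    as_analysed = deaths_kept / n_kept
    by_original_group = (deaths_kept + deaths_reassigned) / (n_kept + n_reassigned)
    return as_analysed, by_original_group

# 6 patients stayed in the non-HCQ group (3 deaths); 19 were reassigned.
# deaths_reassigned=13 is a hypothetical placeholder, not a figure from the paper.
as_analysed, by_original_group = ventilated_mortality(3, 6, 13, 19)
print(as_analysed, by_original_group)
```

      Any value above 9 deaths among the 19 would make the by-original-group mortality exceed the as-analysed figure; the true number would have to come from the paper's data.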

      There really can be no conclusion at all when looking at a study of this nature without knowing much more about individual clinical conditions and the guiding principles behind clinicians' decision making. It's still possible to make some reasonable assumptions:

      If I were Columbo?... I would say the non-HCQ cohort contained patients of extremes, with the best and worst potential. The worst would have been the very frail (malignancy and/or congestive heart failure maybe ... see the stats), who probably were earmarked for 'supplemental oxygen' only from the very start. Such patients would not have been suitable for compassionate use of unproven drugs (remember, most of this came before the "emergency use" edict by FDA). This would explain the number of non-ventilated patients who died in this group (they may have been given AZ only, not being a controversial drug, but otherwise they did not get any significant interventional therapy). These patients would have had significant chronic disease and very poor obs/indices (including lymphopenia). But given that this cohort had, overall, a better starting prognosis than the other two groups, it means that the remaining patients in the group were promising candidates for survival (with better obs/indices). Such patients, not being part of a clinical trial, would not have been offered HCQ on a compassionate basis unless they got dramatically worse .... and of course, the ones who did get worse on the ventilator were started on HCQ (& often AZ as well) and thus swapped into the HCQ / HCQ & AZ cohorts.

      If we can understand that, then we might start to think that in fact HCQ & AZ is the best performing cohort with the other 2 vaguely distant. But this is being unfair to the HCQ cohort:

      The reason that a sick patient would be given one experimental drug on a compassionate basis (HCQ) but not have a rather less experimental drug further added (AZ) can really only be explained by considering risk versus benefit. A clinician would choose to use HCQ because the patient was particularly sick. The clinician would only add AZ if it was felt that this was worth the risk.... but a particularly sick patient with significant cardiovascular disease (the HCQ cohort contained the most CVD risk) might then die of a more abrupt arrhythmia through adding yet another QT lengthening drug. I dare say the clinicians were tempted to make some "Hail Mary" plays, but we must remember, these patients were not part of an ongoing trial, these drugs were "ex-license" for compassionate use only and clinicians were still accountable for responsible actions. So for those particularly sick frail patients, it wasn't worth the risk.

      I am pretty sure that the HCQ cohort (which had pretty good pre-ventilator stats) crashed badly because it was loaded with the sickest patients .... patients that were too sick to risk adding AZ.

      So, the findings of this retrospective analysis are, in my opinion, likely to be incorrect.

      I believe I can confidently state that:

      1. The HCQ cohort started with the sickest patients and had even more of the sickest added during ventilation. Some were too sick to risk the addition of AZ to existing HCQ.
      2. The HCQ/AZ cohort also had some very sick patients (again with more additions during ventilation).
      3. The Non-HCQ cohort had the best prognosis overall from the very start (although likely a polarized mixture of the most frail and the most promising)... and then its stats got even better when it jettisoned its sickest ventilated patients into the other 2 cohorts.

      It is almost impossible to reach a conclusion from all this. BUT, the most likely finding is NOT that adding HCQ delivers a worse outcome than standard treatment. In fact, if we look at the pre-ventilator stats, the addition of HCQ might actually have provided considerable benefit to a particularly sick group of patients. Whether or not the addition of AZ to HCQ adds benefit is also unclear ... although my 'swingometer' is pointing slightly more to benefit than harm.

      Once again, I suggest that a robust study into prophylaxis and early treatment (using sensible safer doses adjusted for pulmonary sequestration) will deliver the most interesting results for CQ/HCQ.

      Dr Phil Davies<br /> Aldershot Centre For Health<br /> http://thevirus.uk

    2. On 2020-04-24 09:57:00, user Philip Davies wrote:

      The low dose arm of this study is worth following.

      The big problem for this study is comparison. It really has not defined the control population at all. The Italian and Chinese references are entirely different. Even the 2 Chinese populations referenced had massively different outcomes because the populations examined were different.

      The Italian mortality rate was actually similar to the overall study average here (but much higher than the low dose arm). The Chinese study involved all patients admitted to the two hospitals ... that included a majority of patients with moderate ("ordinary" as the Chinese class it) disease severity. The patients in this Brazilian study were regarded as severe or critical ... such patients (looking at worldwide stats) would attract a mortality of 30-40% plus.

      This is the most important factor. Do not compare apples with pears. So far this study points the "swingometer" in favor of benefit versus harm for the use of HQN in patients with advanced disease.

      Once again however, we are looking at the potential impact of an orally administered drug to patients with advanced disease. That's a big ask.

      For CQ and HCQ the most interesting results will likely come from studies looking at prophylaxis and early treatment (using safe doses, not silly high doses with added drugs that also lengthen QT). We can't yet guess how they will pan out.

      Dr Philip Davies<br /> GP<br /> Aldershot Centre For Health, UK<br /> http://thevirus.uk

    1. On 2020-04-16 12:20:10, user Marlowe Fox wrote:

      The tests on the efficacy of HCQ are confounded by multiple variables, including comorbidities, symptom onset, prescription drugs (RAAS inhibitors appear to play a key role in viral intensity), and testosterone/estrogen level, to name only a few.

      Geneticists, epidemiologists, and other scientists have long used causal diagrams to clearly show variables that may potentially confound their results (1). The Wuhan study at the very least would need to account for the following:

      HCQ <— comorbidities —> recovery<br /> HCQ <— symptom onset —> recovery<br /> HCQ <— drug prescriptions —> recovery

      Adjusting for the confounding variable would essentially smooth out the flow of information between the treatment (HCQ) and the outcome (recovery), allowing for the inference of causal effects.
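
      As a rough illustration of what adjusting for a confounder does, the sketch below compares a crude recovery-rate difference with one averaged over strata of the confounder. All counts are hypothetical and chosen only to show the mechanics; the stratum labels and function names are likewise made up and do not come from the Wuhan study.

```python
# Hypothetical counts: (recovered, total) per arm within each comorbidity stratum.
strata = {
    "few comorbidities": {"hcq": (9, 10), "no_hcq": (81, 90)},
    "many comorbidities": {"hcq": (45, 90), "no_hcq": (5, 10)},
}

def crude_difference(strata):
    """Recovery-rate difference (HCQ minus no HCQ), ignoring the confounder."""
    rec_t = sum(s["hcq"][0] for s in strata.values())
    n_t = sum(s["hcq"][1] for s in strata.values())
    rec_c = sum(s["no_hcq"][0] for s in strata.values())
    n_c = sum(s["no_hcq"][1] for s in strata.values())
    return rec_t / n_t - rec_c / n_c

def adjusted_difference(strata):
    """Average of within-stratum differences, weighted by stratum size."""
    total = sum(s["hcq"][1] + s["no_hcq"][1] for s in strata.values())
    diff = 0.0
    for s in strata.values():
        weight = (s["hcq"][1] + s["no_hcq"][1]) / total
        diff += weight * (s["hcq"][0] / s["hcq"][1] - s["no_hcq"][0] / s["no_hcq"][1])
    return diff

print(round(crude_difference(strata), 2), round(adjusted_difference(strata), 2))
```

      In this made-up table the treated arm looks much worse in the crude comparison only because sicker patients were more likely to be treated; within each stratum the recovery rates are identical.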

      Assuming observable data is not available to adjust for confounding variables, a causal mechanism (mediator) could smooth out the flow of information from the treatment to the outcome (so long as the mediator is not influenced by the confounder).

      Luckily, multiple in vitro studies have been performed. One study posits that HCQ lowers endosomal pH, which ultimately inhibits COVID from binding to ACE 2 and decreases viral intensity (3).

      HCQ —> endosomal pH —>glycosylation of COVID cellular receptor —> ACE 2 binding —> viral intensity —> acute lung injury

      Another in-silico study posits that HCQ blocks specific protein sites on the host ACE2 cell, thereby thwarting the virus's attempt to infect it and preventing the cytokine storm (over-reaction of the immune system) that some posit is responsible for Acute Lung Injury (3). So here we have an entirely different causal mechanism:

      HCQ —> BRD-2 receptor sites —> cytokine storm —> acute lung injury

      Despite these problems, some believe that the p-values obviate the need to control for potentially lurking variables. However, p-values are subject to myriad influences, collectively known as p-hacking. Whether it is the number of tests performed or the number of comparisons made, each addition increases the chance of finding a statistically significant p-value (4). Three professional statisticians co-authored a paper reviewing the validity of the Wuhan study (5). There were several issues with the data upon which the two significant p-values were based.
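
      The effect of the number of tests on the chance of a spurious significant result can be made concrete. This is a minimal sketch assuming independent tests, each run at the conventional 0.05 level; the function name and the test counts are chosen for illustration and say nothing about how many tests the Wuhan study actually ran.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20):
    print(n, round(familywise_error_rate(n), 3))
```

      With a single test the chance of a false positive is 5%; with twenty independent tests it is closer to two in three.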

      I suppose there is also a pragmatic argument: The p-values, along with existing studies and reports, are sufficient enough evidence to offset any concern for lurking variables in these urgent times. In other words, how much evidence is sufficient to warrant large scale roll-out of a low-cost treatment that may have a beneficial effect, from saving individuals who would have otherwise died to curbing its spread?

      The consequences of large roll-out: manufacturing, scaling, distribution chains, and so forth could result in a tremendous diversion of resources. How many pharmaceutical manufacturers even have the capacity to roll out production of this magnitude? What if they all start scaling their labor to produce this particular drug? You can’t just put this genie back into the bottle. Not to mention the scientific energy/intellectual capital that would go to proving or disproving this proposed treatment. And why? Because scientific evidence demanded it? No, because a tortured p-value and unpublished/unsubstantiated anecdotal evidence caught the attention of some in the media, and it has been over-popularized as a panacea. What about the risk that HCQ is not an effective treatment despite the large investments of cash and resources? Do you think the wheels of capitalism turn so easily? Investors will want a return and if that means continually touting an ineffective drug through spurious science, they will continue to do so. What about individuals taking HCQ as a prophylactic, believing themselves to be protected against COVID? Or COVID+ individuals taking HCQ and believing themselves to be cured? Or individuals who think: Well, if I get it, I’ll just take HCQ and be fine. This would increase the spread of COVID. From my perspective, ignorance of viral transmission and the required precautions is widespread. This would be just one more reason for people not to acquiesce to the new social norms of wearing face masks, social distancing, and abiding by shelter-in-place rules. Here, I think an understanding of cognitive psychology is important to anticipate the future behavior of a society in which a cheap and easy-to-manufacture cure is publicized in the media.

      To sum up, HCQ's efficacy is not sufficiently proven to warrant a widespread roll-out, because it could result in several downstream consequences, from the diversion of resources (both manufacturing capabilities and intellectual capital) to increasing the risk threshold of individuals--who spuriously believe in an easy and cheap treatment--thereby increasing the infection rate. One of two things needs to happen: clinical trials that properly adjust for all potential comorbidities, or the discovery of a causal mechanism (in vivo), which would obviate the need to control/adjust for confounders. For me, this would tip the utilitarian scales in regard to the potential benefits versus the risks.

      References

      1. Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect (1st. ed.). Basic Books, Inc., USA.
      2. https://www.ncbi.nlm.nih.go...
      3. https://papers.ssrn.com/sol...
      4. https://www.scientificameri....
      5. https://zenodo.org/record/3....

    1. On 2020-05-15 01:01:11, user Timeisrelative wrote:

      This is not my field of study but I hope my comments are helpful to you. Thank you for publishing this important work.

      The name "SD" for your metric is confusing for three reasons. 1) Standard deviation, which is also used in the paper, is commonly abbreviated as SD. 2) Recently, less travel has *increased* what people commonly refer to as "social distancing"; however, your metric "SD" tends to *decrease*. 3) Mobility is only one aspect of the common definition of social distancing. Other aspects are not attending mass gatherings, standing at least 6 ft apart, not shaking hands, etc. (https://hub.jhu.edu/2020/03/13/what-is-social-distancing/) These other aspects are not captured by your metric, so again I think it's confusing to call it a "social distancing ratio" and use the abbreviation SD. Better names might be "Mobility Reduction" or "Relative Mobility".

      Further, according to Wikipedia: "During the COVID-19 pandemic, the World Health Organization (WHO) suggested favoring the term "physical distancing" as opposed to "social distancing", in keeping with the fact that it is a physical distance which prevents transmission; people can remain socially connected via technology." (https://en.wikipedia.org/wiki/Social_distancing)

      Your metric SD is based on "the assumption that when individuals make fewer trips, they physically interact less." But you are not looking at the number of trips directly; instead you look at the deviation from normal levels of trips. Why not look directly at the number of trips? Different areas may have widely varying baseline numbers of trips and one would expect infection rates to vary correspondingly. By measuring the correlation between the actual number of trips and infection rates we could see if that is in fact true.

      I'm having trouble understanding the calculation of GR. You state "A GR equal to zero indicates no new confirmed cases were reported in the last three days". However, plugging 0 into all three Cj in the numerator of the GR calculation leads to log(0/3+0/3+0/3). The result is undefined (negative infinity), not zero. You also state "a value below one means that the growth rate during the last three days is lower than that of the last week", and testing some sample data does not produce this result. Perhaps I'm misinterpreting your formula?
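
      The zero-case problem can be checked with a minimal sketch. It assumes GR is the log of the 3-day mean of new cases divided by the log of the 7-day mean, which is one reading of the formula and may not be the authors' exact definition; the function name `growth_rate` and the sample series are hypothetical.

```python
import math

def growth_rate(new_cases):
    """Assumed GR: log(3-day mean) / log(7-day mean) of daily new cases.

    Returns None when the ratio is undefined, i.e. when either mean is
    zero (log of zero) or the 7-day mean is exactly 1 (log equals zero,
    so the division is undefined).
    """
    recent = sum(new_cases[-3:]) / 3   # mean of the last three days
    weekly = sum(new_cases[-7:]) / 7   # mean of the last seven days
    if recent <= 0 or weekly <= 0:
        return None                    # log(0) is undefined, not zero
    denominator = math.log(weekly)
    if denominator == 0:
        return None
    return math.log(recent) / denominator

# Three days with no new cases: the ratio is undefined rather than zero.
print(growth_rate([5, 5, 5, 5, 0, 0, 0]))
# A flat series and a declining series, for comparison.
print(growth_rate([10] * 7))
print(growth_rate([20, 20, 20, 20, 10, 10, 10]))
```

      Under this assumed form, a series with no new cases in the last three days has no defined GR, so the stated interpretation of GR = 0 would need a separate convention.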

      FIG 3 What is the "Raw Date" line? In your description of GR you say "We use 3-day moving averages to smooth volatile case reporting data." Does that statement refer to the 3-day summation in the numerator of "GR" or is there an additional 3-day moving average taken after GR is computed?

      The GR calculation itself introduces a lag due to averaging the previous 3 days of data in the numerator and previous 7 days of data in the denominator. This distinction is important as you state that the value of the 9-12 day lag "reflects the time it takes for symptoms to manifest after infection, worsen, and be reported." In fact the lag from the calculation itself is also a factor.

      It's also unclear if your source data is the date a positive test was taken or the date the lab results came back. When we are talking about a lag on the order of 10 days, a 1-3 day delay for results could be significant. Further, source data including the date of symptom onset is available in some states and would be more useful as it would eliminate part of the lag which could be affected by test availability and speed.

      Why are only the top 25 counties analyzed? I would be interested in seeing the metrics calculated in other, less affected areas. In other words, could mobility reductions result in the prevention of outbreaks or just in the reduction of major outbreaks?

      The metrics you've chosen (SD and GR) follow very similar paths among all 25 counties analyzed. All 25 counties saw sharp drops in SD between March 10th and March 20th. All 25 counties saw sharp drops in GR a few weeks later. However, adding counties that didn't have a sharp reduction in SD during that time period would be revealing. Also adding counties that had GR paths that either dropped over different time periods or that grew much slower and steadier would also help reveal if GR and SD are correlated in wider situations.

      Caption to Fig 2 has redundant text "(vertical dashed red lines)"

      "King County, Washington is excluded because it precedes widespread social distancing and was driven by an infection source that differs from other outbreaks in the US." Previously you demonstrated that the SD metric is not well correlated with dates of implementation for local and state social distancing directives. King County shouldn't be excluded just because it precedes widespread social distancing. Also how is it known that the "infection source" is different from the outbreaks at the top 25 counties chosen?

      "Last, the data used in this analysis does not differentiate amongst sociodemographic groups, and therefore may not representatively capture all groups such as the elderly, low income families and underrepresentative minorities, for whom social distancing may not be an option, or may not have cell phones." Everyone in those groups who has a mobile phone with the apps and permissions required for Teralytics to track them is expected to be included in the dataset. The dataset may not be representative of the population at large, but that is not *because* the dataset doesn't differentiate between sociodemographic groups.

      Conclusions: "In conclusion, our results strongly support the conclusion that social distancing pays dividends in the vital reduction of load on hospital systems in the United States." I think this conclusion is too broad. You show no data on load of hospital systems. Your data is on the reduction in reported cases correlating to reduced number of trips in severely affected areas not social distancing as a whole.

    1. On 2020-11-26 12:13:59, user Dr Gareth Davies (Gruff) wrote:

      Thank you for this fascinating analysis! It brings together a great deal of very useful information, the data were presented in useful and transparent ways, and the tables and graphs were especially helpful in understanding the data.

      I would like to offer some constructive feedback concerning the statistics and their interpretation, as some results appear to have been misinterpreted and this undermines this excellent work.

      The use of the term "statistically significant" (18 occurrences, including negatives) is especially concerning and goes against best practice. P values and confidence intervals are frequently misinterpreted by both review authors and readers. A lack of evidence is not evidence of lack of effect. This is especially concerning where dose, frequency and trial length are interpreted, as the text gives the impression that some regimens were demonstrably effective whereas others were demonstrably not effective; the latter is not something this study could ascertain and it should definitely not be concluded or discussed.

      (Best practice recommendations from Cochrane Handbook for Systematic Reviews of Interventions version 6.1 C.15.3.2: "Review authors should not describe results as ‘statistically significant’, ‘not statistically significant’ or ‘non-significant’ or unduly rely on thresholds for P values, but report the confidence interval together with the exact P value.")

      There is a great deal of heterogeneity in the studies that cannot be measured by an I-squared metric but is important and will affect the results. Differences in study populations, sizes, country, latitude, age ranges, comorbidities, length of trial, method of assessing outcome, dosing frequency, % participants <25nmol/L, year of study etc. can all introduce very large unmeasurable confounding bias that may strongly influence results in ways that cannot be accounted for by software calculating CIs, P values, or I^2 measures. I would strongly urge great caution in interpreting these as meaningful.

      For example, in the group of studies where dose equivalent > 2000 IU/d, the studies vary enormously in almost every attribute and yet the I-squared metric suggests only moderate heterogeneity which is very misleading. It is especially telling that in some studies the reported incidence of > 1 ARI in the intervention and control arms is wildly different across studies: ~17% (Rake 2020) ~74% (Camargo 2020); ~96% (Murdoch 2012), casting strong doubt on the reliability of the measure to capture the outcome of interest to the study.

      Berman 2012 showed a small population (N=124) of patients in Sweden (latitude 60°N) susceptible to ARIs (assessed with symptoms, range 40%-60%) and with measured high-prevalence of D deficiency (11.45%) responded positively to >2,000 IU with an odds ratio of 0.43 (CI 0.21 - .88). Among others, these results are combined with Camargo 2020 in New Zealand (40°S) in a very large population (N=5,056) of healthy adults with low prevalence of D deficiency (1.8%) where (ARIs self-reported cold/flu incidence ~75%) with an odds ratio 0.90 to 1.16; and Lehouck 2012 (adults with chronic obstructive lung disease).

      It's hard to see how the data from these trials can be meaningfully combined. It's no surprise the combined CI was large, 0.84 to 1.31 (in truth it will be far larger since bias and measurement errors have not been accounted for), but the only interpretation possible here is that we cannot interpret anything from these combined data and more research is needed.

      The same problem occurs when combining individuals with deficiency (<25nmol/L) giving a combined CI of 0.53 - 1.16. This is reported as "a statistically significant protective effect of vitamin D was not seen in those with the lowest 25(OH)D concentrations" which is then wrongly interpreted to mean evidence of no effect which is simply not the case. All this means is the statistical power was too low to detect an effect with high confidence. Arguably, there IS a detectable effect if we use a lower confidence threshold. (I'm not suggesting this, I'm merely pointing out how careful we need to be interpreting statistics).

      Results with CIs crossing null can say nothing about the existence or non-existence of an effect and should not be reported or interpreted as such, especially if the ranges are large. The inability to reject the null hypothesis is not proof of the null hypothesis. It's just lack of study power.
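
      To illustrate how little a CI that crosses the null says on its own, the reported interval of 0.53 to 1.16 can be converted back into an approximate point estimate, standard error and two-sided p value. This is a minimal sketch, not part of the reviewed analysis: it assumes the interval is a 95% Wald interval that is symmetric on the log scale, and the function name `summary_from_ci` is hypothetical.

```python
import math

def summary_from_ci(lower, upper, z_crit=1.96):
    """Approximate a ratio estimate, its log-scale SE and a two-sided p value.

    Assumes the interval was built as exp(log(estimate) +/- z_crit * SE),
    i.e. a Wald interval symmetric on the log scale.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)      # geometric midpoint
    se = (log_hi - log_lo) / (2 * z_crit)        # log-scale standard error
    z = math.log(point) / se                     # Wald statistic against ratio = 1
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal p value
    return point, se, p

point, se, p = summary_from_ci(0.53, 1.16)
print(f"estimate ~ {point:.2f}, log-scale SE ~ {se:.2f}, p ~ {p:.2f}")
```

      Under these assumptions the midpoint lies below 1 while the interval remains wide, which is compatible with both a protective effect and no effect; it does not demonstrate the absence of one.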

      Statements such as "Greater protective efficacy of lower vs higher doses" have no evidential basis and should be removed. This analysis did not show a greater protective effect at lower doses! It showed an effect at lower doses and had insufficient data at higher doses to investigate the question. The subsequent musing over potential mechanisms to explain this imagined difference should also be removed.

      I would also strongly caution against multivariable meta-regressions on trial characteristics. There are simply too many potential unmeasured confounders and sources of measurement error to trust that this method will produce meaningful adjustments. There's no telling if this would properly adjust, or conversely introduce bias and loss of precision.

      I think if these issues were addressed the study contributes some very important and useful results confirming the positive beneficial effects of vitamin D, and suggests more research could help to answer the questions where the data were insufficient to cast light.

      Congratulations on the paper and I hope this feedback is helpful!

      Best wishes,

      Gareth

    1. On 2020-10-07 06:13:12, user Markku Peltonen wrote:

      There were a number of comments on this manuscript on Twitter in early August, with concerns about errors in the calculations, among others. They might be useful for others, so here is what I tweeted on August 5th 2020 (https://twitter.com/MarkkuPeltonen/status/1290754970292281349):

      Recently there was a meta-analysis on the effects of masks conducted in Finland. A number of comments have been made about the quality of the piece, so I had a quick look at it. As the analysis was also mentioned at least in Sweden, a few quick comments in English. 1/10

      Background: the Finnish Ministry of Social Affairs and Health did a systematic review in May 2020 on the use of community face coverings to prevent the spread of Covid-19. There was no meta-analysis in the review, which focused on effectiveness. 2/10

      The conclusion on that report was “very little research data available on the effectiveness of community face coverings in preventing the spread of COVID-19 in society.” and evidence “minor” or “non-existent”. 3/10

      So, now then a formal meta-analysis, identifying the same 5 randomised controlled trials, showing an effect with relative risk estimate 0.61 (95% CI 0.39-0.96). A few points: 4/10

      The meta-analysis focuses on efficacy, i.e. what is potentially achievable under perfect conditions. They do something which they call “account of bias caused by non-compliance”; i.e. if persons in the mask group did not wear masks, they “adjust” for this. 5/10

      To me, this sounds quite controversial: in my world we look at intention-to-treat first, and then perhaps at the “per-protocol”/“as treated” analysis. Efficacy is important, but this is now something different from what the original systematic review aimed at. 6/10

      These problems are accentuated in the Discussion, where the authors do not seem to understand the difference between efficacy and effectiveness, nor the fact that they are actually analysing something other than the original review, and they draw far too far-reaching conclusions. 7/10

      There are other peculiarities, for example “Four of the analyzed studies evaluated the use of masks on respiratory infections directly, and in one the primary outcome was compliance with mask use.”. Hopefully an error, I don’t believe they actually mix the outcomes like this. 8/10

      . @jejkarppinen added the following comments after my initial post, which I agree with:
      - The potential biases in the original papers were not covered.
      - Quality of evidence was not evaluated at all.
      - Dissemination of the results did not consider the potential problems. 9/10

      Finally:
      - I've not read the original 5 studies.
      - I’m not an expert on systematic reviews/meta-analyses.
      - I do think recommendation for masks is motivated, and the evidence is there (but not here..).
      - I do think we should be objective when evaluating evidence. 10/10

      The original systematic review by the Finnish Ministry of Social Affairs and Health, in Finnish, is here (English abstract only):
      http://julkaisut.valtioneuv...

      Ps. Somebody noted the lack of a preregistered protocol, which reminded me that the PRISMA guidelines are helpful when reporting systematic reviews and meta-analyses. Their checklist should be followed in reporting:
      http://prisma-statement.org

      In addition, it was noted by Jesper Kivelä that there are errors in the calculations; these should be corrected (in Finnish):
      https://twitter.com/JesperK...

    1. On 2020-10-22 18:25:28, user helen colhoun wrote:

      From Helen M Colhoun, AXA Chair in Medical Informatics & Epidemiology, University of Edinburgh. Honorary Consultant in Public Health Medicine.
      David McAllister, Senior Clinical Lecturer in Epidemiology and Honorary Consultant in Public Health Medicine, University of Glasgow.

      The authors should be commended on attempting to characterise long-COVID-19. Post-viral syndromes are a well-recognised phenomenon and it is important to accurately quantify the full range of effects of COVID-19 on health. The authors are careful to state that their reported risks pertain only to those with symptomatic COVID. However, there are several reasons to think that, even among those who are symptomatic, these results may be subject to serious bias. First of all, there is a fundamental weakness in estimating risk based on a non-representative sampling frame, i.e. those who have chosen to use the app in the first place. Then, after dropping around half of the 45839 persons who tested positive as being asymptomatic (the numbers in the first part of the flow diagram do not quite add up), a further 14443 are dropped because of starting to use the app whilst already unhealthy; it is not clear whether some of this represents people reporting symptoms well before diagnosis. Then 25% of those remaining are dropped for not persistently logging their symptoms (which could easily be much more common in people with no persisting symptoms than in those with them).

      Another major problem is the lack of specificity of the diagnosis. The disease state of long-COVID19 would appear to be defined as having “at least one symptom lasting more than one day”, which has then been further categorised as LC28 or LC56 if symptoms persisted for these numbers of days. These symptoms include clearly non-specific symptoms such as “fatigue”, “unusual muscle aches and pains” and “skipping a meal”. No comment is made as to the prevalence of such symptoms in the other millions of users of the ZOE app.

      In the paper we find a hint of the lack of specificity in a matched set of test negatives: “Individuals with long-COVID were more likely to report relapses (16.0%)….In comparison, in the matched group of 139 SARS-CoV2 negative tested individuals, a new bout of illness was reported in 11.5% of cases.” This difference could easily be attributable to recall bias, since at least a large proportion of those with positive tests will have known their result.

      Unfortunately this paper is being widely reported in the press as showing that “long COVID affects around 10% of 18 to 49-year-olds who catch the virus.” However, those studied comprise just 15% of all those with evidence of infection, and it is plausible that many of those not studied have no evidence of long COVID. That is even before we consider the problem that most people who have “caught the virus” don’t even get tested. It would be more correct to say this: “having excluded 85% of people with detected COVID-19 who were asymptomatic or did not continue to record their symptom status, we find that 10% of young people with a positive test report at least one symptom for 28 days and 2% report at least one symptom for 56 days. These symptoms are not specific for COVID-19 and are commonly found in the general population.” We suggest that the authors make this important distinction clear in the title of the final version of their manuscript or it will continue to be misquoted. We also suggest that they discuss the impact of the potential biases raised above more fully.

    1. On 2019-07-16 13:28:54, user Guyguy wrote:

      EVOLUTION OF THE EBOLA EPIDEMIC IN THE PROVINCES OF NORTH KIVU AND ITURI.

      NEWS:

      High-Level Meeting on Ebola in Geneva

      On Monday, July 15, 2019, the Minister of Health, Dr. Oly Ilunga Kalenga, participated in the high-level meeting in Geneva to mobilize the international community to end the Ebola epidemic in the Democratic Republic of the Congo. His statement follows.

      Ladies and gentlemen,

      Since August 1, 2018, the Democratic Republic of the Congo has been facing the most complex Ebola epidemic in its history and in the history of public health.

      As you followed yesterday, July 14, a positive case from Butembo was declared in the city of Goma. This morning, the positive case, quickly identified and isolated, was repatriated to Butembo. Vaccination has been launched for all contacts. Since the beginning of this epidemic, we have prepared with the WHO for the possibility of positive cases in Goma. The situation is therefore under control and is being managed, as we did a few weeks ago with the positive case reported in Uganda. As a reminder, Goma is not the first provincial capital to report a positive case. This was the case in Bunia a few weeks ago, and in Mbandaka during the ninth epidemic of Ebola Virus Disease, which occurred in Équateur province from May 7 to July 24, 2018.

      The risk factors of the current epidemic remain:
      - The density of the population;
      - The high mobility of the population;
      - The geographical area concerned, covering 23 health zones spread over 2 provinces;
      - Part of the response is deployed in areas of military operation where armed groups and community militias are active;
      - The instrumentalisation of the epidemic by certain political actors during the election period.

      The tenth Ebola outbreak is not a humanitarian crisis. It is a public health crisis, occurring in an environment characterized by development challenges and shortcomings of the health system. This crisis requires a technical public health response to break the chain of transmission of the virus by relying on the actors of the health system and its traditional partners.

      Several pillars are thus implemented to break the chain of transmission, including vaccination. On 28 and 29 June, the Ministry of Health invited to Kinshasa the producers of the four most advanced Ebola vaccines, as well as national and international experts, for a meeting of scientific exchanges on vaccination in the context of the ongoing epidemic. It emerged from these exchanges that the vaccine produced by Merck, currently used in this outbreak, is the only one that has demonstrated its efficacy for reactive vaccination in the case of the current response. The good news is that there are enough doses of this vaccine available. To avoid confusion in the difficult context of this epidemic, the Ministry of Health decided that no other vaccine trial would be implemented in the DRC while the tenth epidemic is in progress.

      To date, thanks to the commitment of all, sufficient funds have been mobilized for the previous response plans. On behalf of the Congolese Government, I express my gratitude to all donors.

      In developing the third strategic response plan (SRP3), covering the period from February to July 2019, a special effort was made to put in place information for monitoring activities and expenditures, to increase both the operational and the financial accountability of all actors.

      The process of developing the fourth strategic response plan (SRP4), which will cover the period from July to December 2019, ended this Friday, July 12, 2019 in Goma. The process was participatory and inclusive, and took into account lessons learned on an ongoing basis. The budgeting methodology is bottom-up: it starts from the unit costs and the volume of the different activities to be implemented in each health zone; these were then aggregated by sub-coordination.

      The Government is grateful for the contribution of our various partners as well as donors. However, this support must be provided with respect for the Government, and in partnership with institutions rather than in parallel. Only anchoring the response in the health system and strengthening the actors of the Ministry of Health will ensure the sustainability of all achievements of the response. All sectoral support plans for the response must be developed in the same spirit, in consultation with the sectoral ministries.

      Public health actors want to make SRP4 a "final push". To get there, we demand discipline and accountability from all actors. In each pillar, in each sub-coordination, the Ministry of Health and the co-leaders accredit implementation agencies on the basis of five criteria to ensure accountability:
      - Have a demonstrated operational capacity with regard to the number and the expertise of human resources (not agencies in "learning curve", recruiting on Linkedin for North Kivu);
      - Rationalize geographical deployment and ensure an effective presence in the field (not just attending meetings);
      - Commit to implementing the activities according to the validated protocols for the response;
      - Commit to transmitting the data to the General Coordination of the response, respecting the reporting tools that allow the monitoring of performance indicators and the production of dashboards;
      - Commit to adopting the scales and the Manual of Procedures for the Management of human resources developed by the Ministry of Health and the World Bank, which I wish to thank in particular for its unfailing support for the Government since the beginning of this epidemic.

      Only discipline and accountability will allow us to put an end to this epidemic, which has lasted too long.

      Now is the time to think about the post-Ebola era and to start developing, with other sectors, ambitious development plans, which alone will be able to resolve the fundamental problems of the population.

      Thank you.

      Source: Ministry of Health press team on the state of the response to the Ebola epidemic in the Democratic Republic of Congo

    1. On 2022-02-08 21:40:31, user Pierre Siffredi wrote:

      One factor influencing the validity of cross-ancestry PRS is ancestral differences in the meaning of the phenotype, as well as the validity/reliability characteristics of its measure.

      For example, it's been proposed that there be race-specific charts for BMI. Given a white person and a black person with the same BMI, the black person may have e.g. higher bone density, muscle mass, etc. But the genetics of these things, if observed in a white person, would give them a low BMI. Thus for this black person, using a European-based PRS prediction of BMI provides a very different estimate from their observed BMI.

      When you get into softer phenotypes such as psychiatric measures, do we necessarily think that people of different ancestral backgrounds with the same BDI score have the same amount of depression? Does the concept of depression even hold consistently across ancestral background? If it does, does the variance hold constant too (thus affecting the r-squared predicted by PRS)?

      I think this notion is something under-explored in the context of PRS due to lack of availability of data, limited clinical/practical understanding of the phenotype (especially appraisals of measure validity in different groups), and the lazy desire to pretend as if we have perfectly measured everything and that there is no difference between the observed and latent variable.

    1. On 2021-12-19 11:46:40, user Kjell Krüger wrote:

      Tables and figures in the study point out that some 50% of the selection have status "unv." and "not born in Norway". Statistics from the study also indicate that some 80% of the total selection comes from the South-East region of Norway. Finally, some 35% of unv. are marked with virus variant "unknown", which we may suppose is other than omicron, as the study was done in the period up to October? It could be of interest to see some more deviation analyses made on these parameters. The number of beds in Norwegian hospitals is stated by SSB to be some 11500, of which some 400 are now occupied by COVID patients. I suppose all these parameters should also be interesting input data for future planning of how to manage future epidemic crises in Norway. Maybe new studies will also highlight the possibility that some regions should be set up with more capacity and competence than others, with the possibility to also transport both personnel and patients between regions? I think questions and answers on these matters will be of great interest to politicians at the local, regional and national levels one day when this crisis fades out and preparation for the next one begins.

    1. On 2020-06-06 01:33:13, user David Hood wrote:

      I think the "39.5% of cases seeking medical consultation in primary care settings" may be overly conservative in the model for a parameter representing getting medical advice, as it is based on influenza in the 2018 'flu season (a fairly typical year). We know from the ESR influenza surveillance site that Healthline historically (I don't know the period they use to determine "historical") gets around 40000 influenza-like illness (ILI) calls a year, and for the period from the week of 14/2 to 29/5 there are historically around 10000 ILI calls. In 2020, for the period from the week of 14/2 to 29/5, there were around 26000 ILI calls. Even allowing for false-positive worries from anxious people boosting call numbers, this suggests that the number of people seeking official advice about ILI is dramatically higher in 2020 (which I also acknowledge is not the same as visiting a primary care location about an ILI, which is the 39.5% figure, but the official advice was to ring Healthline, who were presumably advising testing/isolation/primary health as appropriate).

    1. On 2023-07-21 14:12:39, user Gaël Nicolas wrote:

      I think that this variant is definitely a strong contributor to AD. However, the pedigrees also show that the patients with DNA available and carrying the variant, also carry one APOE4 allele. Actually, APOE4 segregates as good as SORL1 in these pedigrees! All affected individuals with DNA available are SORL1+/APOE4+. One unaffected individual is SORL1+/APOE4- (family 1) and one unaffected individual is SORL1-/APOE4+ (family 2). To be clear, I have absolutely no doubt of a major role of the SORL1 variant here, but I feel that this is very much consistent with a more complex inheritance and not purely autosomal dominant, as shown in our penetrance paper (Schramm et al., Genome Medicine 2022, PMID 35761418)

      Interestingly, we have the same variant in three independant families from France (one of them is mentioned in this preprint). Although there is an obvious aggregation of AD cases in the families, there is a huge diversity of ages of onset and younger cases have a positive family history in both branches, suggesting the contribution of additional factors. Some of them are APOE4+ but not the 2 youngest probands. This may suggest the contribution of undetected contributing variants along with SORL1.

      Overall, our penetrance paper (Schramm et al., 2022) and many pedigrees suggest a contribution of additional factors with SORL1 variants and that SORL1 alone may not be sufficient / fully penetrant. We have clear evidence for APOE4, as this is a common allele, but we know that there are many other other AD-associated variants, especially rare variants, among known variants (as families with SORL1+ABCA7 as we previously reported in Campion et al., Acta Neuropath 2019, PMID 30911827) and in other papers and, obviously, not yet known variants.

      I thus recommend using such results with great caution for genetic counseling, as we still don't know exactly how variants in other genes may drastically change the age of onset, from 50 to 75-80 for example, or even lead to absence of AD (as also shown for some truncating variants, as in Campion et al., 2019, where a mother transmitted a truncating variant and was unaffected by AD at age 95 years).

    1. On 2020-06-24 18:56:17, user André GILLIBERT wrote:

      Title : Proposal for improved reporting of the Recovery trial<br /> André GILLIBERT (M.D.)1, Florian NAUDET (M.D., P.H.D.)2<br /> 1 Department of Biostatistics, CHU Rouen, F 76000, Rouen, France<br /> 2 Univ Rennes, CHU Rennes, Inserm, CIC 1414 (Centre d’Investigation Clinique de Rennes), F- 35000 Rennes, France

      **Introduction**

      Dear authors,<br /> We read with interest the pre-print of the article entitled “Effect of Dexamethasone in Hospitalized Patients with COVID-19: Preliminary Report”. This reports the preliminary results of a large-scale randomized clinical trial (RCT) conducted in 176 hospitals in the United Kingdom. To our knowledge it is the largest pragmatic RCT comparing treatments of COVID-19 with curative intent. The 28-day survival endpoint is objective, clinically relevant and should not be influenced by the measurement bias that may be caused by the open-label design. While 2,315 study protocols about COVID-19 have been registered on ClinicalTrials.gov as of June 24th 2020, Recovery is, to our knowledge, the only randomized clinical trial on COVID-19 that has succeeded in including more than ten thousand patients. The open-label design and simple electronic case report form (e-CRF) may have helped to include a non-negligible proportion of all COVID-19 patients hospitalized in the United Kingdom (UK). Indeed, as of June 24th 2020, approximately 43,000 patients had died of COVID-19 in hospital in the UK, of whom approximately 0.24 × 11,500 = 2,760, that is more than 6% of all hospital deaths from COVID-19, were included in the Recovery study.<br /> Having read with interest version 6.0 of the publicly available study protocol (https://www.recoverytrial.net/files/recovery-protocol-v6-0-2020-05-14.pdf), we had hoped for more details in the reporting of the methods and results of this trial, and we take advantage of the open peer-review process offered by pre-print servers to suggest improvements to some aspects of the reporting before the final peer-reviewed publication. Please find below some easy-to-answer comments that may help to improve the article overall.

      **Interim analyses and multiple treatment arms**

      The first piece of information concerns interim analyses. The protocol (version 6.0) specifies that it is adaptive and that randomization arms may be added, removed or paused according to decisions of the Trial Steering Committee (TSC), which bases its decisions on interim analyses performed by the Data Monitoring Committee (DMC) and communicated when “the randomised comparisons in the study have provided evidence on mortality that is strong enough […] to affect national and global treatment strategies” (protocol, page 16, section 4.4, 2nd paragraph). The Supplementary Materials of the manuscript specify that “the independent Data Monitoring Committee reviews unblinded analyses of the study data and any other information considered relevant at intervals of around 2 weeks”. This suggests that many interim analyses may have been performed from the start (March 9th) to the end (June 8th) of the study.<br /> Statistically, interim analyses that are not properly taken into account inflate the type I error rate, which may be increased further by the multiple treatment arms. Methods such as triangular tests make it possible to control the type I error rate. Most methods for controlling the type I error rate in interim analyses require that the maximal sample size be defined a priori and that the timing and number of interim analyses be pre-planned. This protocol being adaptive, new arms were added, implying new statistical tests in interim analyses, and there was no pre-defined sample size, as seen on page 2 of the protocol: “[...] it may be possible to randomise several thousand with mild disease [...], but realistic, appropriate sample sizes could not be estimated at the start of the trial.” This makes control of the type I error rate difficult. The fact that the study was stopped at the final analysis, as we understand from the current draft, rather than at an interim analysis does not remove the type I error rate inflation. 
The multiple treatment arms lead to another inflation of the type I error rate.<br /> The current manuscript does not specify any procedure to address these problems. The Statistical Analysis Plans (SAP) V1.0 (in section 5.5) and V1.1 (in section 5.6) specify that “Evaluation of the primary trial (main randomisation) and secondary randomisation will be conducted independently and no adjustment be made for these. Formal adjustment will not be made for multiple treatment comparisons, the testing of secondary and subsidiary outcomes, or subgroup analyses.” and nothing is specified about interim analyses. Therefore, we conclude that no P-value adjustment for multiple testing has been performed, neither for multiple treatment arms nor for interim analyses. If an interim analysis assessing 4 to 6 treatment arms at the 5% significance level was performed every 2 weeks from March to June, up to 50 tests may have been performed, leading to a major inflation of the type I error rate. In our opinion, the best way to assess and maybe correct the type I error rate inflation is to report with maximal transparency every interim analysis that has been performed, with the following information:<br /> 1. Date of the interim analysis and number of patients included at that stage<br /> 2. Was the interim analysis planned (e.g. every 2 weeks as planned according to the supplementary material) or unplanned (e.g. due to an external event, for instance the article of Mehra et al. about hydroxychloroquine published in The Lancet, doi:10.1016/S0140-6736(20)31180-6), and if unplanned, why?<br /> 3. Which statistical analyses, on which randomization arms, were performed at each stage<br /> 4. If predefined, which criteria (statistical or not) would have led to early stopping of a randomization arm for lack of efficacy, and which criteria would have led to stopping for proven efficacy?<br /> 5. If statistical criteria were not predefined, did the DMC provide a rationale for its choice to communicate or not communicate the results to the TSC? If yes, could the rationale be provided?<br /> 6. The results of the statistical analyses performed at each step<br /> 7. The decision of the DMC to communicate or not communicate the results to the TSC, and which results were reported, as the case may be<br /> The information about interim analyses and multiple randomization arms will help to assess whether the inflation of the type I error rate is severe or not. A post hoc multiple testing adjustment, taking into account the many randomized treatments and interim analyses, should be attempted and discussed, even though there may be technical issues due to the adaptive nature of the protocol.
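
To give a rough sense of scale for the inflation discussed above, the following Python sketch computes the probability of at least one false positive among k tests, each run at the 5% level. It assumes that the tests are independent, which is not true of repeated looks at accumulating data or of arms sharing a control group, so the figures are only an order-of-magnitude illustration and not an estimate for this trial. The function names are ours.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each performed at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha
    (Bonferroni correction, conservative under any dependence)."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 25, 50):
        fwer = familywise_error_rate(0.05, k)
        print(f"{k:>2} tests: familywise error rate = {fwer:.3f}, "
              f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, k):.4f}")
```

Under the independence assumption, 50 tests at the 5% level give a familywise error rate of roughly 92%. Positively correlated tests would give a lower value, but likely one still well above the nominal 5%.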

      **Adjustment for age**

      An adjustment for age (in three categories: <70 years, 70-79, >= 80 years; see legend of table S2) in a Cox model was performed for the comparison of dexamethasone to standard of care in the article. This adjustment was not specified in version 6.0 of the protocol but was, according to the manuscript, “added once the imbalance in age (a key prognostic factor) became apparent”. This is confirmed by the addition of the words “However, in the event that there are any important imbalances between the randomised groups in key baseline subgroups (see section 5.4), emphasis will be placed on analyses that are adjusted for the relevant baseline characteristic(s).” in section 5.5, page 16, of the SAP V1.1 of June 20th, compared to the SAP V1.0 of June 9th, which specified a log-rank test. The SAP V1.0 of June 9th may have been written before the database was analyzed (data cut June 10th), but the SAP of June 20th was probably written after preliminary analyses had been performed. This is consistent with the words “became apparent” in the manuscript. Therefore, in our opinion, this adjustment must be considered a post hoc analysis rather than the main analysis. Moreover, even though the SAP V1.1 specifies that an “important imbalance” will lead to an “emphasis” on adjusted analyses, it does not change the primary analysis (see section 5.1.1, page 14). It is not clear what “important imbalance” means. To interpret it, we performed statistical tests to assess the balance of the key baseline subgroups specified in the SAP V1.1 (see section 5.4):<br /> 1. Risk group (three risk groups with approximately equal number of deaths based on factors recorded at randomisation). Its distribution is shown in figure S2. A chi-square test on the distribution of risk groups in the Dexamethasone (1255/500/349) and Usual care (2680/926/715) groups yields a P-value of 0.092. A chi-square test for trend yields a P-value of 0.23.<br /> 2. Requirement for respiratory support at randomisation (None; Oxygen only; Ventilation or ECMO). P-value=0.89 for the chi-square test and P-value=0.86 for the chi-square test for trend.<br /> 3. Time since illness onset (<=7 days; >7 days). P-value=0.17<br /> 4. Age (<70; 70-79; 80+ years). P-value=0.016 for the chi-square test, P-value=0.019 for the chi-square test for trend<br /> 5. Sex (Male; Female). P-value=0.97 for the chi-square test<br /> 6. Ethnicity (White; Black, Asian or Minority Ethnic). No data found.<br /> The criterion defining an “important imbalance” seems to be statistical significance at the 0.05 threshold; however, that should have been stated, and tests for all other variables should have been provided too.<br /> First, this adjustment, from a theoretical point of view, was not necessary since the study was randomized; if the exact condition of imbalance triggering the adjustment had been pre-specified in the protocol or SAP before the imbalance was known, it could have induced a very slight reduction of the type I error rate and power. However, as it was performed when the imbalance was known, there is a risk that the sign of the imbalance (i.e. higher age in the dexamethasone group) has influenced the choice of adjustment. Indeed, an adjustment made conditional on a higher age in the dexamethasone group will increase the estimated effect of dexamethasone in these conditions, and so inflate the type I error rate. If the same conditional adjustment were further considered for other prognostic variables, the inflation could be even higher. <br /> Unless there is strong evidence that the amendment to the SAP was made without knowledge of the sign of the imbalance (higher age in the dexamethasone group), we suggest that the primary analysis be kept as originally planned, without adjustment, and that the age adjustment be performed in a sensitivity analysis only. 
Whether the sign of the imbalance was known when the SAP was amended is unclear from the last version of the SAP (V1.1, June 20th) and from the manuscript. In addition, in an open-label trial, it is always better to stick to the protocol.
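
For readers who wish to check the balance tests listed above, here is a minimal Python sketch of the Pearson chi-square test applied to the risk-group counts quoted in item 1 (Dexamethasone 1255/500/349, Usual care 2680/926/715). It uses only the standard library: with 2 degrees of freedom, the chi-square survival function is exactly exp(-x/2), so no statistics package is needed. The function name is ours, and the sketch covers only this 2 × 3 table, not the test for trend.

```python
import math


def pearson_chi_square(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Risk-group counts as quoted in the comment (figure S2 of the preprint).
dexamethasone = [1255, 500, 349]
usual_care = [2680, 926, 715]

statistic = pearson_chi_square(dexamethasone, usual_care)
# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and for
# 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.3f}, P-value = {p_value:.3f}")
```

The same function could be reused for the age categories once the corresponding counts are read from the preprint's baseline table; those counts are not reproduced in this comment, so they are not included here.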

      **Results in other treatment arms**

      The manuscript specifies that “the Steering Committee closed recruitment to the dexamethasone arm since enrolment exceeded 2000 patients.” It is not stated whether any other treatment arm has exceeded 2000 patients or not and whether the study is still ongoing. Results of treatment arms that have been stopped should be provided (all arms having enrolled more than 2000 patients?). If not, the number of patients randomized in other treatment arms should, at least, be reported. If the study is completely stopped, all treatments should be analyzed and reported, unless there is a specific reason not to do so; that reason should be stated as the case may be. This data would be useful to provide evidence on other molecules. It would also clarify the number of statistical tests that have been performed or not, providing more information about the overall inflation of alpha risk.

      **Sample size**

      The paragraph about the sample size suggests that inclusions were planned, at some time, to stop when 2000 patients were included in the dexamethasone arm. The amended protocol (May 14th), the SAP V1.0 (June 9th) and the SAP V1.1 (June 20th, 4 days after the results have been officially announced) all have a paragraph about the sample size but all specify that the sample size is not fixed and none specify any criteria of arrest of the research based on sample size. There are 2104 patients included in this arm, which is substantially larger than the target of 2000 patients. The exact chronology and methodology should be clarified: when was the sample size computed and what was the exact criteria to arrest the research? Could the document (internal report?) related to this sample size calculation and statistical or non-statistical decision of arrest of the research be published in supplementary material?<br /> Indeed, assessment of the type I error rate requires knowing exactly when and why the research has been arrested: arrest for low inclusion rate of new patients or for reaching target sample size cannot be interpreted the same as arrest for high efficacy observed on an interim analysis.

      **Future of the protocol**

      With the new evidence about dexamethasone, the protocol will probably be stopped or evolve. Future recruitment may slow as the peak of the epidemic curve in the United Kingdom has passed. The past, present and future of the protocol also need to be known to assess the actual type I error rate. Indeed, future analyses that have not yet been performed influence the overall type I error rate. That is why we suggest that the authors provide the daily or weekly inclusion rate from March to June and discuss the future of the study.

      **Loss to follow-up**

      Table S1 shows that the follow-up forms have been received for 1940/2104 (92.2%) patients of the dexamethasone group and 3973/4321 (91.9%) patients of the usual care group. The patients without follow-up forms (8.5% overall) may either be lost to follow-up or have been included in the last 28 days before June 10th 2020 (data cut). The manuscript mentions that 4.8% of patients “had not been followed for 28 days by the time of the data cut”, suggesting that 8.5%-4.8% = 3.7% of patients are lost to follow-up, but that is our own interpretation. We suggest that the authors report the actual number of patients lost to follow-up and how their data have been imputed or analyzed. The number of patients lost to follow-up may differ between outcomes. For instance, if the Office of National Statistics (ONS) data have been used for vital status assessment, there should be no loss to follow-up on that outcome.

      **Vital status**

      The current manuscript only mentions the web-based case report form (e-CRF), filled in by hospital staff, as a source of information, suggesting that it is the only source of information about the vital status. The document entitled “Definition and Derivation of Baseline Characteristics and Outcomes” provided at https://www.recoverytrial.n... specifies many other sources. For instance, the vital status had to be assessed from the Office of National Statistics (ONS). Other sources, including Secondary Use Service Admitted Patient Care (SUSAPC) and the e-CRF, could be used for interim analyses. The ONS was considered the defining (most reliable) source. Whether the ONS data have been used or not should be clarified. If the ONS data have been used, statistics of agreement between the two data sources (e-CRF and ONS) may be provided to help assess the quality of the data. If the ONS data have not been used, this deviation from the planned protocol should be documented.<br /> The manuscript as well as the recovery-outcomes-definitions-v1-0.pdf file specify that the follow-up form of the e-CRF is completed at “the earliest of (i) discharge from acute care (ii) death, or (iii) 28 days after the main randomisation”. If the follow-up form is not updated further, patients discharged alive before day 28 (e.g. at day 14) may have incomplete vital status information at day 28. The following information should be specified:<br /> 1. Whether the follow-up form of the e-CRF had to be updated by hospital staff at day 28 for these patients<br /> 2. If the response to (1) is yes, whether there was a means to distinguish between a patient lost to follow-up at day 28 (form not updated) and a patient discharged and alive at day 28 (form updated to “alive at day 28”)<br /> 3. If the response to (2) is yes, how many patients discharged before day 28 were lost to follow-up at day 28<br /> 4. If the response to (2) is yes, how their vital status at day 28 has been imputed or handled in models with censoring (log-rank, Kaplan-Meier, Cox)<br /> Of course, this information is mainly needed if the ONS and SUSAPC data have not been used.<br /> The quality of the vital status information is critical in such a large-scale, open-label, multi-center trial, because there is a risk that one or more centers selectively report deaths, biasing the primary analysis.

      **Inclusion distribution by center**

      A multicentric study provides stronger evidence than a single-center study but sometimes, few centers include most patients, with a risk of low-quality data or selection bias. The very high number of included patients in the Recovery trial suggests that many centers included many patients but the distribution of inclusions per center could be reported.

      **Randomization**

      The protocol specifies that “in some hospitals, not all treatment arms will be available (e.g. due to manufacturing and supply shortages); and at some times, not all treatment arms will be active (e.g. due to lack of relevant approvals and contractual agreements).” This is further clarified in the SAP V1 (section 2.4.2 Exclusion criteria, page 8) by the sentence “If one or more of the active drug treatments is not available at the hospital or is believed, by the attending clinician, to be contraindicated (or definitely indicated) for the specific patient, then this fact will be recorded via the web-based form prior to randomisation; random allocation will then be between the remaining (or indicated) arms.” This shows that randomization arms may be closed on an individual basis, when the patient is included, on the grounds of contraindication or definite indication. It seems that the “standard of care” group could not be removed and that at least one other randomization arm had to be kept, as suggested by the words “random allocation will then be between the remaining arms (in a 2:1:1:1, 2:1:1 or 2:1 ratio)” in section 2.9.1, page 11, of the SAP V1.0. Even the exclusion of a single randomization arm can lead to imbalance between groups. For instance, if physicians believed that a treatment was contraindicated for the most severe patients, only non-severe patients could be randomized to that treatment's arm, while the most severe patients would be randomized to other arms. Several things can be done to assess and correct this bias. First, report how many times this feature has been used and which randomization arms have been excluded most often. If it has been used many times, provide the pattern of use, which helps to assess whether this was a collective measure (e.g. a 2-week shortage of a treatment in a center → no major selection bias) or an individual measure. If its use has been rare, a sensitivity analysis could simply exclude these patients. If it has been frequent, we suggest a statistical method to analyze these data without bias, based on the following principles: patients randomized between 3 randomization arms A, B and C (population X) are comparable for the comparison of A to B. Patients randomized between A, B and D (population Y) are also comparable for the comparison of A to B. Population X and population Y may differ but, inside each population, A can be compared to B. Therefore, the within-X comparison of A to B and the within-Y comparison of A to B are both valid and can be meta-analyzed to assess a global difference between A and B. This can be done simply by adjusting on the population (X or Y) in a fixed-effects multivariate model. Pooling of the X and Y populations should not be performed without adjustment.<br /> A second problem with randomization exists, although the dexamethasone arm is the least affected. Randomization arms have been added in this adaptive trial. When a new randomization arm is added, new patients may be randomized to this arm and fewer patients are randomized to the other arms. Consequently, the distribution of dates of inclusion may differ between groups. This may have some impact on mortality at two levels: (1) the medical prescription of hospitalization may have evolved as the epidemic evolved, with hospitalization reserved for the most severe patients at the peak of the epidemic and maybe wider hospitalization criteria at the start of the epidemic, and (2) the profile of patients included in the Recovery trial may have evolved. Indeed, even if centers should have included as many patients as possible as soon as the inclusion criteria were met, it is possible that they only included part of the eligible patients and that this part evolved with time. This bias can be easily assessed and corrected: the curves of inclusions in the different arms and of the mortality rate in the Recovery trial can be drawn as a function of date (from March to June), and an adjustment on date of inclusion may be performed in a sensitivity analysis.
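
The principle of comparing A to B within population X and within population Y, then combining the two comparisons, can be sketched in Python as below. This is an inverse-variance fixed-effect pooling of log odds ratios, which is one simple way to meta-analyze the within-population comparisons; it is not the adjusted multivariate model mentioned above, and it is not the method of the Recovery trial. All counts are invented for illustration, and the function names are ours.

```python
import math


def log_odds_ratio(deaths_a, n_a, deaths_b, n_b):
    """Log odds ratio of death (arm A versus arm B) and its variance
    (Woolf's formula), from one stratum's 2 x 2 table."""
    a, b = deaths_a, n_a - deaths_a   # arm A: deaths, survivors
    c, d = deaths_b, n_b - deaths_b   # arm B: deaths, survivors
    estimate = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance


def pool_fixed_effect(strata):
    """Inverse-variance fixed-effect pooling of (estimate, variance) pairs."""
    weights = [1 / variance for _, variance in strata]
    pooled = sum(w * est for w, (est, _) in zip(weights, strata)) / sum(weights)
    standard_error = math.sqrt(1 / sum(weights))
    return pooled, standard_error


# Hypothetical counts: (deaths in A, patients in A, deaths in B, patients in B).
population_x = log_odds_ratio(120, 600, 300, 1200)  # randomized among A, B, C
population_y = log_odds_ratio(90, 300, 200, 600)    # randomized among A, B, D

pooled, se = pool_fixed_effect([population_x, population_y])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled odds ratio = {math.exp(pooled):.3f} "
      f"(95% CI {math.exp(low):.3f} to {math.exp(high):.3f})")
```

Because each stratum contributes only its own within-population contrast, differences in baseline risk between X and Y do not leak into the pooled estimate, which is the point of not pooling the raw counts.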

      **Conclusion**

      Recovery is the study with the best methodology that we have seen on COVID-19 treatments in curative intent and we salute the initiative of publishing transparently the protocol, its amendments, the statistical analysis plan and the first draft of the report. We hope that our reporting suggestions will be taken in account in the final version of the paper. We think that discussing these points will qualify the interpretation of results, further improve the transparent approach adopted by designers of the study and improve the reliability of the conclusions. We expect a high-quality reporting of these final results, with full transparency on interim analyses, statistical analysis plans and statistical analysis reports. We hope that these comments are helpful and again we acknowledge that this study is not solely outstanding in terms of importance of the results but is also a stellar example for the whole field of therapeutic research. We invite other researchers to provide comments to this article to engage in Open Science.

    1. On 2022-01-14 00:43:08, user disqus_mV149tuM7g wrote:

      I am not a medical professional, but a common-sense confounding variable immediately popped into my mind, which this study (like most others) did not control for. I understand that it may not have been possible to control for it in this study given the data collection method, but I am baffled that, from what I can see, nobody seems to have thought of this confounding variable and no study that I know of has attempted to control for it:

      A) Do we not know that omicron is more similar to the common cold compared to delta? B) Do we not know that there is at least some common T cell protection across different coronaviruses, such that even T cells produced from a common cold give at least some protection against covid?

      So then, without any further medical knowledge, the immediate common-sense confounding variable that pops into my mind using basic inferential logic is this: if A and B are true, could it be that, given the timing of omicron (which came in early winter) compared to delta (which came in summer), many more people had a common cold shortly before omicron than before delta? Also, fewer people abided by restrictions in Fall 2021 compared to Spring 2021. So couldn't this partially be the reason why "omicron" is milder than delta? Of course, that would mean that "omicron in those who recently had a common cold" is milder than delta, NOT that "omicron" is milder than delta. Do you see how dangerous it is (for people who have not had a common cold in a long time, especially if unvaccinated) to claim that "omicron" is milder than delta? Again, I don't know if all of this is true or not, but I certainly think it warrants a closer look.

      Another confounding variable I can think of (though I am less certain of this one, I don't think it hurts to put it out there): I remember early studies in 2020 showing that viral load was associated with illness severity, and that those who wore masks tended to have less severe illness. Assuming those studies were correct, could it be that because omicron is more transmissible, more people are getting infected with omicron at a low viral load compared to delta? For example, maybe more people got delta through droplet spread, resulting in a higher viral load, while more people who wear surgical masks get omicron from aerosols passing through the mask (for instance in a small store), resulting in lower viral loads overall for omicron infections. Has this been controlled for? I have yet to see any studies that controlled for it.

    1. On 2021-04-10 18:48:39, user Daniel Haake wrote:

      In my comment on version 6 of your study, I pointed out the statistical problems caused by your study design, which lead to an overestimation of the calculated IFR (cf. https://www.medrxiv.org/content/10.1101/2020.07.23.20160895v6?versioned=true#disqus_thread). Thank you very much for your reply to my statement. I think that an exchange is important, because this is the only way to get reasonable results. Therefore, please do not regard my comments as criticism, but as suggestions for improvement on how to achieve correct values. Since my statement is still valid for version 7, I am replying to your answer here, under version 7.


      Re: Re: The time of the determination of the death figures

      Here you seem to have misunderstood me. I meant that with your example wave of infections, and a study starting shortly after the peak of the wave, there is the problem that many people have not yet formed antibodies by the time the study starts. By choosing the time of death as you did, you caught 95% of the deaths, but only a much smaller proportion of those infected. This leads to an underestimated denominator and thus an overestimated IFR.

      Just because it was also done that way in the Geneva seroprevalence study does not automatically mean it is correct. There are also many studies where the study date was chosen for the number of deaths. For example:

      https://www.who.int/bulleti...<br /> https://www.medrxiv.org/con... <br /> https://www.medrxiv.org/con...

      However, I agree with you that the Santa Clara County study should be taken with a grain of salt, as here the subjects were recruited via a Facebook ad and thus bias may have occurred. As I said, I understand the idea of taking a later date for the number of deaths. However, the associated problems regarding the underestimation of the infected, which I wrote about in the previous answer, still remain.

      It is still incomprehensible that you calculate a difference of 22-24 days, but then take a value 28 days after the study midpoint. This puts you 4-6 days behind your own calculation and thus automatically increases the IFR. Why do you elaborately calculate the difference of 22-24 days to determine the correct time, but then not use that value? Let me give another example. Let's say we are testing at the peak of an infection wave. But now we count all the dead who appeared after a certain time, without taking into account that a large number of people still got infected after that. Some of the counted dead will also have become infected after the study. Then we have recorded all the dead, but not all the infected. Or do you want to say that all the dead are from the first half of the infection wave and none from the second half (especially since that would lead to an IFR of 0% for the second half of the infection wave)? As you can see, it is problematic to take the number of deaths from much later in the course, because you then choose the denominator of the quotient too small and arrive at an IFR that is too high.

      In general, only deceased persons who are clear to have been infected before the latest time at which study participants may have become infected may then be included. This is not the time of the study, since the antibody tests can only be positive after some time following an infection.
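
The timing argument above can be made concrete with a small Python sketch. All numbers are invented for illustration: a hypothetical wave with a true IFR of 0.5%, in which only part of the infected people have detectable antibodies at the time of the serosurvey, while deaths are counted at a later date. The names and values are ours and do not come from the study under discussion.

```python
def infection_fatality_rate(deaths: float, infections: float) -> float:
    """IFR = deaths / infections (deaths in the numerator)."""
    return deaths / infections


# Hypothetical wave: 50,000 people eventually infected, 250 of them die.
total_infections = 50_000
total_deaths = 250

# Assumption: at the time of the serosurvey, only 60% of those who will be
# infected in this wave have been infected AND have formed antibodies.
detectable_fraction = 0.60
measured_infections = total_infections * detectable_fraction

# Assumption: deaths are counted late enough to capture 95% of all deaths.
counted_deaths = total_deaths * 0.95

true_ifr = infection_fatality_rate(total_deaths, total_infections)
estimated_ifr = infection_fatality_rate(counted_deaths, measured_infections)

print(f"true IFR      = {true_ifr:.2%}")
print(f"estimated IFR = {estimated_ifr:.2%}")
```

With these made-up inputs the estimate exceeds the true value because the denominator (measured infections) lags further behind than the numerator (counted deaths); the size of the gap depends entirely on the assumed fractions.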


      Re: Re: PCR tests from countries with tracing programs

      Is it really "PCR testing per confirmed case", not "PCR testing per capita", that is the important parameter? Let us consider two example scenarios. Let's assume that we test every resident and that at that time 1% of the population is in the state where the PCR test is positive. Then we know everyone's current status. But we would only get 1 positive result per 100 tests performed. This study would then not be included because of the too low ratio of tests per positive case, even though we would have tested everyone. Now let's assume the opposite case. We test in a country where we don't know exactly where and how many people are infected. We test in one region and assume that the result is transferable to the whole country. But this region is actually less affected than other regions; we just don't know it. We do 10,000 tests and find 20 infected people. That gives a ratio of 1 positive test per 500 tests performed. That study would then be included in your selection, even though the true proportion of infected people is higher. Therefore, "per confirmed case" is just not the important parameter. If there is a high number of cases in a country, you could test everyone two or three times and know the situation very well, and the study would still be excluded. At the same time, studies with few tests, and thus a high statistical uncertainty for the reasons mentioned earlier, can be included.

      The comparison with South Korea is also problematic. 0 or 1 seropositive results are far too few to have any statistical significance. The statistical uncertainty here is simply too high. And, as already mentioned, the results of these investigations cannot be transferred across the board to the other investigations.

      Including reported case numbers from countries that you consider to have a well-functioning tracing system leads to an overestimation of the IFR.


      Re: Re: Study selection

      I can understand that you screen out studies based on recruitment. I think that is statistically correct. I also see the danger that recruitment can prevent representative results. Therefore, it is also understandable that you want to see which studies are useful and which are not.<br /> Nevertheless, you sort out exactly the studies that calculate a low IFR and leave studies with high values in your analysis. This leads to a shift toward the high values. Furthermore, studies that deviate upward are more problematic because a larger shift is possible in that direction. Let's say there is a hypothetical virus with an actual IFR of 0.5%. Then we have a study with a value of 0.3% and a study with 1.5%. The high value in particular is further away from the actual value and thus shifts the calculated value upward. If the actual IFR is 0.5%, you can in theory misestimate by a maximum of 0.5 percentage points on the downside and by 99.5 percentage points on the upside. This is also not surprising because such distributions are right-skewed. If I remove both the study with the too low value and the study with the too high value, the calculated value does not become biased. If I remove only the study with the too low value, the calculated value shifts upward, because a stronger shift is possible in this direction. This leads to an overestimation of the IFR.


      Re: Re: Adjustment of death rates for Europe due to excess mortality

      You write in your reply that this is not relevant because reported deaths were used and not excess mortality. In Appendix Q you write: "For example, the Belgian study used in our metaregression computed age-specific IFRs using seroprevalence findings in conjunction with data on excess mortality in Belgium". You may not have applied this to other studies. However, you are using a study that did. Accordingly, this is crucial and has an impact on your result.


      Re: Re: Calculation of the IFR of influenza

      You nevertheless calculate an age-specific IFR for COVID-19 and then calculate the IFR as it would look if there were an equal distribution across age groups, which in fact there is not. At the same time, you state an IFR for influenza which, as shown, you understate. After all, living conditions do not change within a short period of time, so the numbers remain comparable and it is no problem to use the influenza IFR of several years. You thus suggest that the numbers are comparable. It is not possible to compare an IFR that assumes an equal distribution of age groups with an IFR that does not assume an equal distribution. However, this is exactly what is being suggested. By the way, it is not only the media; this was also taken up by Dr. Drosten. The comparability is difficult for another reason as well. The comparison uses an IFR for influenza, where we could already protect the vulnerable groups to some extent by vaccination and where people may have gone through an infection in the past, which helps to fight the disease and can therefore lead to fewer problems. However, to be honest, one can of course argue here that this is just the way the situation is. Therefore it is also understandable to me if one nevertheless makes such a comparison. But then it should be made by assuming an equal distribution over the age structure for both viruses, or the actual distribution for both. By the way, there is another problem: an estimated IFR is being compared with a measured one.
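
To show why the assumed age distribution matters, here is a small sketch that applies the same age-specific IFRs to two different age structures. All numbers and age bands below are invented for illustration only; they are not taken from the study or from any influenza data.

```python
def overall_ifr(ifr_by_age: dict, share_by_age: dict) -> float:
    """Population-wide IFR as the share-weighted sum of age-specific IFRs."""
    if set(ifr_by_age) != set(share_by_age):
        raise ValueError("age groups must match")
    if abs(sum(share_by_age.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(ifr_by_age[group] * share_by_age[group] for group in ifr_by_age)


# Hypothetical age-specific IFRs (as fractions, not percent).
ifr_by_age = {"0-34": 0.0001, "35-64": 0.002, "65+": 0.05}

# Equal distribution across age groups versus a hypothetical actual one.
equal_shares = {group: 1 / 3 for group in ifr_by_age}
actual_shares = {"0-34": 0.45, "35-64": 0.40, "65+": 0.15}

ifr_equal = overall_ifr(ifr_by_age, equal_shares)
ifr_actual = overall_ifr(ifr_by_age, actual_shares)

print(f"equal age distribution:  {ifr_equal:.4%}")
print(f"actual age distribution: {ifr_actual:.4%}")
```

With identical age-specific values the two overall IFRs differ, so a comparison between two viruses only makes sense if both figures use the same kind of age distribution.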

      ---------------------------------------------------


      Additional comment

      With the studies to date, it is very difficult to estimate how high the IFR actually is. This is because there are problems with all methods. If you take antibody studies, there is the problem that antibodies are not detectable in all infected people. If you take the reported numbers of cases, there is the problem of the dark field. How could one calculate a clean IFR? By actually testing a certain proportion of the population as a representative group on a regular basis. For example, you can test 1 per thousand of the population every week and see if they are positive for COVID-19. Then look at how many people have died over time from the group of positives. Those deceased could then be autopsied by default to determine whether they died from or with COVID-19. In doing so, one must then determine what period of time after infection is still valid to count as a COVID-19 dead person. After all, is a person who died 10 months after infection still a COVID-19 dead person? After all, it is the elderly who are dying. But it is not atypical that they would have died over time even without infection. Now imagine that a 94-year-old dies 10 months after an infection. Can one then still say whether it was due to COVID-19? In this case, one would probably have to look at the medical history before and after COVID-19 and also see what symptoms the deceased had after the infection. Only with such a procedure it is possible to calculate a clean IFR. For a correct comparability with influenza, this procedure would also have to be used for the calculation of the IFR of influenza. If you are really interested in a scientific comparability of the IFR, you should proceed in this way.
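
The procedure proposed above could be sketched roughly as follows. The record layout, the field names and the attribution window are assumptions made for this illustration; choosing the window is exactly the open question raised in the comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    tested_positive: bool
    # Days between the positive test and death; None if the person is alive.
    days_to_death: Optional[int] = None
    # True if an autopsy judged the death to be from (not merely with) COVID-19.
    death_attributed: bool = False


def cohort_ifr(participants: List[Participant], window_days: int) -> float:
    """Deaths attributed to the infection within the window, per positive."""
    positives = [p for p in participants if p.tested_positive]
    if not positives:
        raise ValueError("no positive participants in the sample")
    deaths = sum(
        1
        for p in positives
        if p.days_to_death is not None
        and p.days_to_death <= window_days
        and p.death_attributed
    )
    return deaths / len(positives)


# Hypothetical sample: 200 positives and 800 negatives.
sample = (
    [Participant(True, days_to_death=20, death_attributed=True)]
    + [Participant(True, days_to_death=300, death_attributed=True)]
    + [Participant(True, days_to_death=10, death_attributed=False)]
    + [Participant(True) for _ in range(197)]
    + [Participant(False) for _ in range(800)]
)

ifr_60_days = cohort_ifr(sample, window_days=60)
ifr_365_days = cohort_ifr(sample, window_days=365)
print(f"60-day window:  {ifr_60_days:.2%}")
print(f"365-day window: {ifr_365_days:.2%}")
```

In this invented sample the result doubles when the window is extended from 60 to 365 days, which shows how much the choice of the counting period matters.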

    1. 1 Introduction

Current AI ethics initiatives, especially when adopted in scientific institutes or companies, mostly embrace a principle-based approach (Mittelstadt, 2019). However, establishing principles alone does not suffice; they also must be convincingly put into practice. Most AI ethics guidelines shy away from proposing methods to accomplish this (Hagendorff, 2020). Nevertheless, more and more research papers have recently appeared that describe steps on how to come “from what to how” (Eitel-Porter, 2020; Morley et al., 2020; Theodorou & Dignum, 2020; Vakkuri et al., 2019a). However, AI ethics still fails in certain regards. The reasons for that are manifold. This is why both in academia and public debates, many authors state that AI ethics has not permeated the AI industry yet, quite the contrary (Vakkuri et al., 2019b). Apart from the reasons mentioned, this is due to current AI ethics discourses hardly taking considerations on moral psychology into account. They do not consider the limitations of the human mind, the many hidden psychological forces like powerful cognitive biases, blind spots and the like that can affect the likelihood of ethical or unethical behavior. In order to effectively improve moral decision making in the AI field and to live up to common ideals and expectations, AI ethics initiatives can seek inspiration from another ethical framework that is yet largely underrepresented in AI ethics, namely virtue ethics. Instead of focusing only on principles, AI ethics can put a stronger focus on virtues or, in other words, on character dispositions in AI practitioners in order to effectively put itself into practice.
When using the term “AI practitioners” or “professionals”, this includes AI or machine learning researchers, research project supervisors, data scientists, industry engineers and developers, as well as managers and other domain experts.

Moreover, to bridge the gap between existing AI ethics initiatives and the requirements for their successful implementation, one should consider insights from moral psychology because, up to now, most parts of the AI ethics discourse disregard the psychological processes that limit the goals and effectiveness of ethics programs. This paper aims to respond to this gap in research. AI ethics, in order to be truly successful, should not only repeat bullet points from the numerous ethics codes (Jobin et al., 2019). It should also discuss the right dispositions and character strengths in AI practitioners that can help not only to identify ethical issues and to engender the motivation to take action, but also, and this is even more important, to discover and circumvent one’s own vulnerability to psychological forces affecting moral behavior. The purpose of this paper is to state how this can be executed and how AI ethics can choose a virtue-based approach in order to effectively put itself into practice.

2 AI Ethics: the Current Principled Approach

Current AI ethics programs often come with specific weaknesses and shortcomings. First and foremost, without being accompanied by binding legal norms, their normative principles lack reinforcement mechanisms (Rességuier & Rodrigues, 2020). Basically, deviations from codes of ethics have no or very minor consequences. Moreover, even when AI applications fulfill all ethical requirements stipulated, it does not necessarily mean that the application itself is “ethically approved” when used in the wrong contexts or when developed by organizations that follow unethical intentions (Hagendorff, 2021a; Lauer, 2020). In addition to that, ethics can be used for marketing purposes (Floridi, 2019; Wagner, 2018).
Recent AI ethics initiatives of the private sector have faced a lot of criticism in this regard. In fact, industry efforts for ethical and fair AI are compared to past efforts of “Big Tobacco” to whitewash the image of smoking (Abdalla & Abdalla, 2020). “Big Tech”, so the argument goes, uses ethics initiatives and targeted research funds to avoid legislation or the creation of binding legal norms (Ochigame, 2019). Hence, avoiding or addressing criticism like that is paramount for trustworthy ethics initiatives.

The latest progress in AI ethics research was configured by a “practical turn”, which was among other things inspired by the conclusion that principles alone cannot guarantee ethical AI (Mittelstadt, 2019). To accomplish that, so the argument goes, principles must be put into practice. Recently, several frameworks were developed, describing the process “from what to how” (Hallensleben et al., 2020; Morley et al., 2020; Zicari, 2020). Basically, this implies considering the context dependency in the process of realizing codes of ethics, the different requirements for different stakeholders, as well as the demonstration of ways of dealing with conflicting principles or values, for instance in the case of fairness and accuracy (Whittlestone et al., 2019). Ultimately, however, the practical turn frameworks are often just more detailed codes of ethics that use more fine-grained concepts than the initial high-level guidelines. For instance, instead of just stressing the importance of privacy, like the first generation of comprehensive AI ethics guidelines did, they point to the Privacy by Design or Privacy Impact Assessment toolkits (Cavoukian, 2011; Cavoukian et al., 2010; Oetzel & Spiekermann, 2014).
Or instead of just stipulating principles for AI, they differentiate between stages of algorithmic development, namely business and use-case development; design phase, where the business or use case is translated into tangible requirements for AI practitioners; training and test data procurement; building of the AI application; testing the application; deployment of the application and monitoring of the application’s performance (Morley et al., 2020). Other frameworks (Dignum, 2018) are rougher and differentiate between ethics by design (integrating ethical decision routines in AI systems (Hagendorff, 2021c)), ethics in design (finding development methods that support the evaluation of ethical implications of AI systems (Floridi et al., 2018)) and ethics for design (ensuring integrity on the side of developers (Johnson, 2017)). But, as stated above, all frameworks still stick to the principled approach. The main transformation lies in the principles being far more nuanced and less abstract compared to the beginnings of AI ethics code initiatives (Future of Life Institute, 2017). Typologies for every stage of the AI development pipeline are available. Differentiating principles solves one problem, namely the problem of too much abstraction. At the same time, however, it leaves some other problems open. Speaking more broadly, current AI ethics disregards certain dimensions it should actually be having. In organizations of all kinds, the likelihood of unethical decisions or behavior can be controlled to a certain extent. Antecedents for unethical behavior are individual characteristics (gender, cognitive moral development, idealism, job satisfaction, etc.), moral issue characteristics (the concentration and probability of negative effects, the magnitude of consequences, the proximity of the issue, etc.) and organizational environment characteristics (a benevolent ethical climate, ethical culture, code existence, rule enforcement, etc.) (Kish-Gephart et al., 2010). 
With regard to AI ethics, these factors are only partially considered. Most parts of the discourse are focused on discussing organizational environment characteristics (codes of ethics) or moral issues characteristics (AI safety) (Brundage et al., 2018; Hagendorff, 2020, 2021b), but not individual characteristics (character dispositions) increasing the likelihood of ethical decision making in AI research and development.

Therefore, a successful ethics strategy should focus on individual dispositions and organizational structures alike, whereas the overarching goal of every measure should be the prevention of harm. Or, in this case: prevent AI-based applications from inflicting direct or indirect harm. This rationale can be fulfilled by ensuring explainability of algorithmic decision making, by mitigating biases and promoting fairness in machine learning, by fostering AI robustness and the like. However, in addition to listing these issues, one has to ask how AI practitioners can be taught to intuitively keep them in mind. This would mean to transition from a situation of an external “ethics assessment” of existing AI products with a “checkbox guideline” to an internal process of establishing “ethics for design”.

Empirical research shows that having plain knowledge on ethical topics or moral dilemmas is likely to have no measurable influence on decision making. Even ethics professionals, meaning ethics professors and other scholars of ethics, typically do not act more ethically than non-ethicists (Schwitzgebel, 2009; Schwitzgebel & Rust, 2014). Correspondingly, in the AI field, empirical research shows that ethical principles have no significant influence on technology developers’ decision making routines (McNamara et al., 2018). Ultimately, ethical principles do not suffice to secure prosocial ways to develop and use new technologies (Mittelstadt, 2019). Normative principles are not worth much if they are not acknowledged and adhered to.
In order to actually acknowledge the importance of ethical considerations, certain character dispositions or virtues are required, among others, virtues that encourage us to stick to moral ideals and values.

3 Basic AI Virtues: the Foundation for Ethical Decision Making

Western virtue ethics has its roots in moral theories of Greek philosophers. However, after deontology and utilitarianism became more mainstream in modern philosophy, virtue ethics recently experienced a “comeback”. Roughly speaking, this comeback of scholarly interest in virtue ethics was initiated by Anscombe’s essay “Modern Moral Philosophy” (1958) but found prominent supporters and continued to grow with MacIntyre (1981), Nussbaum (1993), Hursthouse (2001) and many more. Virtue ethics also has a rich tradition in East and Southeast Asian philosophy, especially in Confucian and Buddhist ethical theories (Keown, 1992; Tiwald, 2010). Virtue-based ethical theories treat character as fundamental to ethics, whereas deontology, arguably the most prevalent ethical theory, focusses on principles. But what are the differences between principles and virtues? The former is based on normative rules that are universally valid, the latter addresses the question of what constitutes a good person or character. While ethical principles equal obligations, virtues are ideals that AI practitioners can aspire to. Deontology-inspired normative principles focus on the action rather than the actor. Thus, principlism defines action-guiding principles, whereas virtue ethics demands the development of specific positive character dispositions or character strengths.

Why are these dispositions of importance for AI practitioners? One reason is that individuals who display traits such as justice, honesty, empathy and the like acquire (public) trust. Trust, in turn, makes it easier for people to cooperate and work together, it creates a sense of community and it makes social interactions more predictable (Schneier, 2012).
Acquiring and maintaining the trust of other players in the AI field, but also the trust of the general public, can be a prerequisite for providing AI products and services. After all, intrinsically motivated actions are more trustworthy in comparison to those which are simply the product of extrinsically motivated rule following behavior (Meara et al., 1996).

One has to admit that a lot of ongoing AI basic research or very specific, small AI applications have such weak ethical implications that virtues or ethical values have no relevance at all. But AI applications that involve personal data, that are part of human–computer interaction or that are used on a grand scale clearly have ethical implications that can be addressed by virtue ethics. In the theoretical process of transitioning from an “uncultivated” to a morally habituated state, “technomoral virtues” like civility, courage, humility, magnanimity and others can be fostered and acquired (Vallor, 2016; Harris 2008a; Kohen et al., 2019; Gambelin, 2020; Sison et al., 2017; Neubert, 2017; Harris 2008b; Ratti & Stapleford, 2021). In philosophy, virtue ethics traditionally comprises cardinal virtues, namely fortitude, justice, prudence and moderation. Further, a list of six broad virtues that can be distilled from religious texts, oaths and other virtue inventories was put together by Peterson and Seligman (2004), namely wisdom, courage, humanity, justice, temperance and transcendence. Furthermore, in her famous book “Technology and the Virtues”, Vallor (2016, 2021) identified twelve technomoral virtues, namely honesty, self-control, humility, justice, courage, empathy, care, civility, flexibility, perspective, magnanimity and wisdom. The selection was criticized in secondary literature (Howard, 2018; Vallor, 2018) but remains arguably the most important virtue-based approach in ethics of technology.
In the more specific context of AI applications, however, one has to sort out those virtues that are particularly important in the field of AI ethics. Here, existing literature and preliminary works are sparse (Constantinescu et al., 2021; Neubert & Montañez, 2020).

Based on patterns and regularities of the ongoing discussion on AI ethics, an ethics strategy that is based on virtues would constitute four basic AI virtues, where each virtue corresponds to a set of principles (see Table 1). The basic AI virtues are justice, honesty, responsibility and care. But how exactly can these virtues be derived from AI ethics principles? Why do exactly these four virtues suffice? When consulting meta-studies on AI ethics guidelines that stem from the sciences, industry, as well as governments (Fjeld et al., 2020; Hagendorff, 2020; Jobin et al., 2019), it becomes clear that AI ethics norms comprise a certain set of reoccurring principles. The mentioned meta-studies on AI ethics guidelines list these principles hierarchically, starting with the most frequently mentioned principles (fairness, transparency, accountability, etc.) and ending at principles that are mentioned rather seldom, but nevertheless repeatedly (sustainability, diversity, social cohesion etc.). When sifting through all these principles, one can, by using a reductionist approach and clustering them into groups, distill four basic virtues that cover all of them (see Fig. 1). The decisive question for the selection of the four basic AI virtues was: Does virtue A describe character dispositions that, when internalized by AI practitioners, will intrinsically motivate them to act in a way that “automatically” ensures or makes it more likely that the outcomes of their actions, among others, result in technological artefacts that meet the requirements that principle X specifies? Or, in short, does virtue A translate into behavior that is likely to result in an outcome that corresponds to the requirements of principle X?
This question had to be applied for every principle that was derived from the meta-studies, testing by how many different virtues they can be covered. Ultimately, this process resulted in only four distinct virtues.

Table 1 List of basic AI virtues

Fig. 1 Using meta-studies on AI ethics guidelines as sources to distill four basic AI virtues

To name some examples: The principle of algorithmic fairness corresponds to the virtue of justice. A just person will “automatically” be motivated to contribute to machine outputs that do not discriminate against groups of people, independently of external factors and guideline rules. The principle of transparency, as a second example, corresponds to the virtue of honesty, because an honest person will “automatically” be inclined to be open about mistakes, to not hide technical shortcomings, to make research outcomes accessible and explainable. The principle of safe AI would be a third example. Here, the virtue of care will move professionals to act in a manner that they do not only acknowledge the importance of safety and harm avoidance, but also act accordingly. Ultimately, the transition happens between deontological rules, principles or universal norms on the one hand and virtues, intrinsic motives or character dispositions on the other hand. Nevertheless, both fields are connected by the same objective, namely to come up with trustworthy, human-centered, beneficial AI applications. Just the means to reach this objective are different.

As said before, the four basic AI virtues cover all common principles of AI ethics as described in prior discourses (Fjeld et al., 2020; Floridi et al., 2018; Hagendorff, 2020; Jobin et al., 2019; Morley et al., 2020). They are the precondition for putting principles into practice by representing different motivational settings for steering decision making processes in AI research and development in the right direction.
But stipulating those four basic AI virtues is not enough. Tackling ethics problems in practice also needs second-order virtues that enable professionals to deal with “bounded ethicality”.

4 Second-Order AI Virtues: a Response to Bounded Ethicality

When using a simple ethical theory, one can assume that individuals go through three phases. First, individuals perceive that they are confronted with a moral decision they have to make. Secondly, they reflect on ethical principles and come up with a moral judgment. And finally, they act according to these judgments and therefore act morally. But individuals do not actually behave this way. In fact, moral judgments are in most cases not influenced by moral reasoning (Haidt, 2001). Moral judgments are done intuitively, and moral reasoning is used in hindsight to justify one’s initial reaction. In short, typically, moral action precedes moral judgment. This leads to consequences for AI ethics. It shows that parts of current ethics initiatives can be reduced to plain “justifications” for the status quo of technology development, or at least they are adapted to it. For instance, the most commonly stressed AI ethics principles are fairness, accountability, explainability, transparency, privacy and safety (Hagendorff, 2020). However, these are issues for which a lot of technical solutions already exist and where a lot of research is done anyhow. Hence, AI ethics initiatives are simply reaffirming existing practices. On a macro level, this stands in correspondence with the aforementioned fact that moral judgments do not determine, but rather follow or explain prior decision making processes.

Although explicit ethics training may improve AI practitioners’ intellectual understanding of ethics itself, there are many limitations restricting ethical decision making in practice, no matter how comprehensive one’s knowledge on ethical theories is.
Many reasons for unethical behavior are resulting from environmental influences on human behavior and limitations through bounded rationality or, to be more precise, “bounded ethicality” (Bazerman & Tenbrunsel, 2011; Tenbrunsel & Messick, 2004). Bounded ethicality is an umbrella term that is used in moral psychology to name environmental as well as intrapersonal factors that can thwart ethical decision making in practice. Hence, in order to address bounded ethicality, AI ethics programs are in need of specific virtues, namely virtues that help to “debias” ethical decision making in order to overcome bounded ethicality.

The first step to successively dissolve bounded ethicality is to inform AI practitioners not about the importance of machine biases, but psychological biases as well as situational forces. Here, two second-order virtues come into play, namely prudence and fortitude (see Table 2). In Aristotelian virtue ethics, prudence (or phrónēsis) guides the enactment of individual virtues in unique moral situations, meaning that a person can intelligently express virtuous behavior (Aristotle et al., 2012). As a unifying intellectual virtue, prudence also gains center stage in modern virtue-based approaches to engineering ethics (Frigo et al., 2021). In this paper, prudence plays a similar role and is used in combination with another virtue, namely fortitude. While both virtues may help to overcome bounded ethicality, they are at the same time enablers for living up to the basic virtues. Individual psychological biases as well as situational forces can get in the way of acting justly, honestly, responsibly or caringly.
Prudence and fortitude are the answers to the many forces that may restrict basic AI virtues, where prudence is aiming primarily at individual factors, while fortitude addresses supra-individual issues that can impair ethical decision making in AI research and development.

Table 2 List of second-order AI virtues

In the following, a selection of some of the major factors of bounded ethicality that can be tackled by prudence shall be described. This selection is neither exhaustive nor does it go into much detail. However, it is meant to be a practical overview that can set the scene for more in-depth subsequent analyses.

Clearly, the most obvious factors of bounded ethicality are psychological biases (Cain & Detsky, 2008). It is common that people’s first and often only reaction to moral problems is emotional. Or, in other words, taking up dual-process theory, their reaction follows system 1 thinking (Kahneman, 2012; Tversky & Kahneman, 1974), meaning an intuitive, implicit, effortless, automatic mode of mental information processing. System 1 thinking predominates everyday decisions. System 2, on the other hand, is a conscious, logical, less error-prone, but slow and effortful mode of thinking. Although many decision making routines would require system 2 thinking, individuals often lack the energy to switch from system 1 to system 2. Ethical decision making needs cognitive energy (Mead et al., 2009). This is why prudence is such an important virtue, since it helps AI practitioners to transition from system 1 to system 2 thinking in ethical problems. This is not to say that the dual-process theory is without criticism. Recently, cognitive scientists have challenged its validity (Grayot, 2020), even though they did not abandon it in toto. It still remains a scientifically sound heuristic in moral psychology.
Thus, system 2 thinking remains strikingly close to critical ethical thinking, although it does obviously not necessarily result in it (Bonnefon, 2018).

The transition from system 1 to system 2 thinking in ethical problems can also be useful for mitigating another powerful psychological force, namely implicit biases (Banaji & Greenwald, 2013), that can impair at least two basic AI virtues, namely justice and care. Individuals have implicit associations, also called “ordinary prejudices”, that lead them to classify, categorize and perceive their social surroundings in accordance with prejudices and stereotypes. This effect is so strong that even individuals who are absolutely sure to not be hostile towards minority groups actually are exactly that. The reason for that lies in the fact that people succumb to subconscious biases that reflect culturally established stereotypes or discrimination patterns. Hence, unintentional discrimination cannot be unlearned without changing culture, the media, the extent of exposure to people from minorities and the like. Evidently, this task cannot be fulfilled by the AI sector. Nevertheless, implicit biases can be tackled by increasing workforce diversity in AI firms and by using prudence as a virtue to accept the irrefutable existence and problematic nature of implicit biases as well as their influence on justice in the first place.

Another important bias that can compromise basic AI virtues and that can at the same time be overcome by prudence is in-group favoritism (Efferson et al., 2008). This bias causes people to sympathize with others who share their culture, organization, gender, skin color, etc. For AI practitioners, this means that AI applications which have negative side-effects on outgroups, for instance the livelihoods of clickworkers in Southeast Asia (Graham et al., 2017), are rated less ethically problematic than AI applications that would have similar consequences for in-groups.
Moreover, the current gender imbalance in the AI field might be prolonged by in-group favoritism in human resource management. In-group favoritism mainly stifles character dispositions like justice and care. Prudence, on the other hand, is apt to work against in-group favoritism by recognizing artificial group constructions as well as definitions of who counts as “we” and who as “others”, bolstering not only fair decision making, but also abilities to empathize with “distant” individuals.

One further and important effect of bounded ethicality that can impair the realization of the basic AI virtues is self-serving biases. These biases cause revisionist impulses in humans, helping to downplay or deny past unethical actions while memorizing ethical ones, resulting in a self-concept that depicts oneself as ethical. When one asks individuals to rate how ethical they think they are on a scale of 0 to 100 relative to other individuals, the majority of them will give themselves a score of more than 50 (Epley & Dunning, 2000). The same holds true when people are asked to assess the organization they are a part of in relation to other organizations. Average scores are higher than 50, although actually the average score would have to be 50. What one can learn from this is that generally speaking, people overestimate their ethicality. Moreover, self-serving biases cause people to blame other people when things go wrong, but to view successes as being one’s own achievement. Others are to blame for ethical problems, depicting the problems as being outside of one’s own control. In the AI sector, self-serving biases can come into play when attributing errors or inaccuracies in applications to others, when reacting dismissively to critical feedback or feelings of concern, etc.
Moreover, not overcoming self-serving biases by prudence can mean to act unjustly and dishonestly, further compromising basic AI virtues.

Value-action gaps are another effect of bounded ethicality revealed by empirical studies in moral psychology (Godin et al., 2005; Jansen & Glinow, 1985). Value-action gaps occur in the discrepancy between people’s self-concepts or moral values and their actual behavior. In short, the gaps mark the distance between what people say and what people do. Prudence, on the other hand, can help to identify that distance. In the AI field, value-action gaps can occur on an organizational level, for instance by using lots of ethics-related terms in corporate reports and press releases while actually being involved in unethical business practices, lawsuits, fraud, etc. (Loughran et al., 2009). Especially the AI sector is often accused of ethics-washing, hence of talking much about ethics, but not acting accordingly (Hao, 2019). Likewise, value-action gaps can occur on an individual level, for instance by holding AI safety or data security issues in high esteem while actually accepting improper quality assurance or rushed development and therefore provoking technical vulnerabilities in machine learning models. Akin to value-action gaps are behavioral forecasting errors (Diekmann et al., 2003). Here, people tend to believe that they will act ethically in a given situation X, while when situation X actually occurs, they do not behave accordingly (Woodzicka & LaFrance, 2001). They overestimate the extent to which they will indeed stick to their ideals and intentions. All these effects can interfere negatively with basic AI virtues, mostly with care, honesty and justice. This is why prudence with regard to value-action gaps is of great importance.

The concept of moral disengagement is another important factor in bounded ethical decision making (Bandura, 1999).
Techniques of moral disengagement allow individuals to selectively turn their moral concerns on and off. In many day-to-day decisions, people act contrary to their own ethical standards, but without feeling bad about it or having a guilty conscience. The main techniques in moral disengagement processes comprise justifications, where wrongdoing is justified as a means to a higher end; changes in one’s definition of what is ethical; euphemistic labels, where individuals detach themselves from problematic action contexts by using linguistic distancing mechanisms; denial of being personally responsible for particular outcomes, where responsibility is attributed to a larger group of people; the use of comparisons, where one’s own wrongdoings are relativized by pointing at other contexts of wrongdoing; and the avoidance of certain information that refers to negative consequences of one’s own behavior. Again, prudence can help to identify cases of moral disengagement in the AI field and act as a response to them. Addressing moral disengagement with prudence can be a requirement to live up to all basic AI virtues.

In the following, a selection of some of the major factors of bounded ethicality that can be tackled by fortitude shall be described. Here, supra-individual issues that can impair ethical decision making in AI research and development are addressed. Certainly, one of the most relevant factors one has to discuss in this context is situational forces. Numerous empirical studies in moral psychology have shown that situational forces can have a massive impact on moral behavior (Isen & Levin, 1972; Latané & Darley, 1968; Williams & Bargh, 2008).
Situational forces can range from specific influences like the noise of a lawnmower that significantly affects helping behavior (Mathews & Canon, 1975) to more relevant factors like competitive orientations, time constraints, tiredness, stress, etc., which are likely to alter or overwrite ethical concerns (Cave & ÓhÉigeartaigh, 2018; Darley & Batson, 1973; Kouchaki & Smith, 2014). Financial incentives in particular have a significant influence on ethical behavior. In environments that are structured by economic imperatives, decisions that clearly have an ethical dimension can be reframed as pure business decisions. All in all, money has manifold detrimental consequences for decision making since it leads to decisions that are proven to be less social, less ethical or less cooperative (Gino & Mogilner, 2014; Gino & Pierce, 2009; Kouchaki et al., 2013; Palazzo et al., 2012; Vohs et al., 2006). Ultimately, various finance law obligations or monetary factual constraints that a company’s management has to comply with can conflict with or overwrite AI virtues. Especially in contexts like this, virtue ethics can significantly be pushed into the background, although the perceived constraints lead to immoral outcomes. In short, situational forces can have negative impacts on unfolding all four basic AI virtues, namely justice, honesty, responsibility and care. In general, critics of virtue ethics have pointed out that moral behavior is not determined by character traits, but by social contexts and concrete situations (Kupperman, 2001). However, situationist accounts are in fact entirely compatible with virtue ethics since it provides particular virtues like fortitude that are intended to counteract situational forces (and that can explain why some individuals deviate from expected behavior in classical psychological experiments like the Milgram experiment (Milgram, 1963)).
Fortitude is supposed to help to counteract situational pressure, allowing the mentioned basic virtues to flourish.

Similar to and often not clearly distinguishable from situational forces are peer influences (Asch, 1951, 1956). Individuals want to follow the crowd, adapt their behavior to that of their peers and act similarly to them. This is also called conformity bias. Conformity biases can become a problem for two reasons: First, group norms can possess unethical traits, leading for instance to a collective acceptance of harm. Second, the reliance on group norms and the associated effects of conformity bias induce a suppression of one’s own ethical judgments. In other words, if one individual starts to misbehave, for instance by cheating, others follow suit (Gino et al., 2009). A similar problem occurs with authorities (Milgram, 1963). Humans have an internal tendency to be obedient to authorities. This willingness to please authorities can have positive consequences when executives act ethically themselves. If this is not the case, the opposite becomes true. For AI ethics, this means that social norms that tacitly emerge from AI practitioners’ behavioral routines as well as managerial decisions can bolster ethical as well as unethical working cultures. In the case of the latter, the decisive factor is the way individuals respond to inner normative conflicts with their surroundings. Do they act in conformity and obedience even if it means violating basic AI virtues? Or do they stick to their dispositions and deviate from detrimental social norms or orders?
Fortitude, one of the two second-order virtues, can ensure the appropriate mental strength to stick to the right intentions and behavior, be it in cases where everyone disobeys a certain law but oneself does not want to join in, where managerial orders instruct to bring a risky product to the market as fast as possible but oneself insists on piloting it before release, or where under extreme time pressure one insists on devoting time to understanding and analyzing training data sets.

5 Ethics Training—AI Virtues Come into Being

In traditional virtue ethics concepts, virtues emerge from habitual, repeated and gradually refined practice of right and prudent actions (Aristotle et al., 2012). At first, specific virtues are encouraged and practiced by performing acts that are inspired by “noble” human role-models and that resemble other patterns, narratives or social models of the virtue in question. Later, virtues are refined by taking the particularity of given situations into account. Regarding AI virtues, the procedure is not much different (Bezuidenhout & Ratti, 2021). However, cultivating basic and second-order AI virtues means achieving virtuous practice embedded in a specific organizational and cultural context. A virtuous practice requires some sort of moral self-cultivation that encompasses the acquisition of motivations or the will to take action, knowledge of ethical issues, skills to identify them and moral reasoning to make the right moral decisions (Johnson, 2017). One could reckon that the aforementioned skills or motivations in particular are either innate or the result of childhood education. But ethical dispositions can be changed by education in all stages of life, for instance by powerful experiences, virtuous leaders or a certain work atmosphere in organizations. To put it in a nutshell, virtues can be trained and taught in order to foster ethical decision making and to overcome bounded ethicality.
Most importantly, if ethics training imparts only explicit knowledge (or ethical principles), this will very likely have no effect on behavior. Ethics training must also impart tacit knowledge, meaning skills of social perception and emotion that cause individuals to automatically feel and want the right thing in a given situation (Haidt, 2006, p. 160).

The simplest form of ethics programs comprises ethics training sessions combined with incentive schemes for members of a given organization that reward abidance by ethical principles and punish their violation. These ethics programs have numerous disadvantages. First, individuals that are part of them are likely to only seek to perform well on behavior covered by exactly these programs. Areas that are not covered are neglected. That way, ethics programs can even increase unethical behavior by actually well-intended sanctioning systems (Gneezy & Rustichini, 2000). For instance, in case a fine is put on a specific unethical behavior, individuals who benefit from this behavior might simply weigh the advantage of the unethical behavior against the disadvantage of the fine. If the former outweighs the latter, the unethical behavior might even increase if a sanctioning system is in place. Ethical decisions would simply be reframed as monetary decisions. In addition to that, individuals can become inclined to trick incentive schemes and reward systems. Moreover, those programs solely focus on extrinsic motivators and do not change intrinsic dispositions and moral attitudes. All in all, ethics programs that comprise simple reward and sanctioning systems—as well as corresponding surveillance and monitoring mechanisms—are very likely to fail.

A further risk of ethics programs or ethics training lies in reactance phenomena. Reactance occurs when individuals protest against constraints of their personal freedoms.
As soon as ethical principles restrict the freedom of AI practitioners in doing their work, they might react to this restriction by trying to reclaim that very freedom by all means (Dillard & Shen, 2005; Dowd et al., 1991; Hong, 1992). People want to escape restrictions, thus the moment when such restrictions are put in place—no matter whether they are justified from an ethical perspective or not—people might start striving to break free from them. Ultimately, “forcing” ethics programs on members of an organization is not a good idea. Ethics programs should not be decoupled from the inner mechanisms and routines of an organization. Hence, in order to avoid reactance and to fit ethics programs into actual structures and routines of an organization, it makes sense to carefully craft specific, unique compliance measures that take particular decision processes of AI practitioners and managers into account. In addition to that, ethics programs can be implemented in organizations with delay. This has the effect of a “future lock-in” (Rogers & Bazerman, 2008), meaning that policies achieve more support, since the time delay allows for an elimination of the immediate costs of implementation, for individuals to prepare for the respective measures and for a recognition of their advantages.

Considering all of that, what measures can actually support AI practitioners and AI companies’ managers in strengthening AI virtues? Here, again, insights from moral psychology as well as behavioral ethics research can be used (Hines et al., 1987; Kollmuss & Agyeman, 2002; Treviño et al., 2006, 2014) to catalogue measures that bolster ethical decision making as well as virtue acquisition (see Tables 3 and 4). The measures can be roughly divided into those that tend to affect single individuals and those that bring about or relate to structural changes in organizations. The following Table 3 lists measures that relate to AI professionals on an individual level.
    1. Virtue Ethics

First published Fri Jul 18, 2003; substantive revision Tue Oct 11, 2022

Virtue ethics is currently one of three major approaches in normative ethics. It may, initially, be identified as the one that emphasizes the virtues, or moral character, in contrast to the approach that emphasizes duties or rules (deontology) or that emphasizes the consequences of actions (consequentialism). Suppose it is obvious that someone in need should be helped. A utilitarian will point to the fact that the consequences of doing so will maximize well-being, a deontologist to the fact that, in doing so the agent will be acting in accordance with a moral rule such as “Do unto others as you would be done by” and a virtue ethicist to the fact that helping the person would be charitable or benevolent. This is not to say that only virtue ethicists attend to virtues, any more than it is to say that only consequentialists attend to consequences or only deontologists to rules. Each of the above-mentioned approaches can make room for virtues, consequences, and rules. Indeed, any plausible normative ethical theory will have something to say about all three. What distinguishes virtue ethics from consequentialism or deontology is the centrality of virtue within the theory (Watson 1990; Kawall 2009). Whereas consequentialists will define virtues as traits that yield good consequences and deontologists will define them as traits possessed by those who reliably fulfil their duties, virtue ethicists will resist the attempt to define virtues in terms of some other concept that is taken to be more fundamental. Rather, virtues and vices will be foundational for virtue ethical theories and other normative notions will be grounded in them. We begin by discussing two concepts that are central to all forms of virtue ethics, namely, virtue and practical wisdom.
Then we note some of the features that distinguish different virtue ethical theories from one another before turning to objections that have been raised against virtue ethics and responses offered on its behalf. We conclude with a look at some of the directions in which future research might develop.

1. Preliminaries
1.1 Virtue
1.2 Practical Wisdom
2. Forms of Virtue Ethics
2.1 Eudaimonist Virtue Ethics
2.2 Agent-Based and Exemplarist Virtue Ethics
2.3 Target-Centered Virtue Ethics
2.4 Platonistic Virtue Ethics
3. Objections to virtue ethics
4. Future Directions
Bibliography
Academic Tools
Other Internet Resources
Related Entries

1. Preliminaries

In the West, virtue ethics’ founding fathers are Plato and Aristotle, and in the East it can be traced back to Mencius and Confucius. It persisted as the dominant approach in Western moral philosophy until at least the Enlightenment, suffered a momentary eclipse during the nineteenth century, but re-emerged in Anglo-American philosophy in the late 1950s. It was heralded by Anscombe’s famous article “Modern Moral Philosophy” (Anscombe 1958) which crystallized an increasing dissatisfaction with the forms of deontology and utilitarianism then prevailing. Neither of them, at that time, paid attention to a number of topics that had always figured in the virtue ethics tradition—virtues and vices, motives and moral character, moral education, moral wisdom or discernment, friendship and family relationships, a deep concept of happiness, the role of the emotions in our moral life and the fundamentally important questions of what sorts of persons we should be and how we should live. Its re-emergence had an invigorating effect on the other two approaches, many of whose proponents then began to address these topics in the terms of their favoured theory.
(One consequence of this has been that it is now necessary to distinguish “virtue ethics” (the third approach) from “virtue theory”, a term which includes accounts of virtue within the other approaches.) Interest in Kant’s virtue theory has redirected philosophers’ attention to Kant’s long neglected Doctrine of Virtue, and utilitarians have developed consequentialist virtue theories (Driver 2001; Hurka 2001). It has also generated virtue ethical readings of philosophers other than Plato and Aristotle, such as Martineau, Hume and Nietzsche, and thereby different forms of virtue ethics have developed (Slote 2001; Swanton 2003, 2011a). Although modern virtue ethics does not have to take a “neo-Aristotelian” or eudaimonist form (see section 2), almost any modern version still shows that its roots are in ancient Greek philosophy by the employment of three concepts derived from it. These are arête (excellence or virtue), phronesis (practical or moral wisdom) and eudaimonia (usually translated as happiness or flourishing). (See Annas 2011 for a short, clear, and authoritative account of all three.) We discuss the first two in the remainder of this section. Eudaimonia is discussed in connection with eudaimonist versions of virtue ethics in the next.

1.1 Virtue

A virtue is an excellent trait of character. It is a disposition, well entrenched in its possessor—something that, as we say, goes all the way down, unlike a habit such as being a tea-drinker—to notice, expect, value, feel, desire, choose, act, and react in certain characteristic ways. To possess a virtue is to be a certain sort of person with a certain complex mindset. A significant aspect of this mindset is the wholehearted acceptance of a distinctive range of considerations as reasons for action. An honest person cannot be identified simply as one who, for example, practices honest dealing and does not cheat.
If such actions are done merely because the agent thinks that honesty is the best policy, or because they fear being caught out, rather than through recognising “To do otherwise would be dishonest” as the relevant reason, they are not the actions of an honest person. An honest person cannot be identified simply as one who, for example, tells the truth because it is the truth, for one can have the virtue of honesty without being tactless or indiscreet. The honest person recognises “That would be a lie” as a strong (though perhaps not overriding) reason for not making certain statements in certain circumstances, and gives due, but not overriding, weight to “That would be the truth” as a reason for making them. An honest person’s reasons and choices with respect to honest and dishonest actions reflect her views about honesty, truth, and deception—but of course such views manifest themselves with respect to other actions, and to emotional reactions as well. Valuing honesty as she does, she chooses, where possible to work with honest people, to have honest friends, to bring up her children to be honest. She disapproves of, dislikes, deplores dishonesty, is not amused by certain tales of chicanery, despises or pities those who succeed through deception rather than thinking they have been clever, is unsurprised, or pleased (as appropriate) when honesty triumphs, is shocked or distressed when those near and dear to her do what is dishonest and so on. Given that a virtue is such a multi-track disposition, it would obviously be reckless to attribute one to an agent on the basis of a single observed action or even a series of similar actions, especially if you don’t know the agent’s reasons for doing as she did (Sreenivasan 2002). Possessing a virtue is a matter of degree. To possess such a disposition fully is to possess full or perfect virtue, which is rare, and there are a number of ways of falling short of this ideal (Athanassoulis 2000). 
Most people who can truly be described as fairly virtuous, and certainly markedly better than those who can truly be described as dishonest, self-centred and greedy, still have their blind spots—little areas where they do not act for the reasons one would expect. So someone honest or kind in most situations, and notably so in demanding ones, may nevertheless be trivially tainted by snobbery, inclined to be disingenuous about their forebears and less than kind to strangers with the wrong accent. Further, it is not easy to get one’s emotions in harmony with one’s rational recognition of certain reasons for action. I may be honest enough to recognise that I must own up to a mistake because it would be dishonest not to do so without my acceptance being so wholehearted that I can own up easily, with no inner conflict. Following (and adapting) Aristotle, virtue ethicists draw a distinction between full or perfect virtue and “continence”, or strength of will. The fully virtuous do what they should without a struggle against contrary desires; the continent have to control a desire or temptation to do otherwise. Describing the continent as “falling short” of perfect virtue appears to go against the intuition that there is something particularly admirable about people who manage to act well when it is especially hard for them to do so, but the plausibility of this depends on exactly what “makes it hard” (Foot 1978: 11–14). If it is the circumstances in which the agent acts—say that she is very poor when she sees someone drop a full purse or that she is in deep grief when someone visits seeking help—then indeed it is particularly admirable of her to restore the purse or give the help when it is hard for her to do so. But if what makes it hard is an imperfection in her character—the temptation to keep what is not hers, or a callous indifference to the suffering of others—then it is not. 
1.2 Practical Wisdom

Another way in which one can easily fall short of full virtue is through lacking phronesis—moral or practical wisdom. The concept of a virtue is the concept of something that makes its possessor good: a virtuous person is a morally good, excellent or admirable person who acts and feels as she should. These are commonly accepted truisms. But it is equally common, in relation to particular (putative) examples of virtues to give these truisms up. We may say of someone that he is generous or honest “to a fault”. It is commonly asserted that someone’s compassion might lead them to act wrongly, to tell a lie they should not have told, for example, in their desire to prevent someone else’s hurt feelings. It is also said that courage, in a desperado, enables him to do far more wicked things than he would have been able to do if he were timid. So it would appear that generosity, honesty, compassion and courage despite being virtues, are sometimes faults. Someone who is generous, honest, compassionate, and courageous might not be a morally good person—or, if it is still held to be a truism that they are, then morally good people may be led by what makes them morally good to act wrongly! How have we arrived at such an odd conclusion? The answer lies in too ready an acceptance of ordinary usage, which permits a fairly wide-ranging application of many of the virtue terms, combined, perhaps, with a modern readiness to suppose that the virtuous agent is motivated by emotion or inclination, not by rational choice.
If one thinks of generosity or honesty as the disposition to be moved to action by generous or honest impulses such as the desire to give or to speak the truth, if one thinks of compassion as the disposition to be moved by the sufferings of others and to act on that emotion, if one thinks of courage as mere fearlessness or the willingness to face danger, then it will indeed seem obvious that these are all dispositions that can lead to their possessor’s acting wrongly. But it is also obvious, as soon as it is stated, that these are dispositions that can be possessed by children, and although children thus endowed (bar the “courageous” disposition) would undoubtedly be very nice children, we would not say that they were morally virtuous or admirable people. The ordinary usage, or the reliance on motivation by inclination, gives us what Aristotle calls “natural virtue”—a proto version of full virtue awaiting perfection by phronesis or practical wisdom. Aristotle makes a number of specific remarks about phronesis that are the subject of much scholarly debate, but the (related) modern concept is best understood by thinking of what the virtuous morally mature adult has that nice children, including nice adolescents, lack. Both the virtuous adult and the nice child have good intentions, but the child is much more prone to mess things up because he is ignorant of what he needs to know in order to do what he intends. A virtuous adult is not, of course, infallible and may also, on occasion, fail to do what she intended to do through lack of knowledge, but only on those occasions on which the lack of knowledge is not culpable. So, for example, children and adolescents often harm those they intend to benefit either because they do not know how to set about securing the benefit or because their understanding of what is beneficial and harmful is limited and often mistaken. Such ignorance in small children is rarely, if ever culpable. 
Adults, on the other hand, are culpable if they mess things up by being thoughtless, insensitive, reckless, impulsive, shortsighted, and by assuming that what suits them will suit everyone instead of taking a more objective viewpoint. They are also culpable if their understanding of what is beneficial and harmful is mistaken. It is part of practical wisdom to know how to secure real benefits effectively; those who have practical wisdom will not make the mistake of concealing the hurtful truth from the person who really needs to know it in the belief that they are benefiting him. Quite generally, given that good intentions are intentions to act well or “do the right thing”, we may say that practical wisdom is the knowledge or understanding that enables its possessor, unlike the nice adolescents, to do just that, in any given situation. The detailed specification of what is involved in such knowledge or understanding has not yet appeared in the literature, but some aspects of it are becoming well known. Even many deontologists now stress the point that their action-guiding rules cannot, reliably, be applied without practical wisdom, because correct application requires situational appreciation—the capacity to recognise, in any particular situation, those features of it that are morally salient. This brings out two aspects of practical wisdom. One is that it characteristically comes only with experience of life. Amongst the morally relevant features of a situation may be the likely consequences, for the people involved, of a certain action, and this is something that adolescents are notoriously clueless about precisely because they are inexperienced. It is part of practical wisdom to be wise about human beings and human life. (It should go without saying that the virtuous are mindful of the consequences of possible actions. How could they fail to be reckless, thoughtless and short-sighted if they were not?) 
The second is the practically wise agent’s capacity to recognise some features of a situation as more important than others, or indeed, in that situation, as the only relevant ones. The wise do not see things in the same way as the nice adolescents who, with their under-developed virtues, still tend to see the personally disadvantageous nature of a certain action as competing in importance with its honesty or benevolence or justice. These aspects coalesce in the description of the practically wise as those who understand what is truly worthwhile, truly important, and thereby truly advantageous in life, who know, in short, how to live well.

2. Forms of Virtue Ethics

While all forms of virtue ethics agree that virtue is central and practical wisdom required, they differ in how they combine these and other concepts to illuminate what we should do in particular contexts and how we should live our lives as a whole. In what follows we sketch four distinct forms taken by contemporary virtue ethics, namely, a) eudaimonist virtue ethics, b) agent-based and exemplarist virtue ethics, c) target-centered virtue ethics, and d) Platonistic virtue ethics.

2.1 Eudaimonist Virtue Ethics

The distinctive feature of eudaimonist versions of virtue ethics is that they define virtues in terms of their relationship to eudaimonia. A virtue is a trait that contributes to or is a constituent of eudaimonia and we ought to develop virtues, the eudaimonist claims, precisely because they contribute to eudaimonia. The concept of eudaimonia, a key term in ancient Greek moral philosophy, is standardly translated as “happiness” or “flourishing” and occasionally as “well-being.” Each translation has its disadvantages. The trouble with “flourishing” is that animals and even plants can flourish but eudaimonia is possible only for rational beings. The trouble with “happiness” is that in ordinary conversation it connotes something subjectively determined.
It is for me, not for you, to pronounce on whether I am happy. If I think I am happy then I am—it is not something I can be wrong about (barring advanced cases of self-deception). Contrast my being healthy or flourishing. Here we have no difficulty in recognizing that I might think I was healthy, either physically or psychologically, or think that I was flourishing but be wrong. In this respect, “flourishing” is a better translation than “happiness”. It is all too easy to be mistaken about whether one’s life is eudaimon (the adjective from eudaimonia) not simply because it is easy to deceive oneself, but because it is easy to have a mistaken conception of eudaimonia, or of what it is to live well as a human being, believing it to consist largely in physical pleasure or luxury for example. Eudaimonia is, avowedly, a moralized or value-laden concept of happiness, something like “true” or “real” happiness or “the sort of happiness worth seeking or having.” It is thereby the sort of concept about which there can be substantial disagreement between people with different views about human life that cannot be resolved by appeal to some external standard on which, despite their different views, the parties to the disagreement concur (Hursthouse 1999: 188–189). Most versions of virtue ethics agree that living a life in accordance with virtue is necessary for eudaimonia. This supreme good is not conceived of as an independently defined state (made up of, say, a list of non-moral goods that does not include virtuous activity) which exercise of the virtues might be thought to promote. It is, within virtue ethics, already conceived of as something of which virtuous activity is at least partially constitutive (Kraut 1989). Thereby virtue ethicists claim that a human life devoted to physical pleasure or the acquisition of wealth is not eudaimon, but a wasted life. 
But although all standard versions of virtue ethics insist on that conceptual link between virtue and eudaimonia, further links are matters of dispute and generate different versions. For Aristotle, virtue is necessary but not sufficient—what is also needed are external goods which are a matter of luck. For Plato and the Stoics, virtue is both necessary and sufficient for eudaimonia (Annas 1993). According to eudaimonist virtue ethics, the good life is the eudaimon life, and the virtues are what enable a human being to be eudaimon because the virtues just are those character traits that benefit their possessor in that way, barring bad luck. So there is a link between eudaimonia and what confers virtue status on a character trait. (For a discussion of the differences between eudaimonists see Baril 2014. For recent defenses of eudaimonism see Annas 2011; LeBar 2013b; Badhwar 2014; and Bloomfield 2014.)

2.2 Agent-Based and Exemplarist Virtue Ethics

Rather than deriving the normativity of virtue from the value of eudaimonia, agent-based virtue ethicists argue that other forms of normativity—including the value of eudaimonia—are traced back to and ultimately explained in terms of the motivational and dispositional qualities of agents. It is unclear how many other forms of normativity must be explained in terms of the qualities of agents in order for a theory to count as agent-based. The two best-known agent-based theorists, Michael Slote and Linda Zagzebski, trace a wide range of normative qualities back to the qualities of agents. For example, Slote defines rightness and wrongness in terms of agents’ motivations: “[A]gent-based virtue ethics … understands rightness in terms of good motivations and wrongness in terms of the having of bad (or insufficiently good) motives” (2001: 14).
Similarly, he explains the goodness of an action, the value of eudaimonia, the justice of a law or social institution, and the normativity of practical rationality in terms of the motivational and dispositional qualities of agents (2001: 99–100, 154, 2000). Zagzebski likewise defines right and wrong actions by reference to the emotions, motives, and dispositions of virtuous and vicious agents. For example, “A wrong act = an act that the phronimos characteristically would not do, and he would feel guilty if he did = an act such that it is not the case that he might do it = an act that expresses a vice = an act that is against a requirement of virtue (the virtuous self)” (Zagzebski 2004: 160). Her definitions of duties, good and bad ends, and good and bad states of affairs are similarly grounded in the motivational and dispositional states of exemplary agents (1998, 2004, 2010). However, there could also be less ambitious agent-based approaches to virtue ethics (see Slote 1997). At the very least, an agent-based approach must be committed to explaining what one should do by reference to the motivational and dispositional states of agents. But this is not yet a sufficient condition for counting as an agent-based approach, since the same condition will be met by every virtue ethical account. For a theory to count as an agent-based form of virtue ethics it must also be the case that the normative properties of motivations and dispositions cannot be explained in terms of the normative properties of something else (such as eudaimonia or states of affairs) which is taken to be more fundamental. Beyond this basic commitment, there is room for agent-based theories to be developed in a number of different directions. The most important distinguishing factor has to do with how motivations and dispositions are taken to matter for the purposes of explaining other normative qualities. For Slote what matters are this particular agent’s actual motives and dispositions. 
The goodness of action A, for example, is derived from the agent’s motives when she performs A. If those motives are good then the action is good, if not then not. On Zagzebski’s account, by contrast, a good or bad, right or wrong action is defined not by this agent’s actual motives but rather by whether this is the sort of action a virtuously motivated agent would perform (Zagzebski 2004: 160). Appeal to the virtuous agent’s hypothetical motives and dispositions enables Zagzebski to distinguish between performing the right action and doing so for the right reasons (a distinction that, as Brady (2004) observes, Slote has trouble drawing). Another point on which agent-based forms of virtue ethics might differ concerns how one identifies virtuous motivations and dispositions. According to Zagzebski’s exemplarist account, “We do not have criteria for goodness in advance of identifying the exemplars of goodness” (Zagzebski 2004: 41). As we observe the people around us, we find ourselves wanting to be like some of them (in at least some respects) and not wanting to be like others. The former provide us with positive exemplars and the latter with negative ones. Our understanding of better and worse motivations and virtuous and vicious dispositions is grounded in these primitive responses to exemplars (2004: 53). This is not to say that every time we act we stop and ask ourselves what one of our exemplars would do in this situation. Our moral concepts become more refined over time as we encounter a wider variety of exemplars and begin to draw systematic connections between them, noting what they have in common, how they differ, and which of these commonalities and differences matter, morally speaking. Recognizable motivational profiles emerge and come to be labeled as virtues or vices, and these, in turn, shape our understanding of the obligations we have and the ends we should pursue. 
However, even though the systematising of moral thought can travel a long way from our starting point, according to the exemplarist it never reaches a stage where reference to exemplars is replaced by the recognition of something more fundamental. At the end of the day, according to the exemplarist, our moral system still rests on our basic propensity to take a liking (or disliking) to exemplars. Nevertheless, one could be an agent-based theorist without advancing the exemplarist’s account of the origins or reference conditions for judgments of good and bad, virtuous and vicious. 2.3 Target-Centered Virtue Ethics The touchstone for eudaimonist virtue ethicists is a flourishing human life. For agent-based virtue ethicists it is an exemplary agent’s motivations. The target-centered view developed by Christine Swanton (2003), by contrast, begins with our existing conceptions of the virtues. We already have a passable idea of which traits are virtues and what they involve. Of course, this untutored understanding can be clarified and improved, and it is one of the tasks of the virtue ethicist to help us do precisely that. But rather than stripping things back to something as basic as the motivations we want to imitate or building it up to something as elaborate as an entire flourishing life, the target-centered view begins where most ethics students find themselves, namely, with the idea that generosity, courage, self-discipline, compassion, and the like get a tick of approval. It then examines what these traits involve. A complete account of virtue will map out 1) its field, 2) its mode of responsiveness, 3) its basis of moral acknowledgment, and 4) its target. Different virtues are concerned with different fields. Courage, for example, is concerned with what might harm us, whereas generosity is concerned with the sharing of time, talent, and property. The basis of acknowledgment of a virtue is the feature within the virtue’s field to which it responds. 
To continue with our previous examples, generosity is attentive to the benefits that others might enjoy through one’s agency, and courage responds to threats to value, status, or the bonds that exist between oneself and particular others, and the fear such threats might generate. A virtue’s mode has to do with how it responds to the bases of acknowledgment within its field. Generosity promotes a good, namely, another’s benefit, whereas courage defends a value, bond, or status. Finally, a virtue’s target is that at which it is aimed. Courage aims to control fear and handle danger, while generosity aims to share time, talents, or possessions with others in ways that benefit them. A virtue, on a target-centered account, “is a disposition to respond to, or acknowledge, items within its field or fields in an excellent or good enough way” (Swanton 2003: 19). A virtuous act is an act that hits the target of a virtue, which is to say that it succeeds in responding to items in its field in the specified way (233). Providing a target-centered definition of a right action requires us to move beyond the analysis of a single virtue and the actions that follow from it. This is because a single action context may involve a number of different, overlapping fields. Determination might lead me to persist in trying to complete a difficult task even if doing so requires a singleness of purpose. But love for my family might make a different use of my time and attention. In order to define right action a target-centered view must explain how we handle different virtues’ conflicting claims on our resources. There are at least three different ways to address this challenge. A perfectionist target-centered account would stipulate, “An act is right if and only if it is overall virtuous, and that entails that it is the, or a, best action possible in the circumstances” (239–240). 
A more permissive target-centered account would not identify ‘right’ with ‘best’, but would allow an action to count as right provided “it is good enough even if not the (or a) best action” (240). A minimalist target-centered account would not even require an action to be good in order to be right. On such a view, “An act is right if and only if it is not overall vicious” (240). (For further discussion of target-centered virtue ethics see Van Zyl 2014; and Smith 2016). 2.4 Platonistic Virtue Ethics The fourth form a virtue ethic might adopt takes its inspiration from Plato. The Socrates of Plato’s dialogues devotes a great deal of time to asking his fellow Athenians to explain the nature of virtues like justice, courage, piety, and wisdom. So it is clear that Plato counts as a virtue theorist. But it is a matter of some debate whether he should be read as a virtue ethicist (White 2015). What is not open to debate is whether Plato has had an important influence on the contemporary revival of interest in virtue ethics. A number of those who have contributed to the revival have done so as Plato scholars (e.g., Prior 1991; Kamtekar 1998; Annas 1999; and Reshotko 2006). However, often they have ended up championing a eudaimonist version of virtue ethics (see Prior 2001 and Annas 2011), rather than a version that would warrant a separate classification. Nevertheless, there are two variants that call for distinct treatment. Timothy Chappell takes the defining feature of Platonistic virtue ethics to be that “Good agency in the truest and fullest sense presupposes the contemplation of the Form of the Good” (2014). Chappell follows Iris Murdoch in arguing that “In the moral life the enemy is the fat relentless ego” (Murdoch 1971: 51). Constantly attending to our needs, our desires, our passions, and our thoughts skews our perspective on what the world is actually like and blinds us to the goods around us. 
Contemplating the goodness of something we encounter—which is to say, carefully attending to it “for its own sake, in order to understand it” (Chappell 2014: 300)—breaks this natural tendency by drawing our attention away from ourselves. Contemplating such goodness with regularity makes room for new habits of thought that focus more readily and more honestly on things other than the self. It alters the quality of our consciousness. And “anything which alters consciousness in the direction of unselfishness, objectivity, and realism is to be connected with virtue” (Murdoch 1971: 82). The virtues get defined, then, in terms of qualities that help one “pierce the veil of selfish consciousness and join the world as it really is” (91). And good agency is defined by the possession and exercise of such virtues. Within Chappell’s and Murdoch’s framework, then, not all normative properties get defined in terms of virtue. Goodness, in particular, is not so defined. But the kind of goodness which is possible for creatures like us is defined by virtue, and any answer to the question of what one should do or how one should live will appeal to the virtues. Another Platonistic variant of virtue ethics is exemplified by Robert Merrihew Adams. Unlike Murdoch and Chappell, his starting point is not a set of claims about our consciousness of goodness. Rather, he begins with an account of the metaphysics of goodness. Like Murdoch and others influenced by Platonism, Adams’s account of goodness is built around a conception of a supremely perfect good. And like Augustine, Adams takes that perfect good to be God. God is both the exemplification and the source of all goodness. Other things are good, he suggests, to the extent that they resemble God (Adams 1999). The resemblance requirement identifies a necessary condition for being good, but it does not yet give us a sufficient condition. 
This is because there are ways in which finite creatures might resemble God that would not be suitable to the type of creature they are. For example, if God were all-knowing, then the belief, “I am all-knowing,” would be a suitable belief for God to have. In God, such a belief—because true—would be part of God’s perfection. However, as neither you nor I are all-knowing, the belief, “I am all-knowing,” in one of us would not be good. To rule out such cases we need to introduce another factor. That factor is the fitting response to goodness, which Adams suggests is love. Adams uses love to weed out problematic resemblances: “being excellent in the way that a finite thing can be consists in resembling God in a way that could serve God as a reason for loving the thing” (Adams 1999: 36). Virtues come into the account as one of the ways in which some things (namely, persons) could resemble God. “[M]ost of the excellences that are most important to us, and of whose value we are most confident, are excellences of persons or of qualities or actions or works or lives or stories of persons” (1999: 42). This is one of the reasons Adams offers for conceiving of the ideal of perfection as a personal God, rather than an impersonal form of the Good. Many of the excellences of persons of which we are most confident are virtues such as love, wisdom, justice, patience, and generosity. And within many theistic traditions, including Adams’s own Christian tradition, such virtues are commonly attributed to divine agents. A Platonistic account like the one Adams puts forward in Finite and Infinite Goods clearly does not derive all other normative properties from the virtues (for a discussion of the relationship between this view and the one he puts forward in A Theory of Virtue (2006) see Pettigrove 2014). Goodness provides the normative foundation. 
Virtues are not built on that foundation; rather, as one of the varieties of goodness of whose value we are most confident, virtues form part of the foundation. Obligations, by contrast, come into the account at a different level. Moral obligations, Adams argues, are determined by the expectations and demands that “arise in a relationship or system of relationships that is good or valuable” (1999: 244). Other things being equal, the more virtuous the parties to the relationship, the more binding the obligation. Thus, within Adams’s account, the good (which includes virtue) is prior to the right. However, once good relationships have given rise to obligations, those obligations take on a life of their own. Their bindingness is not traced directly to considerations of goodness. Rather, they are determined by the expectations of the parties and the demands of the relationship. 3. Objections to virtue ethics A number of objections have been raised against virtue ethics, some of which bear more directly on one form of virtue ethics than on others. In this section we consider eight objections, namely, the a) application, b) adequacy, c) relativism, d) conflict, e) self-effacement, f) justification, g) egoism, and h) situationist problems. a) In the early days of virtue ethics’ revival, the approach was associated with an “anti-codifiability” thesis about ethics, directed against the prevailing pretensions of normative theory. At the time, utilitarians and deontologists commonly (though not universally) held that the task of ethical theory was to come up with a code consisting of universal rules or principles (possibly only one, as in the case of act-utilitarianism) which would have two significant features: i) the rule(s) would amount to a decision procedure for determining what the right action was in any particular case; ii) the rule(s) would be stated in such terms that any non-virtuous person could understand and apply it (them) correctly. 
Virtue ethicists maintained, contrary to these two claims, that it was quite unrealistic to imagine that there could be such a code (see, in particular, McDowell 1979). The results of attempts to produce and employ such a code, in the heady days of the 1960s and 1970s, when medical and then bioethics boomed and bloomed, tended to support the virtue ethicists’ claim. More and more utilitarians and deontologists found themselves agreed on their general rules but on opposite sides of the controversial moral issues in contemporary discussion. It came to be recognised that moral sensitivity, perception, imagination, and judgement informed by experience—phronesis in short—is needed to apply rules or principles correctly. Hence many (though by no means all) utilitarians and deontologists have explicitly abandoned (ii) and much less emphasis is placed on (i). Nevertheless, the complaint that virtue ethics does not produce codifiable principles is still a commonly voiced criticism of the approach, expressed as the objection that it is, in principle, unable to provide action-guidance. Initially, the objection was based on a misunderstanding. Blinkered by slogans that described virtue ethics as “concerned with Being rather than Doing,” as addressing “What sort of person should I be?” but not “What should I do?” as being “agent-centered rather than act-centered,” its critics maintained that it was unable to provide action-guidance. Hence, rather than being a normative rival to utilitarian and deontological ethics, it could claim to be no more than a valuable supplement to them. 
The rather odd idea was that all virtue ethics could offer was, “Identify a moral exemplar and do what he would do,” as though the university student trying to decide whether to study music (her preference) or engineering (her parents’ preference) was supposed to ask herself, “What would Socrates study if he were in my circumstances?” But the objection failed to take note of Anscombe’s hint that a great deal of specific action guidance could be found in rules employing the virtue and vice terms (“v-rules”) such as “Do what is honest/charitable; do not do what is dishonest/uncharitable” (Hursthouse 1999). (It is a noteworthy feature of our virtue and vice vocabulary that, although our list of generally recognised virtue terms is comparatively short, our list of vice terms is remarkably, and usefully, long, far exceeding anything that anyone who thinks in terms of standard deontological rules has ever come up with. Much invaluable action guidance comes from avoiding courses of action that would be irresponsible, feckless, lazy, inconsiderate, uncooperative, harsh, intolerant, selfish, mercenary, indiscreet, tactless, arrogant, unsympathetic, cold, incautious, unenterprising, pusillanimous, feeble, presumptuous, rude, hypocritical, self-indulgent, materialistic, grasping, short-sighted, vindictive, calculating, ungrateful, grudging, brutal, profligate, disloyal, and on and on.) (b) A closely related objection has to do with whether virtue ethics can provide an adequate account of right action. This worry can take two forms. (i) One might think a virtue ethical account of right action is extensionally inadequate. It is possible to perform a right action without being virtuous and a virtuous person can occasionally perform the wrong action without that calling her virtue into question. 
If virtue is neither necessary nor sufficient for right action, one might wonder whether the relationship between rightness/wrongness and virtue/vice is close enough for the former to be identified in terms of the latter. (ii) Alternatively, even if one thought it possible to produce a virtue ethical account that picked out all (and only) right actions, one might still think that at least in some cases virtue is not what explains rightness (Adams 2006:6–8). Some virtue ethicists respond to the adequacy objection by rejecting the assumption that virtue ethics ought to be in the business of providing an account of right action in the first place. Following in the footsteps of Anscombe (1958) and MacIntyre (1985), Talbot Brewer (2009) argues that to work with the categories of rightness and wrongness is already to get off on the wrong foot. Contemporary conceptions of right and wrong action, built as they are around a notion of moral duty that presupposes a framework of divine (or moral) law or around a conception of obligation that is defined in contrast to self-interest, carry baggage the virtue ethicist is better off without. Virtue ethics can address the questions of how one should live, what kind of person one should become, and even what one should do without that committing it to providing an account of ‘right action’. One might choose, instead, to work with aretaic concepts (defined in terms of virtues and vices) and axiological concepts (defined in terms of good and bad, better and worse) and leave out deontic notions (like right/wrong action, duty, and obligation) altogether. Other virtue ethicists wish to retain the concept of right action but note that in the current philosophical discussion a number of distinct qualities march under that banner. In some contexts, ‘right action’ identifies the best action an agent might perform in the circumstances. In others, it designates an action that is commendable (even if not the best possible). 
In still others, it picks out actions that are not blameworthy (even if not commendable). A virtue ethicist might choose to define one of these—for example, the best action—in terms of virtues and vices, but appeal to other normative concepts—such as legitimate expectations—when defining other conceptions of right action. As we observed in section 2, a virtue ethical account need not attempt to reduce all other normative concepts to virtues and vices. What is required is simply (i) that virtue is not reduced to some other normative concept that is taken to be more fundamental and (ii) that some other normative concepts are explained in terms of virtue and vice. This takes the sting out of the adequacy objection, which is most compelling against versions of virtue ethics that attempt to define all of the senses of ‘right action’ in terms of virtues. Appealing to virtues and vices makes it much easier to achieve extensional adequacy. Making room for normative concepts that are not taken to be reducible to virtue and vice concepts makes it even easier to generate a theory that is both extensionally and explanatorily adequate. Whether one needs other concepts and, if so, how many, is still a matter of debate among virtue ethicists, as is the question of whether virtue ethics even ought to be offering an account of right action. Either way virtue ethicists have resources available to them to address the adequacy objection. Insofar as the different versions of virtue ethics all retain an emphasis on the virtues, they are open to the familiar problem of (c) the charge of cultural relativity. Is it not the case that different cultures embody different virtues (MacIntyre 1985), and hence that the v-rules will pick out actions as right or wrong only relative to a particular culture? Different replies have been made to this charge. One—the tu quoque, or “partners in crime” response—exhibits a quite familiar pattern in virtue ethicists’ defensive strategy (Solomon 1988). 
They admit that, for them, cultural relativism is a challenge, but point out that it is just as much a problem for the other two approaches. The (putative) cultural variation in character traits regarded as virtues is no greater—indeed markedly less—than the cultural variation in rules of conduct, and different cultures have different ideas about what constitutes happiness or welfare. That cultural relativity should be a problem common to all three approaches is hardly surprising. It is related, after all, to the “justification problem” (see below), the quite general metaethical problem of justifying one’s moral beliefs to those who disagree, whether they be moral sceptics, pluralists or from another culture. A bolder strategy involves claiming that virtue ethics has less difficulty with cultural relativity than the other two approaches. Much cultural disagreement arises, it may be claimed, from local understandings of the virtues, but the virtues themselves are not relative to culture (Nussbaum 1993). Another objection to which the tu quoque response is partially appropriate is (d) “the conflict problem.” What does virtue ethics have to say about dilemmas—cases in which, apparently, the requirements of different virtues conflict because they point in opposed directions? Charity prompts me to kill the person who would be better off dead, but justice forbids it. Honesty points to telling the hurtful truth, kindness and compassion to remaining silent or even lying. What shall I do? Of course, the same sorts of dilemmas are generated by conflicts between deontological rules. Deontology and virtue ethics share the conflict problem (and are happy to take it on board rather than follow some of the utilitarians in their consequentialist resolutions of such dilemmas) and in fact their strategies for responding to it are parallel. 
Both aim to resolve a number of dilemmas by arguing that the conflict is merely apparent; a discriminating understanding of the virtues or rules in question, possessed only by those with practical wisdom, will perceive that, in this particular case, the virtues do not make opposing demands or that one rule outranks another, or has a certain exception clause built into it. Whether this is all there is to it depends on whether there are any irresolvable dilemmas. If there are, proponents of either normative approach may point out reasonably that it could only be a mistake to offer a resolution of what is, ex hypothesi, irresolvable. Another problem arguably shared by all three approaches is (e), that of being self-effacing. An ethical theory is self-effacing if, roughly, whatever it claims justifies a particular action, or makes it right, had better not be the agent’s motive for doing it. Michael Stocker (1976) originally introduced it as a problem for deontology and consequentialism. He pointed out that the agent who, rightly, visits a friend in hospital will rather lessen the impact of his visit on her if he tells her either that he is doing it because it is his duty or because he thought it would maximize the general happiness. But as Simon Keller observes, she won’t be any better pleased if he tells her that he is visiting her because it is what a virtuous agent would do, so virtue ethics would appear to have the problem too (Keller 2007). However, virtue ethics’ defenders have argued that not all forms of virtue ethics are subject to this objection (Pettigrove 2011) and those that are are not seriously undermined by the problem (Martinez 2011). Another problem for virtue ethics, which is shared by both utilitarianism and deontology, is (f) “the justification problem.” Abstractly conceived, this is the problem of how we justify or ground our ethical beliefs, an issue that is hotly debated at the level of metaethics. 
In its particular versions, for deontology there is the question of how to justify its claims that certain moral rules are the correct ones, and for utilitarianism of how to justify its claim that all that really matters morally are consequences for happiness or well-being. For virtue ethics, the problem concerns the question of which character traits are the virtues. In the metaethical debate, there is widespread disagreement about the possibility of providing an external foundation for ethics—“external” in the sense of being external to ethical beliefs—and the same disagreement is found amongst deontologists and utilitarians. Some believe that their normative ethics can be placed on a secure basis, resistant to any form of scepticism, such as what anyone rationally desires, or would accept or agree on, regardless of their ethical outlook; others that it cannot. Virtue ethicists have eschewed any attempt to ground virtue ethics in an external foundation while continuing to maintain that their claims can be validated. Some follow a form of Rawls’s coherentist approach (Slote 2001; Swanton 2003); neo-Aristotelians a form of ethical naturalism. A misunderstanding of eudaimonia as an unmoralized concept leads some critics to suppose that the neo-Aristotelians are attempting to ground their claims in a scientific account of human nature and what counts, for a human being, as flourishing. Others assume that, if this is not what they are doing, they cannot be validating their claims that, for example, justice, charity, courage, and generosity are virtues. Either they are illegitimately helping themselves to Aristotle’s discredited natural teleology (Williams 1985) or producing mere rationalizations of their own personal or culturally inculcated values. But McDowell, Foot, MacIntyre and Hursthouse have all outlined versions of a third way between these two extremes. Eudaimonia, in virtue ethics, is indeed a moralized concept, but it is not only that. 
Claims about what constitutes flourishing for human beings no more float free of scientific facts about what human beings are like than ethological claims about what constitutes flourishing for elephants. In both cases, the truth of the claims depends in part on what kind of animal they are and what capacities, desires and interests the humans or elephants have. The best available science today (including evolutionary theory and psychology) supports rather than undermines the ancient Greek assumption that we are social animals, like elephants and wolves and unlike polar bears. No rationalizing explanation in terms of anything like a social contract is needed to explain why we choose to live together, subjugating our egoistic desires in order to secure the advantages of co-operation. Like other social animals, our natural impulses are not solely directed towards our own pleasures and preservation, but include altruistic and cooperative ones. This basic fact about us should make more comprehensible the claim that the virtues are at least partially constitutive of human flourishing and also undercut the objection that virtue ethics is, in some sense, egoistic. (g) The egoism objection has a number of sources. One is a simple confusion. Once it is understood that the fully virtuous agent characteristically does what she should without inner conflict, it is triumphantly asserted that “she is only doing what she wants to do and hence is being selfish.” So when the generous person gives gladly, as the generous are wont to do, it turns out she is not generous and unselfish after all, or at least not as generous as the one who greedily wants to hang on to everything she has but forces herself to give because she thinks she should! A related version ascribes bizarre reasons to the virtuous agent, unjustifiably assuming that she acts as she does because she believes that acting thus on this occasion will help her to achieve eudaimonia. 
But “the virtuous agent” is just “the agent with the virtues” and it is part of our ordinary understanding of the virtue terms that each carries with it its own typical range of reasons for acting. The virtuous agent acts as she does because she believes that someone’s suffering will be averted, or someone benefited, or the truth established, or a debt repaid, or … thereby. It is the exercise of the virtues during one’s life that is held to be at least partially constitutive of eudaimonia, and this is consistent with recognising that bad luck may land the virtuous agent in circumstances that require her to give up her life. Given the sorts of considerations that courageous, honest, loyal, charitable people wholeheartedly recognise as reasons for action, they may find themselves compelled to face danger for a worthwhile end, to speak out in someone’s defence, or refuse to reveal the names of their comrades, even when they know that this will inevitably lead to their execution, to share their last crust and face starvation. On the view that the exercise of the virtues is necessary but not sufficient for eudaimonia, such cases are described as those in which the virtuous agent sees that, as things have unfortunately turned out, eudaimonia is not possible for them (Foot 2001, 95). On the Stoical view that it is both necessary and sufficient, a eudaimon life is a life that has been successfully lived (where “success” of course is not to be understood in a materialistic way) and such people die knowing not only that they have made a success of their lives but that they have also brought their lives to a markedly successful completion. Either way, such heroic acts can hardly be regarded as egoistic. A lingering suggestion of egoism may be found in the misconceived distinction between so-called “self-regarding” and “other-regarding” virtues. 
Those who have been insulated from the ancient tradition tend to regard justice and benevolence as real virtues, which benefit others but not their possessor, and prudence, fortitude and providence (the virtue whose opposite is “improvidence” or being a spendthrift) as not real virtues at all because they benefit only their possessor. This is a mistake on two counts. Firstly, justice and benevolence do, in general, benefit their possessors, since without them eudaimonia is not possible. Secondly, given that we live together, as social animals, the “self-regarding” virtues do benefit others—those who lack them are a great drain on, and sometimes grief to, those who are close to them (as parents with improvident or imprudent adult offspring know only too well). The most recent objection (h) to virtue ethics claims that work in “situationist” social psychology shows that there are no such things as character traits and thereby no such things as virtues for virtue ethics to be about (Doris 1998; Harman 1999). In reply, some virtue ethicists have argued that the social psychologists’ studies are irrelevant to the multi-track disposition (see above) that a virtue is supposed to be (Sreenivasan 2002; Kamtekar 2004). Mindful of just how multi-track it is, they agree that it would be reckless in the extreme to ascribe a demanding virtue such as charity to people of whom they know no more than that they have exhibited conventional decency; this would indeed be “a fundamental attribution error.” Others have worked to develop alternative, empirically grounded conceptions of character traits (Snow 2010; Miller 2013 and 2014; however see Upton 2016 for objections to Miller). There have been other responses as well (summarized helpfully in Prinz 2009 and Miller 2014). 
Notable among these is a response by Adams (2006, echoing Merritt 2000) who steers a middle road between “no character traits at all” and the exacting standard of the Aristotelian conception of virtue which, because of its emphasis on phronesis, requires a high level of character integration. On his conception, character traits may be “frail and fragmentary” but still virtues, and not uncommon. But giving up the idea that practical wisdom is the heart of all the virtues, as Adams has to do, is a substantial sacrifice, as Russell (2009) and Kamtekar (2010) argue. Even though the “situationist challenge” has left traditional virtue ethicists unmoved, it has generated a healthy engagement with empirical psychological literature, which has also been fuelled by the growing literature on Foot’s Natural Goodness and, quite independently, an upsurge of interest in character education (see below).

4. Future Directions

Over the past thirty-five years most of those contributing to the revival of virtue ethics have worked within a neo-Aristotelian, eudaimonist framework. However, as noted in section 2, other forms of virtue ethics have begun to emerge. Theorists have begun to turn to philosophers like Hutcheson, Hume, Nietzsche, Martineau, and Heidegger for resources they might use to develop alternatives (see Russell 2006; Swanton 2013 and 2015; Taylor 2015; and Harcourt 2015). Others have turned their attention eastward, exploring Confucian, Buddhist, and Hindu traditions (Yu 2007; Slingerland 2011; Finnigan and Tanaka 2011; McRae 2012; Angle and Slote 2013; Davis 2014; Flanagan 2015; Perrett and Pettigrove 2015; and Sim 2015). These explorations promise to open up new avenues for the development of virtue ethics. Although virtue ethics has grown remarkably in the last thirty-five years, it is still very much in the minority, particularly in the area of applied ethics. 
Many editors of big textbook collections on “moral problems” or “applied ethics” now try to include articles representative of each of the three normative approaches but are often unable to find a virtue ethics article addressing a particular issue. This is sometimes, no doubt, because “the” issue has been set up as a deontological/utilitarian debate, but it is often simply because no virtue ethicist has yet written on the topic. However, the last decade has seen an increase in the amount of attention applied virtue ethics has received (Walker and Ivanhoe 2007; Hartman 2013; Austin 2014; Van Hooft 2014; and Annas 2015). This area can certainly be expected to grow in the future, and it looks as though applying virtue ethics in the field of environmental ethics may prove particularly fruitful (Sandler 2007; Hursthouse 2007, 2011; Zwolinski and Schmidtz 2013; Cafaro 2015). Whether virtue ethics can be expected to grow into “virtue politics”—i.e. to extend from moral philosophy into political philosophy—is not so clear. Gisela Striker (2006) has argued that Aristotle’s ethics cannot be understood adequately without attending to its place in his politics. That suggests that at least those virtue ethicists who take their inspiration from Aristotle should have resources to offer for the development of virtue politics. But, while Plato and Aristotle can be great inspirations as far as virtue ethics is concerned, neither, on the face of it, are attractive sources of insight where politics is concerned. However, recent work suggests that Aristotelian ideas can, after all, generate a satisfyingly liberal political philosophy (Nussbaum 2006; LeBar 2013a). Moreover, as noted above, virtue ethics does not have to be neo-Aristotelian. It may be that the virtue ethics of Hutcheson and Hume can be naturally extended into a modern political philosophy (Hursthouse 1990–91; Slote 1993). 
Following Plato and Aristotle, modern virtue ethics has always emphasised the importance of moral education, not as the inculcation of rules but as the training of character. There is now a growing movement towards virtues education, amongst both academics (Carr 1999; Athanassoulis 2014; Curren 2015) and teachers in the classroom. One exciting thing about research in this area is its engagement with other academic disciplines, including psychology, educational theory, and theology (see Cline 2015; and Snow 2015). Finally, one of the more productive developments of virtue ethics has come through the study of particular virtues and vices. There are now a number of careful studies of the cardinal virtues and capital vices (Pieper 1966; Taylor 2006; Curzer 2012; Timpe and Boyd 2014). Others have explored less widely discussed virtues or vices, such as civility, decency, truthfulness, ambition, and meekness (Calhoun 2000; Kekes 2002; Williams 2002; and Pettigrove 2007 and 2012). One of the questions these studies raise is “How many virtues are there?” A second is, “How are these virtues related to one another?” Some virtue ethicists have been happy to work on the assumption that there is no principled reason for limiting the number of virtues and plenty of reason for positing a plurality of them (Swanton 2003; Battaly 2015). Others have been concerned that such an open-handed approach to the virtues will make it difficult for virtue ethicists to come up with an adequate account of right action or deal with the conflict problem discussed above. Dan Russell has proposed cardinality and a version of the unity thesis as a solution to what he calls “the enumeration problem” (the problem of too many virtues). The apparent proliferation of virtues can be significantly reduced if we group virtues together with some being cardinal and others subordinate extensions of those cardinal virtues. 
Possible conflicts between the remaining virtues can then be managed if they are tied together in some way as part of a unified whole (Russell 2009). This highlights two important avenues for future research, one of which explores individual virtues and the other of which analyses how they might be related to one another.
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that allows identity to be abstracted away from variations in face appearance. However, it did not incorporate the notion that this ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Mike Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
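
      To make the idea of a view-average comparison concrete, here is a minimal sketch; it is not the authors' implementation. It assumes that images are flat lists of grey levels, that the summary representation is a plain pixelwise mean across views, and that matching uses a Pearson correlation. The names `pearson`, `view_average`, `best_match`, `id_A` and `id_B` and the toy pixel values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length pixel vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def view_average(views):
    """Pixelwise mean of several views (each a flat list of pixel values)."""
    return [sum(pixels) / len(views) for pixels in zip(*views)]

def best_match(probe, identity_views):
    """Return the identity whose view-averaged image correlates best with the probe."""
    averages = {name: view_average(views) for name, views in identity_views.items()}
    return max(averages, key=lambda name: pearson(probe, averages[name]))

# Toy data (assumption): two identities, three "views" of four pixels each.
identity_views = {
    "id_A": [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1], [0.85, 0.15, 0.8, 0.2]],
    "id_B": [[0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.1, 0.9], [0.15, 0.85, 0.2, 0.8]],
}
probe = [0.7, 0.3, 0.75, 0.25]  # a single view that resembles id_A
```

      In this sketch the probe never has to match any stored view exactly; it only has to resemble the central tendency of an identity, which is the property the Reviewer asked for.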

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
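
      The reported SNR follows directly from the two contrast values (.01 / .08 = .125). The sketch below illustrates one way such a noise-addition step could work; it is not the authors' code. It assumes Gaussian white noise, RMS contrast defined as the standard deviation of grey levels on a 0 to 1 scale, and a mean luminance of 0.55 for the combined image. The function names `scale_to_rms` and `add_noise` are hypothetical.

```python
import random
import statistics

def scale_to_rms(pixels, target_rms):
    """Zero-centre pixel values and rescale them to the requested RMS contrast."""
    mean = statistics.fmean(pixels)
    rms = statistics.pstdev(pixels)
    return [(p - mean) * target_rms / rms for p in pixels]

def add_noise(face, face_rms=0.01, noise_rms=0.08, mean_lum=0.55, rng=random):
    """Combine a face with a fresh noise pattern at SNR = face_rms / noise_rms."""
    signal = scale_to_rms(face, face_rms)
    # Gaussian white noise is an assumption; the noise type is not stated in the text.
    noise = scale_to_rms([rng.gauss(0.0, 1.0) for _ in face], noise_rms)
    return [mean_lum + s + n for s, n in zip(signal, noise)]

rng = random.Random(0)
face = [rng.random() for _ in range(1024)]  # stand-in for face pixel values
target = add_noise(face, rng=rng)           # each call draws a distinct noise pattern
probe = add_noise(face, rng=rng)
snr = 0.01 / 0.08
```

      Because target and probe receive different noise patterns, even an identical target-probe pair no longer correlates at 1, which is what takes such a model off ceiling.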

      We also added a supplementary file with a graphical illustration of the d’ distributions of each model and of human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
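
      For readers less familiar with the measure, sensitivity d’ in an old/new task is conventionally the difference between the z-transformed hit rate and false-alarm rate. The sketch below illustrates that convention only. The log-linear correction for extreme rates and the function name `d_prime` are assumptions and are not taken from the manuscript.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (assumption): keeps rates away from 0 and 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

chance = d_prime(50, 50, 50, 50)   # equal hit and false-alarm rates
good = d_prime(80, 20, 20, 80)     # many hits, few false alarms
```

      A d’ of zero means old and new faces cannot be told apart; larger values indicate better discrimination, independent of response bias.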

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, hair style is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with the use of a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006) and hue (color) is only one among others. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file with a graphical illustration of the d’ distributions of each model and of human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions merely to avoid energy from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’.
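
      As an illustration of the normalization described in the quoted passage, the sketch below fixes the mean luminance and RMS contrast of the face pixels, sets the background to the mean luminance, and reports how many values had to be clipped. It is not the authors' code. It assumes that RMS contrast is the standard deviation of face grey levels on a 0 to 1 scale and that the clipping percentage is computed over all pixels of the image. The function name `normalize_face` and the toy image are hypothetical.

```python
import statistics

def normalize_face(pixels, mask, luminance=0.55, rms=0.15):
    """Fix mean luminance and RMS contrast of face pixels; flatten the background.

    pixels: flat list of grey levels on a 0-1 scale.
    mask:   True where the pixel belongs to the face, False for background.
    Returns the normalized image and the percentage of clipped pixel values.
    """
    face = [p for p, m in zip(pixels, mask) if m]
    mean, sd = statistics.fmean(face), statistics.pstdev(face)
    out, clipped = [], 0
    for p, m in zip(pixels, mask):
        value = luminance + (p - mean) * rms / sd if m else luminance
        if value < 0.0 or value > 1.0:
            clipped += 1
            value = min(1.0, max(0.0, value))
        out.append(value)
    return out, 100.0 * clipped / len(pixels)

# Toy image (assumption): four face pixels followed by two background pixels.
pixels = [0.2, 0.4, 0.6, 0.8, 0.0, 0.0]
mask = [True, True, True, True, False, False]
normalized, pct_clipped = normalize_face(pixels, mask)
face_out = [v for v, m in zip(normalized, mask) if m]
```

      After this step every image has the same face luminance and contrast, so differences in recognition across orientations or views cannot be attributed to differences in image energy.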

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, human performance data indicate that while identity recognition remains tuned to horizontal information, the horizontal tuning peak shows some variation across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second aspect: with yaw rotation, certain non-horizontal morphological features such as the jaw line or nose bridge may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or to use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
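
      The four parameters described above can be read directly off a Gaussian tuning function. The sketch below shows one common form of such a function; it is not necessarily the exact equation fitted in the manuscript. It assumes that orientation is expressed in degrees with 90 degrees as horizontal, that distances to the peak wrap over the 180-degree orientation cycle, and that the eight bands are evenly spaced. The parameter values are hypothetical.

```python
import math

def gaussian_tuning(theta, peak_location, peak_amplitude, sd, base_amplitude):
    """Predicted d' at orientation theta (degrees) under a Gaussian tuning profile."""
    # Wrap the distance to the peak into [-90, 90): orientation is 180-degree periodic.
    delta = (theta - peak_location + 90.0) % 180.0 - 90.0
    return base_amplitude + peak_amplitude * math.exp(-delta ** 2 / (2.0 * sd ** 2))

# Hypothetical parameter values, chosen only for illustration.
params = dict(peak_location=90.0, peak_amplitude=1.5, sd=25.0, base_amplitude=0.5)
# Eight orientation bands; the even 22.5-degree spacing is an assumption.
orientations = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]
profile = [gaussian_tuning(t, **params) for t in orientations]
```

      In this form the predicted sensitivity at the peak equals base amplitude plus peak amplitude, and it falls back toward the base amplitude near vertical, which is why each parameter maps onto one interpretable aspect of the tuning profile.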

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at primary stages of visual processing has also been modelled using Gaussian (or Difference of Gaussians; Ringach, Hawken and Shapley 2003).

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data with a simple Gaussian model seemed most appropriate, as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased the sentence as follows: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity. I think this might have been more relevant if this had been a manipulation in the design. So, I wonder if this might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological "bar codes" in human faces." J Vis 9(4): 2 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Scientific Reports 6: 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. Mike Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). ""I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Mike Burton, A. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." Journal of neurophysiology 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. Reviewer #2 (Public review):

      I think this paper is an excellent and timely contribution. It clearly shows that learning overlapping relationships in a disjoint training schedule (where the overlaps are not encountered close together in time) appears to aid the formation of an integrated associative memory structure (a cognitive map) and supports generalisation. I believe the methods are sound and the results are clear. I only have a couple of methodological questions that may not warrant any changes to the paper (or only very minor changes/additions):

      (1) The mixed effects models did not include random slopes for the within-subject factors ("spatial manipulation" and "block"), and so the corresponding fixed effect inferences may be unsafe. Having said that, it is likely that including these slopes may not be warranted given their contribution to the model's fit. I recommend that the authors check this.

      (2) The mixed effects models for accuracy appear to model average performance across trials rather than using a generalised linear model with a (e.g.) logit link function and the binomial distribution to characterise performance. I think this is a little sub-optimal, as the latter is often more sensitive. Nonetheless, it is not in any way wrong; the results are clear enough as is, and there may be a good reason to avoid a non-linear link function, which can alter the interpretation of effects close to the ceiling and floor.
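
      As a hedged illustration of point (2), and not the authors' analysis, the sketch below shows why a logit link changes how effects near ceiling are read: a fixed effect on the log-odds scale maps to a progressively smaller change in raw accuracy as baseline performance rises. The effect size of 0.5 log-odds units and the three baseline accuracies are hypothetical values chosen only for illustration.

```python
import math


def logit(p):
    """Log-odds of a proportion p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Proportion corresponding to log-odds x."""
    return 1.0 / (1.0 + math.exp(-x))


def accuracy_gain(baseline_accuracy, effect_log_odds):
    """Change in accuracy when a fixed log-odds effect is added at a given baseline."""
    shifted = inv_logit(logit(baseline_accuracy) + effect_log_odds)
    return shifted - baseline_accuracy


# Hypothetical effect size (log-odds units) and baseline accuracies.
EFFECT = 0.5
gains = {p: accuracy_gain(p, EFFECT) for p in (0.50, 0.75, 0.95)}

for baseline, gain in gains.items():
    print(f"baseline {baseline:.2f}: accuracy gain {gain:.3f}")
```

      The same log-odds effect yields the largest accuracy gain at chance-level performance and the smallest near ceiling, which is why a linear model of averaged accuracy and a binomial model can disagree about effects in those ranges.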

      I think the introduction and/or discussion would benefit from contrasting their results with Berens & Bird (2022, PLOS Comp Bio). In this paper, it is shown that blocking the training of discriminations in a linear hierarchy (what we call progressive training) substantially benefited transitive inference performance. This seems at odds with the authors' finding that "participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time".

      I would really like to know what the authors think about this discrepancy (or, indeed, whether they think there is one at all). Is it possibly because "progressive" learning is some combination of "grouping", "blocking" and "chaining" (where there is a structured overlap between adjacently trained relationships)? Or is it something else, e.g., that there is a fundamental difference between learning associations and discriminations (personally, I lean on this explanation)?

      Relevant to this, the authors note that their "findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure - like a map - but simply to compress exemplars by mapping them onto a smaller number of labels - the benefits of blocking emerge." However, the benefit of progressive (blocked) training in my own work was observed in a task that required learning a complex/relational structure in the form of a transitive hierarchy, which theoretical accounts suggest depends on learning map-like representations (Whittington et al., 2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer's point, we identified enrichment of the inflammatory signaling response following hypoxic exposure. Since hSOs at the time of analysis do contain some astrocytes, we think these contribute to the observed pro-inflammatory changes and emphasize the feasibility of capturing this response in organoids in vitro. This is also important because ADM is known to have anti-inflammatory properties and should be investigated as such in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted the cell-type expression of ADM and its receptors in hCOs and included these data in the figures (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our study provides the first information about the cell-type specific expression of ADM in baseline and hypoxic conditions, which is one of its key novelties.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion to detail the functional impact of reduced inhibitory neuron migration. We revised the manuscript to discuss previous studies showing that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the manuscript we used the single-cell RNA sequencing data and immunostainings to provide this information. As expected based on our previous reports, most cortical interneurons present in organoids are represented by calretinin (CALB2), somatostatin (SST) and calbindin (CALB1). These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we validated several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Fig. S1).

      We agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hSOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we added data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc.), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we speculate these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time, and thus this hypothesis remains speculative and only included in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we corrected it in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness.

      In the revised version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation. The key additional experiment is the phenotype of reduced bacterial uptake in low serum, but not in high serum conditions. The authors suggest this could be due to the SM from serum itself affecting the entry. While this explanation is plausible, prolonged exposure of cells to low serum is well documented to alter several cellular functions, particularly in the context of this manuscript, lysosomal positioning, exocytosis and Ca2+ signaling. A better control here could be WT cells grown in low serum.

      As the reviewer suggested, we did culture both WT control cells and ASM knock-outs under low serum conditions before conducting the invasion assays. Hence, the detected effects on S. aureus invasion must be caused by the lack of functional ASM in the mutant.

      We apologize that this did not become evident from the manuscript’s text. We thus included a change in line 259 which now reads:

      “To test whether FBS confounded our invasion experiments, we cultivated WT as well as ASM K.O. cells in medium with reduced FBS concentration (1%) and determined the S. aureus invasion efficiency (Figure 2I).”

      If SM in serum can interfere, why do they see such pronounced phenotype on bacterial entry in WT cells upon chemical inhibition?

      We explain the differences between inhibitor-treated WT cells and ASM K.O.s by the severe accumulation of SM upon genetic ablation of ASM. We demonstrated this by HPLC-MS/MS measurements in Figure 2L. If cells were cultured in 10% FBS, an ASM K.O. resulted in approximately 4-fold higher levels of cellular SM C18:0 when compared to WT cells, while amitriptyline treatment of WT cells had no effect, and ARC39 treatment increased SM C18:0 levels only by 2-fold. This likely results from the different durations of SM accumulation in the cell pools: ASM is completely absent in the ASM K.O., whereas the inhibitors act only in the hour range.

      Under low serum conditions, the severe SM C18:0 accumulation in the ASM K.O. was found decreased (from 4-fold to 2-fold when compared to WT cells; Figure 2M). Here, the WT cells used as reference were also cultured in the same manner as the ASM K.O. A similar pattern was observed for other SM species (Supp. Figure 3). This correlates with the S. aureus invasion phenotype in ASM K.O.: under high serum conditions (resulting in severe SM accumulation) we did not detect an invasion defect, while under low serum conditions (resulting in only moderate SM accumulation) S. aureus invasion was reduced in the knock-outs when compared to WT cells cultured in the same conditions.

      While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      Since the comments starting with the line above are identical to the previous comments by the reviewer, we assume that these points of criticism still stand for the reviewer. We had previously agreed with the reviewer that we do not show formation of ceramide-enriched platforms, and we had already changed the manuscript accordingly in the previous revision round (see also our comment below).

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We continue to share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but the simultaneous occurrence of multiple pathways of bacterial uptake is currently the limiting factor and technological challenge in our laboratory, since these events happen rapidly. We do hope that we or others will be able to address these limitations in the future, for instance with the technologies suggested by the reviewer.

      I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We here want to elaborate again, since our revision experiments demonstrate the ASM-dependency of the rapid uptake under low serum conditions (see also above). We were convinced that the genetic evidence of an S. aureus invasion phenotype in ASM K.O.s under these conditions would eliminate the reviewer’s concern about the role of ASM during bacterial invasion. Our lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype we observed.

      We agree with the reviewer, however, that it remains elusive why changes in the sphingolipidome increase ASM-independent S. aureus internalization by host cells. One explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus in certain cell types (3, 4). In other cell types, such as A549 cells, S. aureus invades in an α-toxin- and caveolin-1-dependent fashion (5). It will be interesting to study to what extent such processes as described by Goldmann and colleagues depend on ASM. However, a characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      As to the centrality of the pathway: we cannot and do not make any assumptions on the centrality of the pathway and its importance in vivo. As scientists we were intrigued by our finding of an ASM-dependent uptake pathway for S. aureus, especially its speed. In other, as yet unidentified host cell types or cell lines, such a pathway may pose a major entry point for pathogens. Alternatively, we may have identified an ASM-dependent mode of receptor uptake, with which the bacteria “piggyback” into the cells.

      The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We again want to add that we measured phagosomal escape of S. aureus in WT and ASM K.O. cells cultured in 1% FBS (low serum conditions) and compared it to escape rates obtained with host cells cultured in 10% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (see Author response image 1). This was addressed already during the manuscript’s first revision. We found that escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium.

      Author response image 1.

      We therefore think that prolonged absence of ASM has additional side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt to the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      As it is currently unclear to what extent the prolonged absence of ASM activity affects cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus, predominantly via the “rapid” ASM-dependent pathway) possess lower phagosomal escape rates than bacteria that entered host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4, C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, while sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5, A and B). This is summarized in Figure 5F.

      Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      Knock-downs in our laboratory are based on the vector pLVTHM (6). Inducible knock-downs in the cells would require the introduction of an inducible Tet-On system, which the cells currently do not harbor.

      However, it needs to be stated that for optimal gene knock-downs, this system has to be induced by doxycycline supplementation in the medium for 7 days. The cells therefore grow for several days, which allows them to adapt their lipid metabolism and thus reflects the situation that we encounter for the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (7). However, the course of infection in macrophages differs from that in non-professional phagocytes (8). For example, in macrophages, S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM therefore may influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms (see also above). We thus already had changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also had added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) Reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M).

      (ii) Increased invasion in TPC1 and Syt7 K.O. (Figure 2P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we had added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (9).”

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensor such as Gcamp3-ML1 might provide better ways to directly visualize this during inhibitor treatment, or S. aureus infection. 

      We again thank the reviewer for this suggestion. We already had included the following section in our discussion (then: line 593): “Since fluorescent calcium reporters allow to monitor this process microscopically, future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References for the purpose of this response letter:

      (1) Rappaport, J., C. Garnacho, and S. Muro, Clathrin-mediated endocytosis is impaired in type AB Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm, 2014. 11(8): p. 2887-95.

      (2) Rappaport, J., et al., Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm, 2015. 12(5): p. 1366-76.

      (3) Hoffmann, C., et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci, 2010. 123(Pt 24): p. 4280-91.

      (4) Tricou, L.-P., et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports, 2024. 14(1): p. 28643.

      (5) Goldmann, O., et al., Alpha-hemolysin promotes internalization of Staphylococcus aureus into human lung epithelial cells via caveolin-1- and cholesterol-rich lipid rafts. Cell Mol Life Sci, 2024. 81(1): p. 435.

      (6) Wiznerowicz, M. and D. Trono, Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J Virol, 2003. 77(16): p. 8957-61.

      (7) Li, C., et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal, 2018. 28(10): p. 916-934.

      (8) Moldovan, A. and M.J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol, 2019. 21(3): p. e12997.

      (9) Rühling, M., et al., Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio, 2025. 0(0): p. e03654-24.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The study does not explore or discuss how oral ingestion of Nora virus leads to the colonization of stem cells, which are located basally in the gut. This mechanism should be discussed.

      We have added an additional paragraph (4th) in the Discussion dealing with this issue and are further discussing the consequences of RNAi potentially not being functional in progenitor cells in the paragraph on antiviral responses.

      (2) The authors fail to detect Dicer-GFP fusion protein expression in stem cells, a finding that could explain why the virus persists in these cells. Further investigation is needed to determine whether RNAi functions are effective in stem cells compared to enterocytes. For clarification, the authors could cross esg-Gal4 UAS-GFP and Myo-Gal4 UAS-GFP with UAS GFP-RNAi and/or express a Dicer-GFP construct under a stem cell-specific driver.

      Actually, it is well-known in the Drosophila literature on the intestinal epithelium that RNAi functions well in progenitor cells as the technique has been widely used to understand the control of stem cell division and differentiation in tens of articles. We provide here just a few examples: Jiang et al., Nat Commun (2025) https://doi.org/10.1038/s41467-024-55255-1; Zhai et al., PLoS Genetics (2017) https://doi.org/10.1371/journal.pgen.1006854; Wu et al., https://doi.org/10.1371/journal.pgen.1009649.

      (3) The presentation of experimental parameters (e.g., pathogen type, temperature, time points) should be improved in the results section and at the top of the figures to enhance clarity. Additionally, details regarding the mode of oral infection (continuous exposure vs. single feeding on a filter) should be specified. Given that fly stock flipping frequency influences microbiota load (as noted in Broderick et al.), this should be reported, especially for lifespan studies.

      P. aeruginosa oral infection was always by continuous exposure, as detailed in the Mat.& Meth. section. Nora infection was done by exposure to the viral solution for 24h, as detailed in Mat. & Meth. The flipping frequency had also been reported in that section.

      (4) To confirm that enterocyte colonization requires stem cell proliferation and differentiation, the authors should analyze Nora virus localization in JAK-STAT-deficient flies infected with bacteria or toxicants. This would help determine whether the virus can infect enterocytes in the absence of enterocyte differentiation, but stimulation of stem cells.

      We now provide these data (pictures and quantification) in Fig.7 G-H and discuss them in the main text.

      (5) The study does not discuss the spatial distribution of Nora virus infection along the gut. Specifically, it remains unclear whether viral colonization is higher in gut regions R2 and R3, which contain proliferative stem cells. Addressing this could provide valuable insights into the virus's infection dynamics.

      We have now specified that Nora virus was detected only in the posterior midgut; we are now also providing a schematic illustration in Fig. S5J.

      Recommendations for the authors:

      Major Suggestion

      See weaknesses section for key areas requiring improvement.

      Minor Suggestions

      (1) Line 79: Mention Nox in the text. Key references on Nox include Jones (2013), Iatsenko (2018), and Patel (2016).

      Done.

      (2) Line 92: The long list of publications is unnecessary and can be shortened.

      We are not sure that many investigators are aware of the scope of our studies on host-pathogen relationships and this is the adequate place for a reminder.

      (3) Line 196: Cite Choi et al. (Aging Cell, 2008; 7:318-334. doi: 10.1111/j.1474-9726.2008.00380.x) for the initial work on gut dysplasia during aging. However, note that dysbiosis in aging is demonstrated in Buchon et al. (2009, Genes and Development) and other studies.

      Done.

      (4) Line 265: It would be interesting to clarify whether the shortened lifespan of Nora-infected flies after a clean injury is dependent on the microbiota.

      The shortened life span of Nora-infected flies is not due to the injury as demonstrated in Fig. S4F. Hence, the shortened lifespan is differentially affected by the microbiota according to nutrition conditions as documented in Fig. 3D-E.

      (5) Line 285: Clarify what is meant by "polyubiquitin promoter"-do the authors mean a ubiquitous Gal4 driver? Specify the Gal4 lines used in the result section.

      Done. The construct is a direct fusion of the ubiquitin p63E promoter to the Dicer-fluorescent protein sequences as described in Girardi et al., Sci Rep, 2015.

      (6) Line 347: Indicate the references aligning with the most recent studies on this topic.

      Done.

      (7) Line 373 and elsewhere: Mention studies that have shown the microbiota influence on lifespan, in relation to dietary richness.

      Done.

      (8) Line 588: Provide details on the method used for hemolymph collection.

      Done.

      (9) Line 964: Clarify the phrase "as previously shown"-where in this paper was it demonstrated?

      The legends have been rewritten and the phrase has been deleted.

      (10) Line 987: In "survival of non-infested with PA14," explicitly mention Nora to distinguish between different infections.

      Done.

      Figures & Experimental Details

      (11) Figures: Improve figure legends or add information at the top of figures, specifying:

      Number of flies used to monitor Nora virus titer.

      Temperature conditions.

      Age of flies used in experiments.

      Done.

      (12) Figure 2E: The lifespan of Nora-negative flies appears very short. Was this lifespan assay conducted at 29°C? What was the fly stock flipping rate?

      Correct, it was 29°C. As described in the Material and Methods section, the flies were flipped every two (29°C) to four days (25°C).

      (13) Figure 4C: Improve labeling on the plate for better clarity.

      Done.

      (14) Figure 6C: The figure legend on the right is difficult to interpret. Clarify what "+" indicates and explicitly write out the genotype. Is NP identical to NPG4G80?

      Done. NP is the NP1 driver. We usually use it in a version that also includes a Gal80<sup>ts</sup> transgene to express the gene of interest only at the adult stage.

      (15) Dissection Details: Clearly state which part of the gut was dissected: midgut, entire gut, ± Malpighian tubules. This should be specified in the results section.

      Done (no Malpighian tubules nor crop) for RTqPCR analyses.

      (16) Clean Injury: Provide more details in the results section regarding the injury site and needle size.

      Done.

      (17) Use "Abx" instead of "AntiB," as the former is more commonly recognized.

      Done.

      Reviewer #2 (Public review):

      The title does not seem to be fully supported by the data. While the authors convincingly show the increased sensitivity to Pseudomonas infection, effects on another tested bacterium, Serratia marcescens, were not significantly different between Nora-virus-infected and noninfected flies. Thus, effects of 'intestinal infection' seem to be too broad a claim.

      We agree with the reviewer and have accordingly modified the title, which now explicitly refers to P. aeruginosa.

      Also, whether the Nora virus increases sensitivity to oxidative stress is not so clear to me: the figure that supports this claim is the survival assay of Figure 5F. However, the difference in survival between control and paraquat-treated Nora (-) flies seems to be in the same order as between control and paraquat-treated Nora (+) flies. Rather, cause and effect seem to be the reverse: paraquat increases ISC proliferation, higher viral loads, and consequently shorter survival. I suggest rephrasing the title and conclusions accordingly.

      While we usually just directly compare Nora (+) vs. Nora (-) flies under the same conditions, we note that, based on LT50 values, the difference in survival between control and paraquat-treated flies is about 9 days for Nora (-) flies, whereas it is 8 days for Nora (+) flies. The difference is about two days when comparing Nora (+) to Nora (-) flies exposed to paraquat. Thus, Nora does contribute to an increased sensitivity to oxidative stress, likely by the process highlighted by the reviewer and also by its own detrimental action on the homeostasis of the intestinal epithelium and the associated disruption of its barrier function.
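      For readers unfamiliar with the LT50 comparison used in this response, the following is a minimal sketch of how a median lethal time can be read off a survival curve by linear interpolation. The function name `lt50`, the time points, and the survival fractions are hypothetical and are not the authors' data or necessarily their method; they only illustrate the arithmetic of comparing LT50 values between conditions.

```python
def lt50(days, survival):
    """Return the time at which survival crosses 50%, by linear interpolation.

    days: increasing time points (hypothetical units: days).
    survival: fraction of flies alive at each time point, non-increasing.
    Returns None if survival never reaches 50% in the observed window.
    """
    points = list(zip(days, survival))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        # Look for the interval in which the curve passes through 0.5.
        if s0 >= 0.5 >= s1 and s0 != s1:
            return t0 + (s0 - 0.5) * (t1 - t0) / (s0 - s1)
    return None


# Hypothetical survival curves (NOT data from the manuscript).
days = [0, 4, 8, 12]
control = [1.0, 1.0, 0.75, 0.25]   # untreated flies
treated = [1.0, 0.75, 0.25, 0.0]   # stressed flies

# The shift in LT50 between two conditions is the quantity being compared.
lt50_shift = lt50(days, control) - lt50(days, treated)
print(lt50(days, control), lt50(days, treated), lt50_shift)
```

      Comparing such shifts within each Nora status, and then between Nora (+) and Nora (-) flies under the same treatment, corresponds to the two kinds of comparison described in the response.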

      Quantification of immunofluorescence microscopy is missing, rendering the images somewhat anecdotal. Quantification should be provided. It will then also be of interest to quantify the number of Nora (+) cells, and the Nora virus levels per infected cell (e.g. Figure 5H). Also, the claim that the Nora virus initially infects ISC and later (upon stress) infects enterocytes requires quantification.

      Missing quantifications of pictures have been added: Figs. S5E and 7H. We are not sure we understand the reviewer comment on “Nora virus levels per infected cell”: the signal we are seeing may correspond to aggregates of the virus and would be impossible to quantify reliably, e.g., in the right-most panel of Fig. 5H. Fig. 5I clearly shows that no Nora is detected in enterocytes of young 5-day-old flies in the absence of infectious or xenobiotic challenge.

      Genetic support for the role of the JAK-STAT pathway in driving ISC proliferation and supporting Nora virus replication is convincing. It would also be of interest to analyze other pathways implicated in ISC proliferation (e.g. JNK, EGFR), especially given the observations of Nigg et al, showing an involvement of STING/NF-kB and EGFR pathway in driving intestinal phenotypes of Drosophila A virus-infected flies (doi: 10.1016/j.cub.2024.05.009).

      We agree with the reviewer that these would be interesting experiments to perform, especially in the light of one hypothesis that antiviral defenses may prevent the initial infection of enterocytes as discussed at length in our updated discussion on host antiviral defenses. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions. In this work, we used the interference with the JAK-STAT pathway as a second tool to block the division of ISCs.

      Figure 5E: An intriguing observation is that GFP:Dicer2 seems to be unstable in Nora virus-infected cells. Here, GFP control driven by the same driver line would be required to confidently conclude that this is due to an effect on Dicer-2 specifically.

      Actually, this experiment was not performed using the Gal4-UAS system but a direct fusion. We do know that GFP is stable when expressed in enterocytes, e.g., Lee et al., Cell Host&Microbe (2016) DOI: 10.1016/j.chom.2016.10.010.

      Legends are mostly conclusive, and essential information about the experimental setup is missing in the captions of multiple figures, making the interpretation of the data difficult. See my private recommendations for suggestions to improve the data presentation.

      Done.

      Recommendations for the authors:

      Suggestions for the presentation of the data:

      (1) I found the names Ore-R(SC) and Ore-R(SM) for noninfected vs infected Ore-R flies not very intuitive. I suggest renaming them into something that makes the infection status clear.

      These notations refer to two distinct sub-strains that may reflect different origins with some likely genetic drift accounting for the distinct properties of the two sub-strains. As the ORE-R (SM) have different infection status: infested, cleaned, re-infected, we fear that this would not clarify the matter. Of note, ORE-R(SC) are refractory to Nora virus infection (Fig. S1I).

      (2) Please define the number of flies analyzed for survival assays in the legends.

      Done.

      (3) The authors provide conclusions in most of the figure legends, without providing an explanation of the experiment that was done. Conclusions should be used sparingly, if at all, in legends. Also, relevant information is often missing in the legends (time points after infection, Figure 2E food source, etc.). I suggest the authors carefully double-check their legends and rephrase the conclusive legends with descriptive ones.

      Done. The figure legends have been rewritten.

      (4) Several of the legends indicate that 'data represent the mean of biological triplicates' however some panels do not represent triplicates (e.g. Figure 1C-E). Please correct.

      Done.

      (5) Legends: which multiple comparison test was used for ANOVA?

      Done. Tukey’s post-hoc test was used for direct comparisons.

      (6) Line 888: black arrows are not shown in the figure.

      Corrected.

      (7) Figure 1F: legend on the figure seems incorrect (all are labeled Nora (+)); likewise for Figure 2C (all labeled Nora (-)).

      Corrected.

      (8) Materials and methods: please describe how the Nora virus antibody was raised (and specify on line 271 what viral protein is recognized).

      Done. As the whole virus was used for immunization, we cannot state which specific viral proteins are detected by the antibody.

      (9) Please define what is presented in the box plots (mean, range, whiskers, individual data points).

      Done.

      (10) Figure 4 and associated text (line 221): a brief explanation of the Smurf assay would be useful.

      Done.

      (11) Figure 4C: I did not find the picture of the agar plate informative, as similar information is conveyed in Figure 4D. Also, the labelling cannot be clearly read.

      Figure 4D provides a quantification of panel C. The readability has been improved.

      (12) Figure 4C: It is suggested that Nora-positive, smurf-negative flies were analyzed, but from Figure 4B it seems that these do not exist. Please explain.

      The data in Fig. 4B do not represent absolute numbers but percentages. Thus, there were at most 50% of SMURF-positive flies at the time of the assay, the rest being Smurf-negative yet Nora-positive.

      (13) The abbreviations PA14 and Db11 are used in several figures. I would suggest defining the abbreviation in the legend to facilitate interpretation.

      Done.

      (14) Figure 5A/5G: the Nora virus RNA levels in this figure are dramatically lower than the levels in other figure panels. Please check/correct.

      Done. The reviewer is indeed correct: we have forgotten to write that for these two panels, the loads are relative and not absolute as is the case in other panels. 5A: the load in whole flies was taken to be 1; 5G: untreated Nora-positive flies were taken to be 1.
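      The distinction between absolute and relative loads can be illustrated with a short sketch, assuming that "taken to be 1" means dividing every measurement by the mean of the reference group. That assumption, the function name `relative_loads`, the group labels, and the numbers are all hypothetical; the authors may have normalized differently.

```python
def relative_loads(loads, reference_group):
    """Express each measurement relative to the mean of a reference group.

    loads: dict mapping group name -> list of absolute measurements.
    reference_group: the group whose mean is set to 1.
    """
    ref_values = loads[reference_group]
    ref_mean = sum(ref_values) / len(ref_values)
    return {group: [v / ref_mean for v in values]
            for group, values in loads.items()}


# Hypothetical absolute viral RNA loads (arbitrary units, NOT manuscript data).
absolute = {"whole_fly": [2.0, 4.0], "gut": [6.0]}

# After normalization the reference group averages 1, so values in other
# groups read as fold differences rather than absolute copy numbers.
relative = relative_loads(absolute, "whole_fly")
print(relative)
```

      Relative values of this kind cannot be compared directly with absolute loads shown in other panels, which is the source of the apparent discrepancy the reviewer noticed.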

      (15) Figure 6A: total number of ApopTag positive cells are reported. Were the same number of total cells analyzed? Please define.

      We have not counted all of the cells in each midgut but provide the number of ApopTag positive cells per midgut. We thus make the assumption that the overall number of midgut cells is not varying much from one midgut to the other. Visual inspection of DAPI-stained nuclei did not reveal any obvious change in the density of enterocyte nuclei as illustrated in Fig. S6 (we guess that everyone in the field is making the same assumption when counting mitotic ISCs with PHH3 staining).

      (16) Figure 6C: I find the shades of blue difficult to distinguish and suggest to use other colors.

      Done.

      (17) There seems to be a large mismatch between the percentage of Nora virus-positive cells in Figures 5C, 6H and the images of Figures 5G and 5H. Why?

      We think there might be a mistake with the Figure numbers cited by the referee. We guess the point the referee was trying to raise is the difference of perceived Nora virus burden between Fig. 5H and Fig. 6G, a quite valid point. For Fig. 5H, we had measured the Nora-virus load by RTqPCR (Fig. 5G, relative burden) but had not quantified the images. This is now done and shown in Fig. 5I. In Fig. 5H, young flies were used and hence there was no Nora virus detected in ECs, as now quantified in Fig. 5I. For Fig. 6G, we had to use 30-day old intestines to be able to observe Nora virus in the enterocytes of the controls. We have now included this important point in the main text and in the Figure legends.

      (18) The Title of the legend in Figure 7 is not supported by the data as 'spread through the intestine' has not been analyzed. Please adjust.

      Done.

      (19) All figures in which ANOVA is used: I assume that anything not labeled with an asterisk was found to be non-significant? If so, this should be indicated in the manuscript.

      Actually, we have not highlighted obvious differences to maintain clarity (e.g., Fig. 1E between uncured Ore-R(SM) and cured Ore-R(SC)). We thus have underlined the biologically relevant differences in the panels. The interested reader can refer to the primary data that are accessible on a data repository.

      (20) Figure 7C: the authors may want to contrast their finding that Upd3 was not upregulated in Nora virus-infected flies (in the absence of PA14) with the findings of Kuyateh et al, who did report upregulation of Upd3 (https://doi.org/10.3390/v15091849).

      We thank the reviewer for pointing out this study we were unaware of. We would like to point out that this article is difficult to follow as it is not 100% clear in which of the analyzed studies the induction of upd3 was observed and which exact experimental conditions were followed, e.g., young or old flies, whole flies or gut… We have looked in more detail at ref. 133 of this article, which refers to an unpublished study from the Hultmark laboratory that is however available online: (https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=15&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=Nora+virus&language=en&pid=diva2%3A1045375&aq=%5B%5B%5D%5D&sf=all&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=4587).

      In that study, flies were “infected” with Nora virus by expressing a cDNA clone injected into embryos. The problem is that for some unknown reasons the authors used Relish mutant flies. It is thus difficult to conclude as these flies are defective for the IMD and Sting pathways whereas our flies are wild-type. We were also interested to read that genes involved in midgut stem cells differentiation were expressed in flies harboring Nora virus, which is in keeping with the data of the present study. However, it is difficult to discuss this when we know little on the background of the studies analyzed by Kuyateh et al, in as much as our Discussion is already rather long.

      (21) Figure 7E: are the differences between control and Dome/Stat knockdown flies significantly different for Nora (+) flies (in the absence of Pseudomonas)? This is not clear from the data presentation.

      The answer to the question is positive: the JAK-STAT pathway also contributes to the maintenance of intestinal epithelium homeostasis in the absence of bacterial infection, that is presumably basal conditions. We have modified Fig. 7E to include more comparisons.

      Textual suggestions:

      (22) Line 25 strives > thrives

      Done.

      (23) Lines 150-152, etc. are not very informative. Also, some of the viruses analyzed are not "known contaminating viruses", but viruses used experimentally (VSV, IIV6, CrPV). I suggest adjusting the phrasing.

      Done.

      (24) Line 862: weaker fitness > lower fitness.

      Done.

      (25) Virology terms:

      (a) I suggest not using the term titer for qPCR readouts (which do not involve titration). Viral RNA level or viral RNA load would be more appropriate.

      Done.

      (b) I would propose rephrasing the Y-axis label of Figure 1C, E to Nora RNA load (same for other figures showing viral RNA).

      Done.

      (c) Infested: rather use the more accurate term infected.

      Done.

      (d) Contamination: rather use the term infection.

      We have modified some but not all occurrences of this word. We believe that it is important to use the word contamination when referring to enterocytes: the enterocytes are not infected by Nora; rather, differentiated infected ISCs become contaminated enterocytes. Infection refers to an active process whereas contamination refers to a state.

      (e) Proliferation: rather use the term replication.

      According to our US-English dictionary, proliferation refers to the “rapid reproduction of a cell, part, or organism”, which is the meaning we intend. Replication does not have this notion of speed of reproduction.

      (f) Drosophila should not be italicized in Drosophila A virus, following the ICTV convention that a "virus name should never be italicized, even when it includes the name of a host species or genus" https://ictv.global/faq/names.

      Done.

      (26) Line 873-975: please rephrase the legend of Figure 1F as the current one is not informative.

      Done.

      (27) Line 934: I suggest moving the justification of the time point chosen "= LT50 on the survival test in Fig. 2E" to the main text.

      Done.

      (28) Line 936: with drop > with a drop.

      No longer relevant.

      (29) Line 940-941: the grammar of the sentence does not seem to be correct as it suggests that SDS induces Diptericin expression.

      No longer relevant.

      (30) Line 952-953; line 980: please correct mismatch singular/plural (antibody have, inhibition do).

      Done.

      (31) Line 422: "It will be interesting to determine whether the absence of a Dcr2 fluorescent proteins fusions in progenitor cells that we report in this study rules out a role for the RNAi pathway in intestinal host defense against the Nora virus". It would be of interest to discuss this finding in the context that virus-derived Nora virus siRNAs can be easily detected and that the viruses encode an RNAi antagonist (doi: 10.1371/journal.ppat.1002872).

      Done. We have updated the Discussion and propose a model whereby RNAi would prevent primary infection of enterocytes and then virus replication in proliferating progenitor cells would allow the virus to effectively inhibit the RNAi machinery when the infected progenitor cells become enterocytes.

      (32) Line 159: Nora virus phenotypes differ between laboratories. I would be interested to read the authors' speculations on why this would be the case.

      Our work shows that the effects of Nora virus depend significantly on several parameters we have identified: nutrition quality, age, exposure to abiotic or biotic stresses, and fly genotypes with the existence of Nora-refractory strains. These parameters as well as potential differences between laboratories are actually discussed in the second paragraph of the Discussion.

      (32) Line 175: capitalization of ORE-R vs Ore-R at other places in the manuscript.

      Done.

      (33) Line 185-194: PA14 and Pseudomonas are used interchangeably. Perhaps it is clearer to stick to a single term for consistency.

      PA14 is one clinical strain used to study P. aeruginosa. There are many others such as PAO1, which is also widely used. We have decided to write P. aeruginosa PA14 the first time we are using it in each figure legend, and use only PA14 afterwards.

      Reviewer #3 (Public review):

      The claim that Dcr2 is not abundant in ISCs because the protein is not stable is logically consistent and reasonable. Perhaps I missed this, but the authors could additionally knock down or use somatic CRISPR to delete Dcr2 in ISCs to test whether a lack of Dcr2 underlies sensitivity. In this experiment, the expectation would be that depleting Dcr2 in ISCs genetically would make little difference to susceptibility overall compared to controls. This is not an essential experiment request.

      We agree with the reviewer that these would be interesting experiments to perform. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions dealing with the specific steps of RNA interference that may be missing in progenitor cells.

      Recommendations for the authors:

      (1) Line 206-207 and 214-216: the order of ideas presented here is unintuitive. In Lines 206-207, it is said that ABX treatment had no effect, which is counterintuitive to the nature of infection susceptibility. But this is resolved in Lines 214-216 when the reader realizes that S3G is fed on a sucrose solution, and so likely microbiota-depleted. Perhaps more could be said to clarify this in the main text, and/or swap the order of these observations so a casual reader is not confused about the nature and extent of the microbiota contributing to the sensitivity of Nora-infected flies.

      As suggested by the reviewer, we have clarified the text with respect to the food source and microbiota load; we emphasize that the microbiota plays a protective role in Nora-negative flies fed on sucrose solution even though the microbiota load is very low under these conditions. Of note, the microbiota is not depleted in sucrose-fed Nora-positive flies: we suspect that delaminating enterocytes may actually provide directly or more likely indirectly (peritrophic matrix) nutrients for the microbiota.

      (2) Line 262-265: the text may be a bit exaggerated given only 3 pathogens tested, one of which was a fungal natural infection breaching the cuticle and largely bypassing the gut. This could be re-phrased.

      The important point is that Nora-positive flies die with an LT50 of about 10 days even when noninfected; it has nothing to do with the number of pathogens tested. Thus, any infection that causes death with kinetics in this range may be misinterpreted in the absence of a relevant uninjured or clean injury control.

      (3) Line 379-382: I don't know if citing Schissel et al. is needed here. This paper's methods and data are highly problematic, as mentioned by the authors. This is not a highly cited paper, nor does it add value to the present discussion to cite it only to discredit it. Perhaps this can be left out and the field can move on quietly - naturally, this choice is the present authors', and this is just my view.

      We have actually cited this article at two other places and thus had not cited it “only to discredit it”. We have nevertheless removed the lines as suggested by the reviewer.

      (4) Line 404: perhaps clarify "Interestingly, mammalian stem cells..."

      Done.

      (5) Line 455: my understanding of digital PCR is that it is highly useful for detecting rare variants but not particularly better than qPCR for estimating loads/titres? This is not to say dPCR is worse, just that dPCR and primer-specific RT + qPCR are comparable if load/titre is desired. For instance, Qiagen actually recommends qPCR over dPCR specifically (and pretty much exclusively) for gene expression: https://www.qiagen.com/us/applications/digitalpcr/beginners/dpcr-vs-qpcr.

      (6) Perhaps Line 455 could drop the advocacy for digital PCR? I agree using dissected guts, or seemingly aged individuals per Figure 3B(?), is a valuable thing to point out. Maybe the aged individuals point could be added here? I guess the idea behind dissected guts is to have samples enriched in Nora virus.

      Cleaning Nora-positive strains is really difficult and we suspect that as long as there is one viral particle left, it may be sufficient to re-ignite the contamination of the strain. Our own experience with digital PCR on the expression of AMP-like molecules in the head of flies is that we found the approach to be more sensitive than classical RTqPCR (Xu et al., EMBO Rep, 2023).

    1. Reviewer #1 (Public review):

      Summary:

      This paper leverages 7T fMRI data from the Natural Scenes Dataset to investigate whether retinotopic coding, the position-selective organization of visual responses, structures spontaneous resting-state interactions between the Default Network (DN) and the Dorsal Attention Network (dATN). Using individualized network parcellations and population receptive field (pRF) modeling, the authors show that DN voxels can be split into two subpopulations based on their response to visual stimulation: those with position-specific positive BOLD responses (+pRFs) and those with position-specific negative BOLD responses (-pRFs). Critically, these subpopulations relate differently to the dATN during rest: -pRFs are anticorrelated with the dATN, +pRFs are positively correlated, and non-retinotopic DN voxels show no coupling. The anticorrelation (and positive correlation) is enhanced when DN and dATN voxels share visual field preferences. An event-triggered analysis suggests that retinotopic coding shapes both "top-down" (DN-initiated) and "bottom-up" (dATN-initiated) spontaneous activity transients, supporting the claim that the retinotopic scaffold is intrinsic to the DN. These findings challenge the prevailing view of global DN-dATN antagonism and suggest retinotopic coding as an organizing principle for cross-network communication.

      Strengths:

      The central finding that what looks like network-level independence between DN and dATN decomposes into structured, bivalent interactions organized by voxel-level visual field preferences is a compelling demonstration that macro-scale network descriptions can hide meaningful substructure. The logic of the analysis is clean: pRF properties are estimated from retinotopic mapping data and then used to predict resting-state coupling in completely independent scanning sessions. This cross-session, cross-modality design rules out many circularity concerns.

      The use of individualized multi-session hierarchical Bayesian parcellation (Kong et al.) to define DN and dATN boundaries within each subject is the right methodological choice for this question. Network boundaries in posterior cortex, where DN and dATN interdigitate most closely, vary considerably across individuals, and group-average approaches would introduce exactly the kind of misassignment that would most confound the result.

      The matched-vs-random pRF analysis is well-controlled. The authors demonstrate that cortical distance between matched and randomly-matched dATN pRFs does not differ, effectively ruling out spatial proximity on the cortical surface as a confound. tSNR controls further show that signal quality differences do not drive the effect.

      The event-triggered analysis (Figure 3) is creative and adds genuine value. Showing that retinotopically-specific coupling persists during DN-initiated activity transients, not only dATN-initiated ones, is the key piece of evidence for the claim that the code is intrinsic to the DN rather than passively inherited through bottom-up visual drive.

      The result is observed consistently across all individual participants, which provides strong evidence for the robustness of the qualitative pattern despite the small sample size inherent to densely-sampled designs.

      Weaknesses:

      (1) The nature of negative pRFs requires more scrutiny

      The entire interpretive framework depends on treating negative pRFs in the DN as genuine position-selective neural responses (suppression). However, negative BOLD signals are well known to arise from non-neural sources, specifically, vascular stealing (where activation in nearby tissue diverts blood from adjacent voxels) and macrovascular draining vein effects that produce spatially displaced signal inversions. These concerns are amplified at 7T, where T2*-weighted GE-EPI carries substantial macrovascular weighting. The DN and dATN interdigitate extensively in the posterior cortex, often within millimeters. A negative pRF in a DN voxel adjacent to a positive dATN voxel could, in principle, reflect the hemodynamic shadow of its neighbor rather than an independent neural response.

      The spatial dispersion control (matched vs. random pRFs have similar cortical distribution) is valuable but addresses long-range confounds, not *local* hemodynamic crosstalk. The reliability of sign and center position across runs is reassuring but does not exclude a vascular origin, as vascular architecture is itself stable across sessions. I would encourage the authors to test whether the matched-vs-random effect survives exclusion of voxels near large pial vessels (identifiable from T2* contrast or the venograms available in the NSD). These analyses would not be dispositive, but they would meaningfully strengthen the neural interpretation.

      (2) Amount of retinotopic mapping data and choice of pRF pipeline

      The NSD includes 6 runs of retinotopic mapping (~5 minutes each; 3 bar-aperture, 3 wedge/ring). The authors use only the 3 bar-aperture runs (~15 minutes total per subject) and fit their own pRFs using AFNI's 3dNLfim procedure, rather than using the pRF estimates provided as part of the NSD release (which were fitted using the analyzePRF toolbox with all 6 runs).

      Fifteen minutes of bar data is quite limited for reliable voxel-wise pRF estimation, especially in regions far from the early visual cortex, where signal-to-noise is inherently lower. Standard recommendations for robust pRF mapping in higher-order regions generally suggest substantially more data. The variance-explained threshold is close to the noise floor by design, meaning that a non-trivial number of the "retinotopic" DN voxels may be poorly estimated. Given that the core analyses depend on both the sign and the center position of these pRFs, the limited data is a significant concern.

      The authors do not explain why they chose to re-fit pRFs rather than use the NSD-provided estimates. If the motivation was methodological (e.g., the NSD pRF pipeline does not readily yield signed amplitude, or the bar-only fits were judged more appropriate for detecting negative responses), this should be made explicit. If the NSD-provided pRFs can reproduce the key findings, this would substantially increase confidence in the results. If they cannot, that divergence itself would be important to understand. I would ask the authors to address this choice and, if feasible, to report whether the core results replicate using the NSD-provided pRF estimates and/or whether using all 6 runs of retinotopy data changes the findings.

      (3) pRF model adequacy for the Default Network

      The isotropic Gaussian pRF model was developed for and validated in early and mid-level visual cortex, where it captures the dominant spatial selectivity of neuronal populations. In DN voxels where the model explains comparatively little variance, it is less clear that the model is capturing the right quantity. Specifically, the negative pRFs could conceivably be described by a model with a dominant suppressive surround (e.g., a difference-of-Gaussians model), in which what appears as a "negative pRF" in the standard model is actually the surround component of a center-surround mechanism whose center is poorly resolved. This distinction matters: a genuine inverted code (negative center response) implies a qualitatively different computation than inherited surround suppression from nearby visual cortex.

      The authors should consider discussing why the standard model is sufficient for the questions asked, or ideally, testing whether the sign distinction survives under alternative pRF model specifications.

      (4) Interpreting resting-state transients as top-down vs. bottom-up

      The event-triggered analysis labels high-amplitude DN pRF activations as "top-down events" and dATN activations as "bottom-up events." This is a reasonable inference given experience-sampling studies showing that rest involves alternation between internal and external attention, but it remains an inference. Without concurrent experience sampling, eye-tracking, or physiological monitoring, we cannot establish that a spontaneous DN transient reflects memory retrieval or internally-directed thought rather than a global arousal fluctuation. Similarly, dATN transients during rest could reflect covert shifts of spatial attention to remembered or imagined locations rather than bottom-up processing per se. I would ask the authors to soften this framing or to discuss what additional data would be needed to validate the top-down/bottom-up attribution.

      (5) The "retinotopic code" vs. "visual field bias" distinction

      The paper uses the language of a "retinotopic code" throughout and correctly distinguishes this from a "retinotopic map," noting that DN voxels do not form a continuous topographic representation on the cortical surface. This distinction deserves greater emphasis. In vision science, retinotopic maps carry computational significance through their topographic continuity and relationship to cortical wiring. A distributed collection of voxels with coarse visual field preferences but no cortical topography is a fundamentally different organizational feature. Recent reviews have drawn an explicit distinction between *retinotopic maps* and *visual field biases* (Groen, Dekker, Knapen & Silson, TiCS 2022), and the present findings may be more accurately characterized as the latter. Perhaps the authors think that the distinction is merely a signal-to-noise distinction, in which case I would invite them to clearly speak to this interpretation. In any case, this is not a criticism of the findings themselves, but clarity on this point would prevent conflation of two different organizational principles and would help position the work for both the vision and network neuroscience communities.

    1. Reviewer #3 (Public review):

      Summary:

      Environments change over time; therefore, optimal decision-making ought to discount older observations of the environment in favor of newer ones in a manner consistent with the amount of temporal instability. Computational models of perceptual decision-making model this temporal discounting with a 'leak' parameter that determines the rate at which older information is discarded. In this study, McGaughey and Gold examine the neurophysiological mechanisms that could underlie adaptation to different degrees of temporal instability. They developed a novel variant of the well-established perceptual decision-making random-dot-motion paradigm, in which the stimulus being evaluated was preceded by an 'adapting' stimulus with either high or low temporal stability. When the test stimulus was preceded by the adapting stimulus with lower temporal stability, NHPs showed reduced psychometric slopes, indicative of increased temporal discounting ('leak'). While the NHPs performed this task, single-unit neural activity was recorded in area MT, along with pupillometric data. The authors use these neural and pupil datasets to investigate two potential sources of adaptive discounting under varying amounts of temporal instability: sensory adaptation (changes in instantaneous evidence encoding), and arousal-related changes in evidence accumulation. MT neurons respond differently to the test stimulus under conditions of high vs low temporal stability of the adapting stimulus - when the adapting stimulus is more stable, MT neurons have larger and more selective responses to the test stimulus. In addition, evoked pupil responses to the test stimulus were modulated by the adapting stimulus. Both the strength of the difference in MT responses across contexts and the difference in pupil diameter across contexts were correlated with context-dependent modulation of the monkeys' behavior over sessions. The paper concludes that both sources appear to independently contribute to adaptive evidence accumulation, likely operating at different processing stages in the brain.

      Strengths:

      (1) While computational models of perceptual decision-making have been very useful for explaining behavior and neural responses in decision-making areas, we are still in search of some of the neural mechanisms that could implement such models. Studies such as this one, which aim to identify neural correlates of simplified model parameters, are quite crucial.

      (2) Analysis is generally careful and well-executed.

      (3) Prompts some interesting follow-up questions that could be answered with simultaneous recordings and causal manipulations, as the authors state in the Discussion - e.g., which areas are affected by arousal-related neuromodulation correlated with evoked pupil size and how.

      Weaknesses:

      (1) The task design may not be optimal. While the amount of time the monkey is exposed to each motion direction during the adapting stimulus is matched, it's hard to know if the reduced MT responses to the test stimulus are truly due to the greater frequency of switches during the HSF adapting stimulus or because the monkeys have been exposed to more repetitions of the stimulus. It's increased sensory adaptation in either case, but it makes it problematic to interpret this as temporal context-dependent adaptation specifically. I think this could be partially addressed by an analysis that is already in the paper but could be emphasized and fleshed out more: specifically, the results in Figure 4D, which seem to show that most of the reduction in neural response for adapting units occurs between the first and second stimuli.

      (2) The pupillometric analysis seems to be an indirect way of assessing whether the accumulator itself might be modulated by temporal context, but the link could be made clearer. The authors show that context-dependent behavior is related to pupil size, which is related to arousal/neuromodulation, but it would be helpful to have some idea of which neural mechanisms underlying adaptive decision-making are actually impacted by this neuromodulation. Lacking neural data to address this question (e.g., from a brain region proposed to be involved in the accumulation process), at least more discussion of this would be helpful. Essentially, I'm unsure of how to interpret the pupil results: the argument that temporal context affects instantaneous evidence encoding in MT that then drives the accumulator is very clear, but I am a bit confused about what, mechanistically, the neuromodulation is doing.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to thank the reviewers for taking the time to review our manuscript and for their insightful comments, which will help us improve it. Please find below a point-by-point response to each reviewer.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors have set up a mouse embryonic sensory neuron system to study the impact of complete loss of frataxin (using a nice cre-based AAV approach). There is careful delineation of the phenotype of these cells upon complete frataxin loss using a significant range of relevant endpoints (e.g. OCR, oxidative stress, mitochondrial imaging at EM level). A major finding is the failure of neurons lacking frataxin to undergo full soma maturation - so smaller cells. In addition, AMPK is activated (maybe not surprisingly given the severe loss of mitochondrial function and drop in ATP). Solid mechanistic experiments reveal that blocking AMPK activation prevents the suppression of soma size (we do not get the same data with regard to alanine supplementation). There are interesting studies with alanine that, in part, reverse indices of oxidative stress (mitochondrial stress, specifically). The experiments are well designed with mechanistic insight and the data clearly presented with appropriate statistical analysis. A major problem is the culture system. The labelling studies and soma size analysis reveal that this is not a truly representative population of DRG neurons. It seems all the small neurons are missing - I assume all trkA positive and GDNF-dependent neurons have been lost somewhere (this comprises 80% of the neurons at the lumbar level). The methods section covering the mouse DRG culture is sparse in terms of details and refers to a textbook which I cannot access. Another issue is the background glucose concentration - growing such cells at 25mM is standard I know - but it's still sub-optimal. Glucose at this concentration represents a hyperglycemic state - normal glucose is 5-10mM - it's not really correct to term it glycolysis inhibitory since hexokinase, the rate limiting enzyme, has a Kd around 0.3-1mM glucose. When studying AMPK this system will exhibit suppressed AMPK activity/expression due to the high background glucose concentration of 25mM.

      Reviewer #1 (Significance (Required)):

      The use of this unrepresentative culture system does lower the significance. While large caliber sensory neuron, e.g. proprioceptive, dysfunction is important during development and into the adult it seems rather unfortunate that the authors ignore all other sensory neurons! Persons with Friedreich ataxia (FA) also suffer from small fiber abnormalities, e.g. pain, and these neurons actually express a higher density of mitochondria (since they are unmyelinated). So, when the authors state this model "faithfully recapitulates key hallmarks of FA...." I have to say I disagree. In terms of general significance the work is well performed with some good mechanistically strong studies, however, it does still contain a major purely descriptive component. The focus on AMPK is understandable but we learn nothing really novel about its function and role in sensory neurons.

      We sincerely thank Reviewer #1 for the careful evaluation of our work and for the positive appreciation of the experimental design, mechanistic approach, and data presentation. We are grateful for the reviewer’s comments, which helped us clarify several aspects of the manuscript and improve the description of our culture system and metabolic conditions.

      Comment on alanine/ALA

      We would first like to clarify a terminology issue. In our study, we did not use alanine supplementation, but alpha-lipoic acid (ALA). We have checked and revised the text to avoid any possible ambiguity on this point.

      Comment on the DRG culture system and representation of sensory subtypes:

      We appreciate the reviewer’s concern regarding the representativeness of the embryonic dorsal root ganglia (DRG) culture system. We agree that this in vitro model does not fully reproduce the cellular diversity and maturation state of the in vivo DRG environment, and we have revised the manuscript to make this limitation more explicit. That said, we respectfully do not think our cultures are devoid of small sensory neurons. In the original submission, Supplementary Fig. 1D-E already showed a substantial population of CGRP-positive neurons, supporting the presence of peptidergic small-diameter sensory neurons. In addition, we performed TrkA immunostaining, which showed that a large proportion of neurons in our cultures are also TrkA-positive. We can add these TrkA data to the revised manuscript if the reviewer and editor consider that this would strengthen the characterization of the culture system.

      More broadly, the reviewer raises an important point: dissociated embryonic DRG cultures maintained under simplified trophic conditions cannot be expected to preserve the full in vivo balance of mature sensory neuron subtypes. Embryonic and neonatal DRG neurons are known to depend strongly on trophic support in vitro, and sensory subtype maturation normally requires both neurotrophic cues and interactions with the native microenvironment. We therefore agree that our system should be viewed as a reductionist model of frataxin loss in developing sensory neurons rather than a complete reconstruction of the mature DRG. We have now expanded the methods section to better describe the culture conditions and revised the discussion to acknowledge more explicitly that future work using more complex conditions, such as combined trophic factor regimens, neuron–glia co-cultures, or organotypic approaches, may help preserve a more physiological sensory subtype composition.

      Comment on glucose concentration and “glycolysis-inhibitory” conditions:

      We thank the reviewer for prompting us to clarify this point. We agree that chronic exposure to 25 mM glucose can influence neuronal metabolism and AMPK signaling, and this issue has been discussed in the literature for neuronal culture systems. However, we believe there was a misunderstanding regarding the specific experiment referred to in our manuscript. In the condition that we termed “glycolysis-inhibitory,” the neurons were not maintained in high glucose. Rather, these experiments were performed in glucose-free medium supplemented with galactose, i.e. in the absence of glucose. Galactose substitution is commonly used to reduce ATP production from glycolysis and increase dependence on mitochondrial oxidative phosphorylation. We have revised the methods and results sections to make this point much clearer and now explicitly distinguish between low-glucose conditions and glucose-free/galactose conditions.

      Comment on significance and disease relevance:

      We appreciate the reviewer’s concern regarding the extent to which this model recapitulates the full spectrum of sensory pathology in FA. We agree that our culture system is rather artificial and might therefore not model the entire peripheral phenotype of FA.

      That being said, we believe the model remains highly relevant to a major and well-established component of FA neuropathology. Multiple neuropathological and clinical studies indicate that FA is characterized predominantly by a dorsal root ganglionopathy / sensory neuronopathy with marked involvement of large myelinated sensory neurons and their projections, which is central to the loss of proprioception and sensory ataxia that define the disease. Reviews of FA neuropathology consistently emphasize DRG hypoplasia/atrophy and loss of large myelinated fibers as hallmark features.

      We agree that small-fiber abnormalities have also been reported, including reduced intraepidermal nerve fiber density in some studies, and we do not wish to dismiss that aspect of the disease. However, the current literature still supports that the dominant and most characteristic peripheral lesion in FA affects large sensory neurons and large myelinated fibers more prominently than small fibers. We have therefore revised our wording and no longer state that the model “faithfully recapitulates” the full disease.

      Comment on novelty of AMPK findings:

      We agree that AMPK is a canonical metabolic stress sensor and that its activation in the context of severe mitochondrial dysfunction is not, by itself, unexpected. We have therefore revised the discussion to better frame the novelty of our study. In our view, the main contribution is not the mere observation of AMPK activation, but the demonstration, in frataxin-deficient primary sensory neurons, that AMPK activation is functionally linked to the defect in soma growth/maturation and that pharmacological AMPK inhibition can rescue this phenotype. We hope this distinction is now clearer in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present study, the authors develop a new model of FA in cultured DRG neurons and show its relation with Fe-S deficiency. It is also associated with defects in mTOR signaling, ALA synthesis and AMPK.

      The conclusions are convincing and the work is thorough. The results are well presented, easily understood and repeatable.

      Reviewer #2 (Significance (Required)):

      While there have been hints at some of the findings (references to AMPK), they have not been so well documented before. Thus, they are important. Is there any evidence of the present finding on cell size in the clinical literature (patient size, cell size, etc.) in non-DRG tissue? Might the present findings reflect a developmental event that drives the spinal cord hypoplasia?

      We sincerely thank Reviewer #2 for the very positive evaluation of our work. We are grateful for the recognition of the rigor, clarity, and reproducibility of the study, as well as for highlighting the relevance of our findings linking frataxin deficiency to Fe-S cluster impairment, mitochondrial dysfunction, and alterations in AMPK and mTOR signaling, as well as lipoic acid metabolism.

      We also thank the reviewer for the insightful comment regarding the potential relevance of our observations on reduced neuronal soma size.

      To our knowledge, there is no direct clinical evidence describing reduced neuronal cell size per se in patient tissues outside of the DRG. However, neuropathological studies of FA consistently report hypoplasia and atrophy of the DRG, characterized by a marked reduction in the size and number of sensory neurons, particularly affecting large neurons. These features are widely interpreted as reflecting a developmental defect rather than purely degenerative loss.

      More broadly, several studies have described spinal cord hypoplasia, including reduced cross-sectional area of the cord and thinning of posterior columns, which are thought to arise early in disease progression. These observations support the idea that impaired neuronal growth and maturation may be a key component of the pathology.

      In this context, we agree with the reviewer that our findings may reflect a developmental mechanism contributing to the hypoplasia observed in FA, rather than solely a degenerative process. Our in vitro data showing reduced soma size in frataxin-deficient sensory neurons, together with the involvement of AMPK/mTOR signaling pathways known to regulate cellular growth, are consistent with this hypothesis.

      We have now revised the discussion to incorporate this point and to more explicitly propose that bioenergetic stress and AMPK activation in frataxin-deficient neurons may limit neuronal growth and maturation during development, thereby contributing to the structural deficits observed in patients.

      At the same time, we have moderated our conclusions to emphasize that our model primarily captures cell-autonomous mechanisms in developing sensory neurons, and that further in vivo studies will be required to directly establish the contribution of these mechanisms to human pathology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present study, the authors develop a new model of Friedreich ataxia (FA), a disease caused by frataxin deficiency, using primary cultures of embryonic mouse Dorsal Root Ganglia neurons with complete frataxin depletion. This model reproduces key biochemical hallmarks of FA, including Fe-S enzyme deficiency, mitochondrial iron dysregulation, and oxidative stress. They also observe that these frataxin-deficient neurons exhibit a reduction in soma size. They claim that this defect is mediated by AMP-activated protein kinase (AMPK) hyperactivation and suppression of mTOR signaling, which occurs in response to mitochondrial dysfunction and redox imbalance. They are able to restore soma growth by genetic inhibition of AMPK or treatment with lipoic acid (ALA). The study is carried out meticulously, and the results are generally well presented, with the exception of a few specific experiments that will be noted below.

      Major points:

      • Mitochondrial iron was measured using the fluorescent iron sensor RPA. However, when using this probe, loss of signal can be caused by either increased iron or by loss of membrane potential. Thus, as mitochondrial membrane potential is decreased in the model used, it cannot be concluded from the results obtained that mitochondrial iron is increased. To confirm that mitochondrial iron is increased, authors should either use a dequenching approach (as indicated in Petrat F, et al., Biochem J. 2002 362:137-47), or use another mitochondrial iron specific probe.

      • Authors describe that ALA treatment improves mitochondrial function and reduces oxidative stress, and they hypothesize that restored mitochondrial activity may contribute to AMPK downregulation. However, to provide more mechanistic insight into this observation, it would be advisable to assess whether the indicated treatment is able to restore mitochondrial functionality by performing a Seahorse assay.

      • Authors state that their data support a model in which full frataxin depletion first induces a deficit of Fe-S synthesis, subsequently triggering downstream consequences such as iron dysregulation and oxidative stress. This may be plausible for oxidative stress, as it has been measured at 15 div. However, as alterations in iron homeostasis have not been measured at 15 div, it cannot be concluded that they appear later than deficiency in Fe-S proteins. The authors should measure TfR and FT-L expression at 15 div, or alternatively indicate in the discussion that it cannot be concluded whether the alteration in iron metabolism occurs after the deficiency in Fe-S proteins.

      Minor points:

      • Previous studies have reported dysregulation of the AMPK and mTOR signaling pathways in various models of Friedreich's ataxia. It would therefore be appropriate to highlight these findings in the discussion.

      • According to the authors, immunofluorescence confirmed efficient mitochondrial localization of mtLplA delivered via AAV9-mediated transduction (Fig. S5A). However, the image provided suggests partial co-localization. This should be acknowledged in the description of the results, or further data or measures confirming such efficient mitochondrial localization should be provided.

      Reviewer #3 (Significance (Required)):

      General assessment: Authors present a new model of Friedreich ataxia (FA) in Dorsal Root Ganglia neurons. This new model offers the advantage of being conditional, allowing frataxin deficiency to be induced and enabling the analysis of the emergence of various alterations across different generations. However, it also presents the limitation of inducing a complete loss of frataxin, a condition that does not occur in patients, who typically exhibit only a partial deficiency of this protein. Although the experimental work presented is of generally good quality (aside from some minor issues previously noted), it remains unclear whether the study provides substantial advances to the field of Friedreich's ataxia. The conditional nature of the model would, in principle, allow for a deeper exploration of mechanistic aspects underlying how frataxin deficiency leads to the observed phenotypes; however, this potential is not fully exploited in the current manuscript. In this context, the proposed relationship among energy deficiency, AMPK hyperactivation, and treatment with lipoic acid would be considerably strengthened by analyzing the effects of this compound on mitochondrial respiration.

      Advance: The effects of frataxin deficiency on DRGs had been previously addressed by other authors. In this new model, the authors describe a series of phenotypes, most of which have already been reported in other models of the disease (including models using DRGs). On the one hand, this reinforces the validity of the model, but on the other, it reduces the novelty of the observations presented.

      We thank Reviewer #3 for the careful evaluation of our manuscript and for the constructive and insightful comments. We are grateful for the positive appreciation of the overall quality of the study and for the suggestions that helped us improve the rigor and clarity of our work.

      Major points:

      Iron probe

      We thank the reviewer for this important remark. We agree that RPA fluorescence depends both on mitochondrial membrane potential and iron-dependent quenching. To address this point, we performed iron modulation experiments. Treatment with a membrane-permeant iron chelator strongly increased RPA fluorescence in both CT and KO neurons, whereas iron loading with ferric ammonium citrate (FAC) decreased the signal in both conditions. These bidirectional changes demonstrate that RPA is efficiently targeted and remains fully responsive to mitochondrial iron in KO neurons, arguing against impaired probe loading as the primary cause of the reduced basal signal.

      Nevertheless, to exclude any potential contribution of mitochondrial membrane potential differences, we propose to complement these experiments with an independent mitochondrial iron probe, Mito-FerroGreen, which detects mitochondrial Fe²⁺ via a distinct mechanism, independent of mitochondrial membrane potential. We would need about 8 weeks to perform these experiments.

      Effect of ALA on mitochondrial function

      We thank the reviewer for this suggestion. We agree that assessing mitochondrial respiration would provide additional mechanistic insight into the effect of alpha-lipoic acid (ALA). In the original version, we had data showing that ALA treatment restores intracellular ATP levels, suggesting an improvement of mitochondrial function. However, we agree that this is not formal proof. For the revised version, we propose to assess mitochondrial membrane potential as a proxy for mitochondrial function. While we agree that Seahorse-based analysis of oxygen consumption would be highly informative, these experiments require substantial time in primary DRG cultures and would significantly delay the revision. If the reviewer or editor considers them essential, however, we could perform them.

      Temporal relationship between Fe-S deficiency and iron dysregulation

      We thank the reviewer for this important comment.

      In response, we have now analyzed markers of iron homeostasis (TFR1 and FRTL) at 15 DIV, the same time point at which Fe-S protein deficiency is already evident. These new data show that iron homeostasis is not significantly altered at this stage, supporting our interpretation that Fe-S deficiency precedes detectable changes in iron metabolism.

      We have included these new results in the revised manuscript (Fig. S2E) and clarified the temporal sequence in the results and discussion sections.

      Minor points:

      1. We thank the reviewer for this suggestion. We have expanded the discussion to better acknowledge previous studies reporting dysregulation of AMPK and mTOR signaling pathways in various models of Friedreich ataxia, and we now position our findings within this existing body of work.
      2. We thank the reviewer for this important observation. We agree that the immunofluorescence data indicate partial, rather than complete, co-localization of mtLplA with mitochondrial markers. We believe this is most likely due to high levels of mtLplA overexpression, leading to partial saturation of the mitochondrial import machinery and consequently incomplete mitochondrial targeting. This interpretation is supported by our western blot analysis (Fig. S5B), which shows the presence of two bands corresponding to processed (mitochondrial) and unprocessed (non-imported) forms of the protein. We have revised the text accordingly to more accurately reflect these observations.

      We thank the reviewer for the thoughtful evaluation of the significance of our work and for highlighting both the strengths and limitations of our model. We agree that our model, based on complete frataxin depletion, does not fully recapitulate the partial deficiency observed in patients with FA. However, we believe that this approach provides a valuable experimental advantage, allowing us to: precisely control the timing of frataxin loss, investigate early cellular events, and dissect cell-autonomous mechanisms in sensory neurons. We have revised the manuscript to more clearly acknowledge this limitation.

      Regarding novelty, we agree that several individual phenotypes observed in our study (e.g., Fe-S deficiency, oxidative stress, mitochondrial dysfunction) have been reported in previous models. However, we would like to emphasize that our model enables the integration of these features within a single conditional system in primary sensory neurons, and importantly allows us to uncover a functional link between bioenergetic stress, AMPK activation, and impaired neuronal growth.

      In particular, our data identify AMPK as a key mediator of soma size reduction, and demonstrate that its inhibition can rescue this phenotype. We believe this provides a novel mechanistic connection between mitochondrial dysfunction and neuronal growth regulation in frataxin-deficient sensory neurons.

      Finally, we have revised the discussion to better highlight both the strengths and limitations of the model, and to more clearly position our findings as contributing to the understanding of early pathogenic mechanisms and developmental aspects of sensory neuron dysfunction in FA.

    1. “Tsze-kung asked, saying, ‘Is there one word which may serve as a rule of practice for all one’s life?’ The Master said, ‘Is not reciprocity such a word? What you do not want done to yourself, do not do to others.’” Confucius, Analects 15.23 [b9] (~500 BCE China)

      The quote from Confucius reminds me that when we interact with others, we should learn to put ourselves in their position. It made me think that a lot of conflicts or misunderstandings could be avoided if people just asked themselves, “Would I want to be treated this way?” I think that this quote is practical and it can be something you can apply in small, daily interactions. To me, it emphasizes empathy and mutual respect. It also makes me see the importance of understanding others and acting with consideration.

  3. Mar 2026
    1. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set [their original title]' Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I was also very concerned by the revisions.

      I expand briefly on these concerns and a few others for readers of the paper (see 'The comments below relate to my original review'). Subsequent edits to the paper addressed some of these by providing a new figure and moving around the methods. Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relate to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

    2. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

      We thank the editors and reviewers for their expert assessment of our findings and their interest in our conceptual framework. Below we respond to the specific reviewer and editor comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Thank you for your generous description of our study and the manuscript.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees.

      The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      We thank the reviewer for pointing out that we could improve our explanation of the different responses to July and August cooling in experiment 1. Whilst we incorporated this in the conceptual model and the figure caption (Fig. 1b), we now also address this topic in more depth in the discussion section, focussing on daylength and photosynthetic assimilation as the possible mediators of this change in responses (L350-371).

      For the early-season development effect vs the late-season temperature effect we can use the leaf-out day-of-year (as a proxy for development), and the summer cooling treatments (direct temperature effect) to assess the relative importance of these two components of our model. We have now included a variance partitioning analysis following this logic, see L246-252 for methods, L278-281 for results.
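
      The logic of such a variance partitioning can be sketched as follows. This is a minimal illustration and not the authors' analysis: the function names, the two-predictor setup, and all numbers are assumptions made for the example, and the values are made up rather than taken from the study. The sketch splits the R² of a two-predictor linear regression into the part unique to leaf-out timing, the part unique to the cooling treatment, and the part they share.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def partition_r2(y, x1, x2):
    """Split the R^2 of the linear model y ~ x1 + x2 into unique and shared parts."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    # R^2 of a two-predictor regression expressed through pairwise correlations.
    full = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    unique_x1 = full - r2**2   # gain from adding x1 to a model with x2 only
    unique_x2 = full - r1**2   # gain from adding x2 to a model with x1 only
    shared = full - unique_x1 - unique_x2
    return {"full": full, "unique_x1": unique_x1,
            "unique_x2": unique_x2, "shared": shared}

# Hypothetical, made-up values for illustration only (NOT data from the study).
leaf_out_doy = [105, 105, 105, 105, 135, 135, 135, 135]  # proxy for development
july_cooling = [0, 0, 1, 1, 0, 0, 1, 1]                  # 1 = cooled, 0 = control
bud_set_doy = [258, 260, 261, 263, 262, 264, 270, 272]

parts = partition_r2(bud_set_doy, leaf_out_doy, july_cooling)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

      Because the made-up design is balanced, the two predictors are uncorrelated and the shared component is zero; in an unbalanced design the shared part can be non-zero.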

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      This question may reflect a misunderstanding regarding the light availability that we hope to address with improved clarification. The duration and intensity of the lighting in these experiments was always set to reflect the average conditions experienced in Zurich for those respective times of the year. Day length in spring is shorter than it is in summer, so the durations were simply adjusted to reflect this reality. The 13-hour, 4,300 lux conditions in experiment 1 were only for the April-May period, when we reduced developmental rates for the late-leafing trees (L125-129). In July, the photoperiod was set to 16 hours and light intensity was approximately 7,300 lux (L150-154). This is comparable to experiment 2, in which treatments were applied in June and July, the photoperiod was 16 hours, and light intensity was approximately 6,900 lux (L206-207). These conditions reflect the average daylengths in Zurich, and the maximum light intensity output by the chambers.

      As mentioned in our initial author response, we do not think small differences in soil moisture levels should influence our conclusions. All pots were watered sufficiently to avoid water deficit, and all efforts were made to minimise differences in water availability. A Tukey honest significant difference test showed that only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate, difference = 6%, p < 0.05) had significantly different soil water content, a pair whose responses are not compared. We have added words to this effect in the figure legend of Fig. S1.

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      We agree that photoperiod likely plays a central role. Our conceptual model (Fig. 1) explicitly incorporates photoperiod as the framework within which temperature responses are regulated (L72-75, L627-629 & L638-641). The Solstice-as-Phenology-Switch hypothesis assumes that the annual progression of daylength sets the physiological “window” for trees’ responsiveness to temperature. Our experiments therefore focused on how temperature responses differ before versus after the solstice, while recognising that this reversal is likely enabled by the photoperiod signal. In other words, photoperiod provides the regulatory backdrop, and our results identify how diel and seasonal temperature cues are interpreted within that photoperiodic framework.

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”; L409-411) and explicitly call for follow-up studies across species and forest contexts (L413–414). At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Thank you for the kind comments. We appreciate your concerns regarding the severity of our treatments and the generalisability of our results, and you can find our detailed responses below.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants (L129-L133). We have added text in the Methods to clarify this aim (L129-131 & L156-161).

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods (L146-148) and Discussion (L345-346).

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions (L141-145).

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that including more data on photosynthetic assimilation would be valuable for interpreting phenological responses. Indeed, it was our intention to collect this information. However, unfortunately, we experienced technical challenges with the equipment available to us during the experimental period, which prevented us from collecting a full dataset. Nevertheless, we were able to obtain measurements during pre-solstice cooling (now presented as Fig. S12, including data for all treatments), which show that cooling treatments strongly reduced assimilation rates compared to controls. Importantly, these strong reductions occurred across all cooling treatments, yet their phenological outcomes differed markedly, demonstrating that assimilation alone cannot explain the observed responses. As we discuss, our findings are consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1 (comment 4), our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously (L409-411) and highlight the need for further research across species (L413–414).

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season - budset, end of radial expansion, leaf coloring, etc - relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker (L179-195 for methods, L296-311 for results). On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) between the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech” (L1-2).

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. As mentioned above (see Reviewer 1 comment 3), photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

      We interpret this concern as relating to the flexibility in reversal timing that we observed. Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21. Rather, the hypothesis implies that reversal occurs around the solstice, when photoperiod cues cause tree individuals to shift from accelerating to decelerating their seasonal development. Our conceptual model (Fig. 1) explicitly incorporates this flexibility by showing how the timing of the reversal depends on developmental speed: individuals that develop more slowly (or leaf out later) cross the compensatory point later in the summer, whereas fast-developing individuals reach it earlier.

      Our experiments support this framework: pre-solstice full-day cooling delayed bud set, whereas post-solstice full-day cooling advanced it, with differences between early- and late-developing individuals consistent with the model. Moreover, the contrasting impacts of daytime vs. night-time cooling demonstrate how diel conditions can further shape when the reversal is expressed. Thus, rather than contradicting the Solstice-as-Phenology-Switch hypothesis, our findings reinforce it and extend it by showing how flexibility arises from interactions between developmental progression, diel temperature responses, and photoperiod.

      We have added an additional section in the Discussion that elaborates on how our results support the Solstice-as-Phenology-Switch hypothesis (L416-432).

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      (1) The current strength of evidence is incomplete. Extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses could make the conclusions more solid.

      We agree with the vast majority of the reviewer comments and have made the relevant edits. We believe that these have dramatically improved the clarity of the manuscript. The revised analyses have not changed our conclusions, though we have toned down generalisations.

      (2) The Solstice as Switch hypothesis is about the effect of temperature warming. However, the two experiments did not simulate warming but rather cooling. Although a temperature difference can be obtained compared to the control in both cases, the impacts on plant physiology and phenology should still be different between the two scenarios.

      Thank you for raising this point, which requires clearer communication in our manuscript. The Solstice-as-Phenology-Switch hypothesis posits that changes in temperature before and after the summer solstice have opposite effects on the autumn phenology of northern forest trees. While the hypothesis has most often been framed in terms of warming, the underlying mechanism concerns whether development is accelerated or slowed relative to ambient conditions. In essence, we are exploring the effect of changes in temperature – not warming per se. In warmer springs, development begins earlier and/or proceeds faster, while in colder springs the opposite occurs; the same logic applies to post-solstice conditions. We have extended our explanation in the Introduction (L69-71).

      In our experiments, we applied cooling to create strong contrasts in developmental rates without damaging the trees. These treatments allow us to test the direction of phenological responses relative to ambient conditions. Thus, although we used cooling rather than warming, the results are directly informative for the Solstice-as-Switch framework, which concerns the relative effect of temperature changes rather than the absolute direction of manipulation.

      (3) The number of groups for bud type and summer temperature treatment is too small to be used as a random effect; it would be more appropriate to treat them as fixed-effect terms.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9, see L271) and inferences are not altered. We also report the bud type effects for experiment 1 (L262-266) and experiment 2 (L292-293).

      (4) Please add more clarifications for Figure 4 about what this figure is for and how you derived this figure, whether the data were from your experiments or others.

      We have rewritten the caption for Figure 6 (Fig. 4 in the previous manuscript) to clarify where the data came from and how the figure was generated (L687-693). This figure serves as a visual guide to aid the understanding of the processes that may govern the patterns we have observed. Figure 6a uses data from previous studies on diel patterns in F. sylvatica, specifically growth (Zweifel et al., 2021) and photosynthetic assimilation rates (Urban et al., 2014). To aid visualisation, we linearly interpolated between measurement points, converted the values to a relative percentage (compared to the observed maximum), and then smoothed the resulting curves. Based on the evidence from experiment 2, we suggest there may be a temperature threshold below which overwintering responses (e.g. bud set) are induced in F. sylvatica. Figure 6b depicts a theoretical diel pattern of this potential threshold. In simple terms, the threshold must be lower at night because nights are typically colder than days.
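      The three processing steps named above (linear interpolation, conversion to a percentage of the observed maximum, smoothing) can be sketched as follows. This is a minimal illustration only: the response does not state which smoothing method was used, so the centred moving average here is an assumption, and the hourly values and function names are made up for the example rather than taken from Zweifel et al. (2021) or Urban et al. (2014).

```python
def interpolate_linear(times, values, grid):
    """Linearly interpolate (times, values) onto each point in grid.

    Points outside the measured range are clamped to the end values.
    """
    result = []
    for t in grid:
        if t <= times[0]:
            result.append(float(values[0]))
        elif t >= times[-1]:
            result.append(float(values[-1]))
        else:
            for i in range(len(times) - 1):
                if times[i] <= t <= times[i + 1]:
                    frac = (t - times[i]) / (times[i + 1] - times[i])
                    result.append(values[i] + frac * (values[i + 1] - values[i]))
                    break
    return result


def to_percent_of_max(values):
    """Express each value as a percentage of the observed maximum (assumed > 0)."""
    peak = max(values)
    return [100.0 * v / peak for v in values]


def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the edges (assumed smoother)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical diel measurements: hour of day and a rate in arbitrary units.
hours = [0, 6, 12, 18, 24]
rate = [1.0, 0.5, 2.0, 4.0, 1.0]
grid = list(range(0, 25, 3))  # evaluate every 3 hours

curve = moving_average(to_percent_of_max(interpolate_linear(hours, rate, grid)))
```

      Converting to a percentage of the maximum before smoothing puts quantities with different units on a shared 0 to 100 axis, which is what allows growth and assimilation to be overlaid in a single panel.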

      Reviewer #2 (Recommendations for the authors):

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect, so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      See point (3) in reviewing editor’s recommendations for the authors.

      (2) Could the authors move the methods earlier and remind readers of them in the results?

      We have addressed this issue; please see the detailed response under reviewer 2’s concerns.

      Urban O, Klem K, Holišová P, Šigut L, Šprtová M, Teslová-Navrátilová P, Zitová M, Špunda V, Marek MV, Grace J. 2014. Impact of elevated CO2 concentration on dynamics of leaf photosynthesis in Fagus sylvatica is modulated by sky conditions. Environmental Pollution 185: 271–280.

      Zweifel R, Sterck F, Braun S, Buchmann N, Eugster W, Gessler A, Häni M, Peters RL, Walthert L, Wilhelm M, et al. 2021. Why trees grow at night. New Phytologist 231: 2174–2185.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analyses to address some of the concerns. We have revised our conclusions and viewpoints to reflect the reviewer’s concerns.

      (1) Analysis of signaling traces

      Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a more simple analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      We think the benefit of the modeled signaling level is its conceptual accuracy, to the extent possible with the data. It is true that the assumptions it brings in may cause certain biases. We therefore present both this analysis and the simplest one (raw data averaging, Fig. 2). Intermediate results (such as the first derivative in Fig. 3C) may correlate more or less well, but cannot be interpreted biologically.

      Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence." Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      Yes, the segmentations measure intensity in a fixed volume inside a cell; it is therefore a spatial average (concentration) and is susceptible to changes in cell volume. This has been noted in the revision. The raw measurement does fluctuate and can decrease; we think the short-time-scale fluctuations are likely measurement variations/errors rather than large underlying changes in concentration.
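      The distinction between total fluorescence and mean intensity can be made concrete with a small worked example. The numbers below are invented for illustration and are not measurements from the study; the point is only that a stable reporter whose total amount never decreases can still show a falling spatial average when the cell volume grows.

```python
def mean_intensity(total_fluorescence, cell_volume):
    """Spatial average (concentration-like) readout: total signal per unit volume."""
    return total_fluorescence / cell_volume


# Hypothetical values: total reporter only accumulates, but the cell grows
# between the second and third time points.
totals = [100.0, 120.0, 130.0]   # arbitrary fluorescence units
volumes = [10.0, 10.0, 15.0]     # arbitrary volume units

concentrations = [mean_intensity(t, v) for t, v in zip(totals, volumes)]

# The total is monotonically increasing, yet the mean intensity drops at the
# last time point because the volume increased faster than the total.
total_is_monotonic = all(a <= b for a, b in zip(totals, totals[1:]))
mean_decreases = concentrations[2] < concentrations[1]
```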

      Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited on the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are, might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      Yes, we agree. Unfortunately, we don’t have the quantitative data required to better estimate Kaede mRNA stability. The time from Cyc inhibition to the cessation of ptch mRNA production is roughly estimated and is not necessarily precise in this context.

      (2) Assignment of fates and correlations

      Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      This is a very insightful point. We did examine the posterior data again (cross-checked by 2 co-authors) to make sure the mixed situation has correct cell fate assignment. As established by others’ and our previous studies (see also Fig. 1A), the identification of MFPs and LFPs in zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig. 2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig. 2D’) that we checked carefully. The heterogeneity argument is based on the verified tracking and final positioning of these cells.

      Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      We agree. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to enrich the tracks for this revision. We are aware of upcoming, independent studies with many more systematic tracks and analysis which will address these concerns. We have added the caveats the reviewer raised.

      Signaling vector and hand-picked metrics: As an alternative approach, that might be better suited for their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al, 2014, PMID: 25504722). Have the authors tried to use approaches that make use of the whole trace (such as simple classifiers (Granados et al, 2018, PMID: 29784812)), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound to how accurately cells can make cell-fate decisions based on signaling dynamics.

      Thanks for these suggestions. The measurement noise, the coverage window of the traces, and the number of tracks limit our ability to make use of the full dynamics in a more informative manner.

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here. We added this point to make our presentation more balanced.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      We’d refer readers to our earlier study Xiong et al., 2013 where ptch2:kaede, nkx2:gfp and olig2:gfp were plotted against position over time in single cell tracks. It was found that position was not a good predictor of signaling levels or cell fates at early stages when the cell fates were specified.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seems consistent with differences in timing of cell-fate decisions between anterior and posterior cells. The authors show that fate does initially not correlate well with position in the posterior neural tube. So, signaling dynamics should likely also not, as they should rather be a function of position, given they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression does segregate the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it can thus not be expected that signaling dynamics correlate better with cell fate than the reporter "83%". Can the authors please discuss this possibility in the paper?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between the two. However, signaling dynamics are not necessarily a good function of position or time either; there is no evidence for that in our results here. The 83% correlation is thus striking for the posterior progenitors, indicating a certain robust logic in the GRN to capture a strong (even short-lived) response to Shh, regardless of position or time. This is an interesting possibility (we do not claim it as a mechanism, as we have not tested it with perturbations) that challenges the prevailing view in the field that these progenitors integrate Shh exposure over time, or that they acquire positional information by reading a gradient.

      The discussion has been modified to be more nuanced about these points.

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

      We quite agree. Together with the reviewer, we look forward to the publication of recent, independent progress by other colleagues that overcomes the challenges in our work.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong, 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is a convincing work, however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analyses to address some of the concerns. We have revised our conclusions and viewpoints to reflect the reviewer’s concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Minor comments:

      y-axis label suddenly changes to Ptch2-reporter level in Figure 5. Is what is plotted different from what is seen as examples in Figure 3?

      Thanks! The tracks in Figure 5 are the same as in Figure 3B; this has been noted in the figure legends.

      There are random bounding boxes in some of the figures.

      Sometimes the m in "More dorsal" is stylized with a capital M and sometimes not. It is somewhat confusing as a name for cell types but it is fine if no alternative can be found.

      This study unfortunately does not include markers that distinguish the interneurons dorsal to pMNs. We categorized them collectively as “more dorsal”.

      Response-time is defined as "the amount of time with an above-basal Shh response". This seems to me as the definition of response duration. I would assume that response-time, means the time it takes until a response is first observed. Please consider changing this.

      We did not use “duration” because a response time course recorded in these tracks may include multiple durations (on and off). In the field, the duration of exposure/response has specifically been used to mean a single period of response. Here, response time is therefore the sum of all active responding time. This has been clarified in the text.
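      The difference between a single response duration and the summed response time described above can be sketched as follows. This is an illustrative example only: the trace values, the basal threshold, the sampling interval `dt`, and the function names are hypothetical and are not taken from the manuscript.

```python
def active_periods(trace, basal, dt):
    """Return the length of each contiguous above-basal run, in time units.

    trace: response levels sampled at a fixed interval dt.
    basal: level above which a sample counts as an active response.
    """
    periods = []
    run = 0
    for level in trace:
        if level > basal:
            run += 1
        elif run:
            periods.append(run * dt)
            run = 0
    if run:  # close a run that lasts until the end of the trace
        periods.append(run * dt)
    return periods


def response_time(trace, basal, dt):
    """Total active responding time: the sum over all on periods."""
    return sum(active_periods(trace, basal, dt))


# Hypothetical trace with two separate on periods.
trace = [0.1, 0.6, 0.8, 0.2, 0.1, 0.7, 0.9, 0.9, 0.3]
periods = active_periods(trace, basal=0.5, dt=2.0)
total = response_time(trace, basal=0.5, dt=2.0)
```

      With this trace the cell has two on periods, so no single "duration" describes it, whereas the summed response time is a single well-defined number per track.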

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors address several possible setbacks of transforming the measured fluorescence intensity of the patched reporter into a readout of the Shh signaling activity over time, however, one aspect that isn't directly addressed is the potential effect of differences in the z position of analyzed cells. These could, at least in principle, be sufficient to introduce significant noise in the fluorescence measurements. Can the authors subset their datasets by initial, as well as average, z position and then re-examine the measured trends for both Shh activity and the intensity of the cell fate reporters used in the study?

      The zebrafish early neural plate/tube has a small thickness in z in dorsal-ventral imaging and the tissue is transparent. Depth-associated scattering contributes very little, if at all, to the fluorescent signals in the imaged time window. This can be seen in the nuclear/membrane signal of the movies, which is largely uniform across z in the neural tissue. It can also be seen that the notochord cells, further ventral, appear dimmer.

      (2) It is critical for the validity of this study that the intensity of the patched reporter introduced by the authors in 2012, and used again in this study, faithfully represents the signaling activity of Shh. In this study, the authors provide measurements of the transcriptional rate of Kaede and additional modeling for this purpose. However, an important point is to determine how sensitive is the reporter to changes in Shh signaling of different magnitudes?

      We consider this BAC reporter line a good one (probably still the best live reporter), as it resolves the endogenous gradient up to the dorsal interneuron domains (Huang et al., 2012, Xiong et al., 2013) and responds well to perturbations (Notch, Cyclopamine, etc). But it’s true that we don’t have information on how sensitively it responds to changes of different magnitudes. As far as we know, there is no in vivo, single cell information on how Shh targets respond to signaling of different magnitudes.

      (3) To strengthen the previous point, it would be nice to extend the analysis in Figure 2, at least partially, using other readouts for Shh activity (e.g. GBS-GFP)?

      We have used a GBS-RFP line previously and found it to be lower resolution in terms of showing the DV gradient, compared to ptch2:kaede.

      (4) It is unclear to me what is the relevant time window during which cells respond to Shh in the anterior versus posterior domains to determine progenitor specification. This is a concern to me, since: i) the average heterogeneity of Shh activity seems to increase strongly in time (Figure 2A/C); and ii) it is important to exclude that the finding of heterogeneous relationship between Shh activity and fate choices is largely driven by later timepoints, where potentially its activity is no longer relevant for cell fate specification. Can this point be clarified when this data is introduced in the manuscript and further discussed?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between the two.

      (i) The ptch2:kaede reporter variability is higher in terms of magnitude (the signal gets brighter) at later times, but the heterogeneity (overlap between different cell fate groups) is lower at later times.

      (ii) Similarly, the heterogeneous relationship is more pronounced at early time points. Since we do not know exactly when the activity becomes no longer relevant (from our earlier studies we do think that the cells become specified early, when Shh signaling is noisy), we modelled the response profile and searched for a good predictor. The maximum response stands out, particularly as a good indicator for the posterior cells, and suggests an early window/time of specification.

      Discussion has been modified to clarify these points.

      (5) Is the response of the patched reporter, as well as cell fate reporters, to defined concentrations of exogenously provided Shh heterogeneous, for instance, in in vitro experiments?

      Well-controlled (e.g., microfluidics and labeled Shh molecules) in vitro experiments will be fantastic future directions. Existing tissue explant + Shh dose approaches do not resolve the heterogeneity of exposure at single cell level but may be helpful in testing the limits and variabilities at different magnitudes.

      (6) The source of noise in this system is not entirely clear to me. The authors seem to attribute the heterogeneity they observe to the way cells respond to Shh, but can it be excluded that the morphogen profile is itself noisy to start with? It is currently difficult to distinguish between these two possibilities, given that the Shh activity reporter used in this study is itself a transcriptional output of the pathway. Can the distribution of Shh itself be analyzed (even if in immunostainings) during neural tube formation?

      Yes, we fully agree. More quantitative analysis may help dissect the sources of noise. Measuring the morphogen profile (particularly through time) would be great, but currently no reagent is available to achieve that. Studies using an engineered or tagged morphogen suggest that the pattern through the tissue is reasonably captured by simple diffusion dynamics. However, at the single cell level, considerable randomness may still remain and would be difficult to compare quantitatively with staining of fixed samples.

      (7) It is unclear to me how the authors define the ultimate cell fate of cells in their analysis in Figure 6. The brief description in the methods and in the manuscript seems to suggest that, in combination with marker expression, the cell position is used as a criteria to assign the fate to the progenitors - if this is the case, I guess the observed relationship in Figure 6 with LMDV distance is almost a control? This could be clarified for the readers.

      Yes, indeed, Figure 6 is a control, as LMDV distances lead to final positions, which form part of our determination of cell fates.

      As established by others’ and our previous studies (see also Fig. 1A), the identification of MFPs and LFPs in zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig. 2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig. 2D’) that we checked carefully.

      The methods of fate determination are described in detail in the Methods.

      (8) The graphs in Figures 6 and 7 are difficult to interpret. What proportion, and absolute number, of cells are "mis specified" when the authors show the distinct colored lines in the pMN, LFP or more dorsal domains? How do the authors determine where each cell fate domain begins and ends to access for "mis-specified" cells? Can the authors also provide the corresponding experimental images in the figure?

      We apologize for the difficulty in interpreting these figures. The graphs are a ranked list of all cells using the specified metric. The visual is meant to help build an intuition of how mixed vs clear-cut the pattern is given the tested metric. The graphs are not to be interpreted as the actual pattern in the tissue, and there are no data images that show these patterns.
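      The idea of a ranked list, and of how mixed or clear-cut it looks, can be illustrated with a small sketch. The cells and metric values below are invented, and the score used here (the best accuracy achievable with a single cut through the ranked list) is only one hypothetical way to quantify "clear-cut"; it is not claimed to be how the predictability scores in the manuscript were computed.

```python
def rank_cells(cells):
    """Sort (metric, fate) pairs from the highest to the lowest metric value."""
    return sorted(cells, key=lambda cell: cell[0], reverse=True)


def best_split_accuracy(ranked, high_fate):
    """Best fraction of cells labeled correctly by one cut in the ranked list.

    Cells above the cut are called high_fate; cells below it are called
    "anything else". A perfectly sorted list scores 1.0.
    """
    n = len(ranked)
    best = 0.0
    for cut in range(n + 1):
        correct = sum(1 for _, fate in ranked[:cut] if fate == high_fate)
        correct += sum(1 for _, fate in ranked[cut:] if fate != high_fate)
        best = max(best, correct / n)
    return best


# Hypothetical cells: (metric value, assigned fate).
cells = [
    (0.4, "pMN"), (0.9, "LFP"), (0.7, "pMN"),
    (0.8, "LFP"), (0.2, "pMN"), (0.6, "LFP"),
]
ranked = rank_cells(cells)
fates_in_rank_order = [fate for _, fate in ranked]
score = best_split_accuracy(ranked, high_fate="LFP")
```

      In this toy list one pMN cell ranks above one LFP cell, so the fates are interleaved once and the best single cut labels five of the six cells correctly.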

      (9) Given the experimental limitations/technical challenges discussed by the authors during the paper, the score of around 90% of predictability of cell fate choices is rather high in the anterior domain, suggesting a minor functional role for heterogeneity in this region. Even for the posterior domain, the score of 83% predictability based on the maximum response to Shh is still relatively high. In my view, this author's conclusions should be adjusted to make this difference clearer in the abstract and discussion, highlighting that the heterogeneity between Shh response and cell fate choices, particularly in the pMN fate, are stronger in the posterior domain affecting the precision of cell fate decisions particularly in this region. Can the authors further comment on potential mechanisms driving this difference?

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here.

      We have added the fact that the Shh response is still the main determinant of the pattern despite the heterogeneity in the Discussion. We also further discussed possibilities of the anterior posterior differences.

      (10) Following up from the previous point, the data in Figure 7 suggests that there might be different underlying mechanisms in how anterior and posterior cells interpret the Shh profile, with anterior cells potentially responding to the integrated concentration of Shh (since response time, average response, or maximum response to Shh all provide similar predictability scores for cell fate choices). In contrast, only the maximum response to Shh can provide a good prediction of posterior cell fate, consistent with a more instantaneous response to morphogen concentration (and thus potentially more error-prone measurement of the Shh profile?). This is a very interesting observation in my view. Could this be further tested?

      Thank you. Yes, we found this very interesting too. We discussed the possibilities, including the reviewer’s suggestion that these cells may have different contexts or strategies for interpreting the signal. It is also possible that the anterior cells use the same strategy (maximum response at an early time) and that the subsequent response/duration does not matter to their fate commitment. A precise approach to shut down Shh response dynamics in single cells (e.g., optogenetics) will enable these ideas to be tested. We hope follow-up studies will take such approaches.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Conceptual framing and interpretation:

      The central conclusion may require more precise framing to avoid potential overreach. The authors' interpretation equating "physical distance between TAD boundaries" with overall "TAD boundary architecture," and "transcriptional bursting events" with broader "gene activity," could benefit from clarification. This framing may not fully capture the temporal dynamics of transcription or the regulatory complexity within TADs. Furthermore, the broad conclusion of an uncoupled relationship appears to challenge extensive prior evidence from perturbation studies showing that disrupting TAD boundaries can alter gene expression. The authors' own observation of reduced gene activity upon RAD21 degradation suggests that global TAD disruption can affect transcription. A more precise and limited conclusion, acknowledging that their data demonstrate a lack of detectable correlation between boundary distance and bursting activity in their system, would be more accurate and help reconcile these findings with the existing literature.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions to avoid overreach. We have also added on p. 16 of our Discussion, a separate section on the limitations of the study, noting that our conclusions are limited to TAD boundary distances and do not reflect the structure of TAD boundaries or of TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      (2) Technical methods and data presentation:

      (2.1) Accuracy and dimensionality of distance measurements: The manuscript does not clearly state whether distances are measured in 2D or 3D, nor does it sufficiently address precision limits. The stated Z-step size (1 µm) may be inadequate for accurately measuring sub-micron chromatin distances in 3D.

      We state in both the Results and Methods that our data represent 2D distances derived from maximal-intensity projections of 3D image stacks. We previously published a detailed analysis of the precision of this measurement approach applied to chromatin interactions and documented the effect of 2D vs 3D analysis on these types of measurements. This study by Finn et al., 2022 is cited in the text. We also show in Figure S3 and mention on p. 6 and 10 that we observe similar results using either 2D or 3D analysis.
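      To make the 2D versus 3D distinction concrete, the sketch below compares the distance between two spots measured after a projection along z with the full 3D distance. The coordinates are invented for illustration and the function names are hypothetical; the only general fact relied on is that a projected distance can never exceed the 3D distance between the same two points.

```python
import math


def distance_3d(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_2d_projected(a, b):
    """Distance after projecting along z: only x and y contribute."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical spot centres in micrometres; the z offset equals one 1 µm step.
spot_a = (0.00, 0.00, 0.0)
spot_b = (0.30, 0.40, 1.0)

d2 = distance_2d_projected(spot_a, spot_b)
d3 = distance_3d(spot_a, spot_b)
```

      When two spots lie in the same focal plane the two measures coincide; they diverge only as the z offset grows relative to the in-plane separation.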

      (2.2) Probe design and systematic error: The genomic coverage size of the BAC probes used for DNA FISH is not explicitly stated. Large probe coverage could inherently blur the precise spatial location of adjacent DNA loci. The reported average distance (~300 nm) may be influenced by the physical size of the probes, as well as systematic expansion or distortion introduced by sample fixation and FISH processing. Although such technical limitations are currently unavoidable, the authors should clarify how these factors might affect their ability to detect subtle distance changes.

      The genomic location and size of all probes are provided in Supplementary Table 1. We deliberately use relatively large BAC probes both to generate robust, highly reproducible signals and to eliminate effects arising from local chromatin behavior. In line with earlier characterization of BAC probes (Finn et al., Cell, 2019; Finn et al., Methods, 2022), we find a strong correlation between micro-C/Hi-C interaction frequency and distance measurements. Systematic errors such as sample fixation and FISH processing have previously been evaluated by comparison to live cell data (see Finn et al., 2019) and found to be negligible, especially as all our analyses involve pairwise comparisons, which would both be similarly affected by systematic errors. We discuss resolution limits due to probe size in our new section on study limitations on p. 16.

      (2.3) Data Visualization: The manuscript would benefit from including representative, zoomed-in regions of interest from the raw imaging data. This would allow readers to visually assess measured distance differences against background noise.

      Raw images for inspection at any magnification are available at https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      (2.4) Potential impact of resolution limits: In Figure 5, the micro-C data reveal a clear difference in interaction patterns inside versus outside the VARS2 locus TAD, yet the imaging data show no corresponding distance difference. This strongly suggests that the current imaging system, limited by optical resolution, probe size, and localisation accuracy, may be unable to resolve finer-scale spatial reorganizations associated with specific chromatin conformations (e.g., enhancer-promoter loops). The authors should explicitly discuss that their conclusion of "no coupling observed" may be constrained by the resolution and sensitivity of their method and does not preclude the possibility of detecting such associations with higher-precision measurements or in live-cell dynamics.

      We generally see good agreement between micro-C/Hi-C data and distance measurements. Specifically, we consistently find closer proximity of boundaries than non-boundaries and larger boundary distances for larger TADs than for smaller ones, as presented throughout the study. Contrary to the reviewer’s statement, this is also true for the VARS2 TAD, where we find statistically significantly shorter distances for boundary probes (350 nm) vs the outside control region (390 nm), which correlates with the difference in micro-C interaction score of 5847 vs 2308. These data are shown in Figure 3. Regardless, we mention the issue of resolution due to probe size in the study limitation section on p. 16.

      Reviewer #2 (Public review):

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences). In addition, the correlation analysis is asymmetric, primarily stratifying by transcriptional status and then comparing boundary distances. Given the central claim that boundary architecture does not influence gene activity, the analysis should be done from the opposite perspective (stratifying by boundary distance).

      We mention the limitations on resolution of our approach in our discussion of study limitations on p. 16. An example of an analysis of stratifying by boundary distance is presented in Figure S3C. The conclusion is the same as stratifying by activity status.
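      The two directions of stratification mentioned here can be illustrated with a toy sketch. The per-cell records, the 350 nm cutoff, and the function names below are invented purely for illustration; they are not the study's data or the authors' code.

```python
from statistics import mean

# Hypothetical per-cell records: (gene_active, boundary_distance_nm).
cells = [
    (True, 340.0), (False, 360.0), (True, 355.0),
    (False, 345.0), (True, 350.0), (False, 350.0),
]

def mean_distance_by_activity(records):
    """Stratify by transcriptional status, then compare boundary distances."""
    active = [d for is_active, d in records if is_active]
    inactive = [d for is_active, d in records if not is_active]
    return mean(active), mean(inactive)

def active_fraction_by_distance(records, cutoff_nm):
    """Inverse view: stratify by boundary distance, then compare activity."""
    near = [is_active for is_active, d in records if d <= cutoff_nm]
    far = [is_active for is_active, d in records if d > cutoff_nm]
    return sum(near) / len(near), sum(far) / len(far)

print(mean_distance_by_activity(cells))
print(active_fraction_by_distance(cells, 350.0))
```

      The two functions answer different questions about the same records, which is why the reviewer asks for both; a real analysis would also need a statistical test and a principled choice of distance bins.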

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting but is lacking from the discussion.

      We mention the stronger effect of cohesin depletion compared to CTCF loss on gene expression in multiple locations in the Results and Discussion.

      On a related note, this approach primarily tests the role of boundary interactions rather than domain organization as a whole, and it should be acknowledged that internal domain structures are not directly assessed.

      We have modified statements throughout the manuscript to clearly indicate that our conclusions relate to boundary interactions rather than domain organization as a whole. We also discuss this in our section on study limitations.

      The comparison to work in other organisms (particularly the comparisons made to Drosophila) should be handled with care. The mechanisms underlying domain formation differ substantially across these systems, particularly regarding the differences in CTCF's role.

      We have modified our discussion of the data on Drosophila TADs, particularly as it relates to CTCF.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I couldn't locate the image data from figshare with the information provided (DOI: 10.6084/m9.figshare.30728354)

      The link has been updated:

      https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      Reviewer #2 (Recommendations for the authors):

      Some of the conclusions overreach. I recommend revising the claims and discussion to focus solely on the proximity of boundaries, instead of TADs themselves. This would match better with your experiments.

      We have modified statements throughout the manuscript, including in the title, to make our conclusions more precise and to avoid overreach. We have also added, on p. 16, a separate section on limitations of our study, noting that our conclusions are limited to TAD boundary distances and do not reflect on the structure of the TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      I do disagree with the interpretation of the data in some parts, particularly at the end, where you state that disruption of TADs does not impact gene activity. For example, "Altogether, these results demonstrate that disruption of TAD boundary architecture is insufficient to alter gene expression" doesn't seem to match the results. Sure, depletion of CTCF minimally impacted gene expression, but it also minimally impacted the boundary distances. I think it is interesting that depletion of RAD21 had a bigger impact on both gene expression and boundary distances, and this should be discussed.

      We have deleted this statement and now mention on p. 13 that RAD21 depletion affected gene expression, whereas loss of CTCF did not, and on p. 15 that loss of RAD21 had a greater impact on boundary distances than loss of CTCF. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      Related to this, I also recommend expanding the discussion of prior live-cell imaging work (ref 32) that showed that the fully formed CTCF loop is a rare event.

      We have expanded the discussion of prior live-cell imaging work in several locations.

      All the analysis is done from the perspective of the gene expression (e.g. group by expression and then measure distances). It would help to show that the inverse analysis is consistent (e.g. group by distances and measure gene expression).

      Analysis of data stratified by distance measurements is shown in Figure S3C.

      The discussion of the Drosophila work is strange, given that CTCF in Drosophila has a very different N-terminus, explaining why it doesn't really form loops. Sure, maybe it contributes to domains in some way, but probably no more than the dozens of other architectural proteins that have been found in that system. This work clearly focuses on CTCF-loop domains, so I would be specific about that. In the introduction, you do a good job of saying "in human cells, TADs are.... marked by binding sites for the CTCF protein". However, then you overgeneralize and state that TADs form via a process of loop extrusion. I think a simple statement before this to say that TADs in human cells have become somewhat synonymous with CTCF loop domains, and that is how you will use the term here. However, other organisms have TADs despite the lack of conservation of the CTCF protein.

      We have modified the text accordingly.

      On a related note, in the discussion, you cite two papers in Drosophila to state that "TADs form prior to the establishment of cell-type-specific gene expression programs", but that's not entirely accurate for those papers. They actually show that TADs occur coincident with ZGA, but loops form before that (ref 23: Espinola et al), or that there are indeed a few boundaries that show up before ZGA, but these correspond to RNA Polymerase (ref 24: Ing-Simmons et al.).

      We have corrected this statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      It is important to make a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular-level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.

      eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.

      Given our response below, regarding the rigor and “completeness” of our work, we do not feel that an editorial judgement of “leaving parts of the evidence incomplete” is justified.

      We also note that another recent experimental paper has validated essentially every prediction made in our eLife paper: https://www.biorxiv.org/content/10.64898/2026.01.05.697065v1

      We thus disagree that the evidence we have presented in our paper is incomplete.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement. 

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      We note (also in the manuscript) that it is difficult to compare the timescales obtained from coarse-grained MD simulations and experiments (“real time”) given that, by design, the CG simulations are accelerated to greatly enhance sampling. However, we can qualitatively compare the timescales of different CG simulations (without directly comparing the corresponding experimental timescales).

      We agree with the reviewer that the starting point of NPC-capsid and capsid-only simulations is different, as is the biological environment in which the rupture occurs. When analyzing the NPC-only and capsid-only simulations, what was striking to us was that at the NPC the capsid-LEN complex ruptures in a multicomponent environment, where several FG-NUPs compete to displace the LENs. It is well established in experiments that LEN has a detrimental effect on capsid integrity.

      In Figure 2, we plot the number of LEN molecules as a function of CG simulation time. The initial capsid-LEN complex was equilibrated without NPC and then placed at the cytoplasmic end of the NPC for docking. The number of LEN molecules for the capsid-only simulations and the NPC-docked simulations is nearly identical, and an insignificant number of LEN molecules unbind at the NPC. Hence, we added the following clarification:

      Page 10, paragraph 11

      “Note that the number of LEN molecules bound to the capsid for the free capsid and NPC-docked capsids are nearly identical. Hence, the disparity in timescale of lattice rupture is not only because of the effect of LEN on capsid lattice properties.”

      Is the time really comparable, given that the simulations have different starting points?

      Yes, the CG timescales of both the NPC and freely diffusing capsid unbiased simulations are comparable, since they were done using identical simulation settings.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      Our implicit solvent CG MD simulations are designed to access timescales far beyond the capabilities of the fully atomistic simulations. We reiterate here that it is difficult to directly compare the timescales obtained from CG MD simulations and experiments.

      As described in the text, there are 12 pentamers in the capsid (7 in the wide end and 5 in the narrow end). For the narrow end to rupture, all 5 pentamers should progressively dissociate. In our unbiased simulations (Fig. S5), in 25 us of CG time, we observe (partial) dissociation of one or two pentamers. Hence, our unbiased CG simulation timescales were not long enough to observe rupturing of the narrow end.
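      The timescale argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes, as a simplification that is not stated by the authors, that the five narrow-end pentamers dissociate strictly one after another with the same mean waiting time; the ~10 µs figure is the reviewer's reading of page 12 and the 25 µs figure is the unbiased trajectory length quoted above.

```python
# Values quoted in the exchange above (CG time, not real time).
PENTAMERS_NARROW_END = 5      # pentamers that must dissociate for rupture
TIME_PER_PENTAMER_US = 10.0   # approximate CG time per dissociation event
SIMULATED_US = 25.0           # length of the unbiased CG trajectories

def sequential_rupture_time(n_pentamers, time_per_event_us):
    """CG time for n events, assuming strictly sequential, equal waits."""
    return n_pentamers * time_per_event_us

needed_us = sequential_rupture_time(PENTAMERS_NARROW_END, TIME_PER_PENTAMER_US)
expected_events = SIMULATED_US / TIME_PER_PENTAMER_US

print(needed_us)        # 50.0
print(expected_events)  # 2.5
```

      Under this crude assumption, a 25 µs trajectory would be expected to show roughly two dissociation events and to fall short of full narrow-end rupture, in line with the one or two (partial) dissociations reported for Fig. S5.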

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      We modified (see below) the main text to include the details.

      Page 4, paragraph 1

      “We model LEN and CA interactions such that LEN molecules can only bind to CA hexamers, and all interactions to CA pentamers are turned off, as in experiments, CA selectively associates with hexamers (25, 36).”

      Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which Lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling and thus mislead future readers. 

      We strongly disagree with these statements, and they do not reflect the facts. We provide a rebuttal to these statements in the “Author Response” statements below.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24). 

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008). 

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459). 

      (d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122). 

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776). 

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537). 

      The goal of our simulations (in combination with experiments from the Pathak group) is to provide molecular-level insight into the sequence of events of NPC docking of capsid and the effect of LEN binding leading to sequential dissociation of pentamers and leading to rupturing of the narrow end of the cone-shaped capsid. We also compare the events leading to capsid rupture at the NPC with the same for a freely diffusing capsid, akin to that in cytoplasm. The reviewer should carefully read the abstract of our paper. In fact, the above are all papers that present qualitative experimental results that help validate our model, but they do not provide details on the molecule-scale events. For example, the paper (10.1073/pnas.2420497122 written by our coauthors in the Pathak group) is extensively used to compare the behavior of LEN-bound capsid in the cytoplasm.

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models. 

      We strongly disagree with this assessment. Our CG NPC model is largely a “bottom-up” model derived from molecular-scale interactions sampled in atomistic simulations (see our previous paper in PNAS https://doi.org/10.1073/pnas.2313737121). The reviewer appears to be ignorant of the “bottom-up” approach based on rigorous statistical mechanics to derive molecule-scale models (please refer to a detailed review on bottom-up coarse-graining: J. Chem. Theory Comput., 2022, 18, 5759-5791).

      Using the “bottom-up” CG model of the NPC, we predicted several molecular-level details of capsid import and docking to the NPC. Our key predictions were that there is an intrinsic capsid lattice elasticity and that the pleomorphic nature of the NPC channel is key for successful capsid docking (https://doi.org/10.1073/pnas.2313737121). Our computational predictions have been, for example, validated in a recently published paper by an experimental group: Hou, Z., Shen, Y., Fronik, S. et al. HIV-1 nuclear import is selective and depends on both capsid elasticity and nuclear pore adaptability. Nat Microbiol 10, 1868–1885 (2025). https://doi.org/10.1038/s41564-025-02054-z. Our work is an excellent example of how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biological processes.

      We have now added the following statement:

      Page 3, Paragraph 1

      “Importantly, the computational predictions of capsid docking to the NPC central channel have been recently validated in a HIV-1 core import at the NPC using cryo-ET (33), demonstrating how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biomolecular processes.”

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      The reviewer fails to recognize that the statement is based on the experimental results showing that the LEN-bound capsid remains bound to the NPC after rupture and fails to translocate to the nuclear side (from the Pathak group, in the section “Ruptured LEN-viral complexes remain bound to the NPC”). The reviewer’s comment is incorrect. 

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      The concept of elasticity can extend down to the mesoscopic scale. Many examples can be found in the large number of elastic network models (ENMs) of proteins published by many authors. The reviewer also fails to comprehend the meaning of the effective spring constants in the HeteroENM model and how they relate to the response of the capsid to stress (e.g., in the NPC). Note that, in the NPC central channel, the capsid encounters several nucleoporins (including disordered FG nucleoporins that do not have specific interactions with the rest of the proteins), and also a confined environment. This environment can exert inward stress on the capsid, which is also reflected in stress on the capsid lattice. Furthermore, the cited computational AFM studies are very far from a realistic in vivo or even in vitro set of conditions. In contrast, our study presents a realistic environment that the capsid will encounter in the NPC, and these predictions are then validated by experimental results.

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations. 

      We presented data on the formation of striated patterns of lattice stress in the capsid, running from the capsid narrow end to the wide end, in a coarse-grained model (https://doi.org/10.1073/pnas.2313737121) and in an atomistic model (https://doi.org/10.1073/pnas.2117781119). Both of our papers are extensively cited in the current manuscript. Also, when the capsid is ruptured, one cannot visualize the striated patterns.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      Reviewer comment: “This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim.” – This comment by the reviewer is deeply flawed and we strongly disagree. In our CG model there is no restriction on the number of LEN molecules that can bind to a CA hexamer. We restate that the experimental results on LEN binding to CA hexamers and the inability of LEN to bind to pentamers were used because no all-atom (AA) force field yet exists.

      The steric effect of the lack of third LEN binding to a hexamer is a likely hypothesis (which one is allowed to make). More importantly, an investigation of the steric effect of LEN binding to the CA hexamer is not the main goal of the manuscript.

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      We disagree with this spurious statement, and we see no real contradiction. We have now added a minor clarification that LEN can inhibit efficient capsid binding at significantly high concentration.

      Page 6, Paragraph 1

      “Therefore, at significantly high concentration LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm.”

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      This is both a misrepresentation of the simulations and a failure to understand them (as well as the supporting experiments) on the part of the reviewer. In the presence of LEN, the hexameric lattice is hyperstabilized. In contrast, the pentamers are not. As a consequence, the pentamers are dissociated. The pentamers at the narrow end are dissociated first, due to high curvature. The reviewer, from a point of being uninformed, simply speculates on what they think should happen. Moreover, as emphasized earlier, and as the reviewer fails to comprehend, ours is a “bottom-up CG model”, so it predicts, not builds in, these effects.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      Unfortunately, the reviewer appears to be quite uninformed on the WTMetaD method and what it does. The chosen collective variable (CV) in our case is the coordination variable and the MetaD samples along that variable (the conditional free energy) as it is designed to do. The reviewer may wish to educate themself by reading Dama et al (https://doi.org/10.1103/PhysRevLett.112.240602). We also note that “emergent properties” are not along some other, uncoupled coordinate.
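      For readers unfamiliar with the collective variable discussed here, a coordination-number CV is commonly built from a smooth switching function of pairwise distances. The sketch below assumes a rational switching function with exponents 6 and 12 and a hypothetical cutoff `r0`; the functional form, parameters, and function names are illustrative assumptions, not the definition used in the manuscript (which refers to Figure S6).

```python
import math

def switch(r, r0, n=6, m=12):
    """Rational switching function: ~1 for r << r0, ~0 for r >> r0."""
    x = r / r0
    if abs(x - 1.0) < 1e-12:
        return n / m  # analytic limit of (1 - x**n) / (1 - x**m) at x = 1
    return (1.0 - x ** n) / (1.0 - x ** m)

def mean_coordination(group_a, group_b, r0):
    """Mean number of group_b sites 'in contact' with each group_a site.

    group_a and group_b are lists of (x, y, z) coordinates, e.g.
    hypothetical CG sites on pentamer and hexamer CA proteins.
    """
    total = 0.0
    for a in group_a:
        for b in group_b:
            total += switch(math.dist(a, b), r0)
    return total / len(group_a)

# Hypothetical coordinates: one site with one neighbour exactly at r0.
print(mean_coordination([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], r0=1.0))  # 0.5
```

      Because the switching function is smooth, the CV is differentiable, which is what allows a metadynamics bias deposited along it to translate into forces on the particles.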

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation. 

      We disagree with this statement and point out where we have cited other work, including the ones mentioned above. However, our CG model is a largely bottom-up CG model which differs from other more ad hoc CG approaches (and some well-known CG models). We do not wish to emphasize the obvious flaws in those other CG approaches and models, since that is not the focus of our manuscript.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading. 

      The movie in question (capsid docking to NPC without LEN) is essential for comparison of LEN-binding dynamics. Unlike in our previous paper, we simulated significantly longer timescales of capsid docking and performed several additional analyses that are relevant to this paper. Moreover, the first section of the Results is titled “Coarse-grained modeling and simulation”; hence, we only present a summary of the CG models and key validation steps in this section.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1, 2, and Supplementary Figure 5)?

      The plotted curves are averages over the two replicas. “For all the plots, the solid lines are the mean values calculated from the time series of two independent replicas, and the shaded region is the standard deviation at each timestep.” This was mentioned in the original figure caption.
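The caption's description of a mean line with a standard-deviation band can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the input values are hypothetical, and the choice of the population standard deviation is an assumption, since the response does not say which estimator was used.

```python
from statistics import mean, pstdev

def replica_mean_and_spread(replicas):
    """Return the per-timestep mean and standard deviation across replicas.

    `replicas` is a list of equally long time series, one per replica.
    """
    if len({len(series) for series in replicas}) != 1:
        raise ValueError("all replicas must have the same number of timesteps")
    means, spreads = [], []
    for values in zip(*replicas):
        means.append(mean(values))
        # Assumption: population standard deviation. With two replicas
        # this equals half the absolute difference between them.
        spreads.append(pstdev(values))
    return means, spreads

# Hypothetical values for illustration only, not data from the manuscript.
replica_a = [0.0, 1.0, 2.0, 4.0]
replica_b = [0.0, 3.0, 2.0, 6.0]
means, spreads = replica_mean_and_spread([replica_a, replica_b])

# Edges of the shaded band that would surround the solid mean line.
lower = [m - s for m, s in zip(means, spreads)]
upper = [m + s for m, s in zip(means, spreads)]
```

With only two replicas the band reflects nothing more than the gap between the two trajectories at each timestep, which is worth keeping in mind when reading the shaded regions.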

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      It is difficult to discern a high curvature region of the capsid due to object overlap of different regions of the capsid. This is likely a case of “perspective distortion” in image processing.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions. 

      This is due to the rate at which trajectory frames are saved for movie generation, which was chosen for faster processing of the movies. We added the following to the movie caption: 

      “The movie frames correspond to snapshots every 250000 𝜏<sub>CG</sub>.” 

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?

      We prefer to continue with the current normalization, since typically in the HIV-1 literature capsids are represented as a collection of hexamers and pentamers (rather than total number of CA monomers).
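The two normalizations under discussion differ only by a constant factor, which a short calculation makes concrete. This is a hedged sketch: the ring counts (12 and 209) come from the quoted manuscript text and the per-ring monomer counts follow the reviewer's suggestion (12*5 and 209*6), while the function name and the defect count are hypothetical.

```python
N_PENTAMERS = 12   # pentamer rings, as quoted from the manuscript
N_HEXAMERS = 209   # hexamer rings, as quoted from the manuscript

def normalized_defects(n_undercoordinated, n_rings, monomers_per_ring=None):
    """Normalize a defect count per ring, or per monomer if a ring size is given."""
    denominator = n_rings if monomers_per_ring is None else n_rings * monomers_per_ring
    return n_undercoordinated / denominator

# Hypothetical count of under-coordinated monomers, for illustration only.
count = 30

per_ring = normalized_defects(count, N_HEXAMERS)        # authors' convention
per_monomer = normalized_defects(count, N_HEXAMERS, 6)  # reviewer's suggestion

# The two conventions differ by exactly the ring size.
ratio = per_ring / per_monomer
```

Because the ratio is constant, the shape of the time series is the same under either convention. Only the scale of the axis changes, and a per-ring value can exceed 1 while a per-monomer value is read as a fraction.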

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid. 

      We corrected the statement as follows:

      “Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate from the high curvature to low curvature end of the capsid.”

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      We only show LEN in Figure 1, and in the rest of the figures the bound LEN molecules are not shown for clarity. The defects are shown in a darker shade of orange (amber). 

      (h) SI Figure S3 caption says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres." While in contradiction, the main-text Fig 2 says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres." One of these must be a typo.

      We have corrected the erroneous caption in Fig. S3. The color schemes in Fig. 2 and Fig. S3 are now consistent.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      We modified the statement as follows:

      “CG MD simulations complemented by the outcome of live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated with the NPC after rupture.”

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait & Voth, 2024)." In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      In our previous paper, differences arising from the condensation state of the RNP complex appeared only in the pill-shaped capsid, and not in the cone-shaped capsid. In this manuscript, we only investigated the cone-shaped capsid.

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event. " What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      “What multitude of unfavorable interactions are the authors referring to?” We have now added the following sentence to clarify:

      “Here we denote unfavorable CA-LEN interactions as all interactions other than the electrostatic and van der Waals interactions that lead to CA-LEN binding (17).”

      The increase of void volume in the distorted lattice is based on standard solid-state physics understanding. We added the word “likely” to the statement: “This can likely be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed (41).”

      Moreover, in one of our previous manuscripts, we established that compressive or expansive strain induces more closely packed or expanded lattice (A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022)).

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      The ideal lattice packing definition is provided in our previous manuscripts: 1. A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022), 2. A. Hudait, G. A. Voth, HIV-1 capsid shape, orientation, and entropic elasticity regulate translocation into the nuclear pore complex. Proceedings of the National Academy of Sciences 121, e2313737121 (2024).

      These manuscripts are cited in the previous statement. The ideal lattice packing is defined based on lattice separations in each core (in cryo-ET and atomistic simulations) using a local order parameter, which measures the near-neighbor contacts of a particle. Moreover, the ideal packing reference is calculated from all available capsid shapes (cone, ellipsoid, and tubular), and takes into account different curvatures.
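A local order parameter that counts near-neighbor contacts can be illustrated with a simple coordination-number calculation. This is only a toy sketch under stated assumptions: the cutoff, the ideal contact count, the 2D hexagonal geometry, and both function names are hypothetical, and the order parameter actually used in the cited manuscripts may be defined differently.

```python
from math import cos, dist, pi, sin

def coordination_numbers(points, cutoff):
    """Count, for each particle, the neighbours lying within `cutoff`."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts

def undercoordinated(points, cutoff, ideal):
    """Return indices of particles with fewer contacts than the ideal value."""
    counts = coordination_numbers(points, cutoff)
    return [i for i, n in enumerate(counts) if n < ideal]

# Hypothetical toy system: one centre particle surrounded by six
# neighbours placed on a hexagon of unit radius.
points = [(0.0, 0.0)] + [(cos(k * pi / 3), sin(k * pi / 3)) for k in range(6)]

counts = coordination_numbers(points, cutoff=1.1)
defects = undercoordinated(points, cutoff=1.1, ideal=6)
```

In this toy system the centre particle reaches the ideal count, while the six edge particles each see only three neighbours and are flagged. The same idea underlies counting under-coordinated sites relative to a reference packing.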

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      We have added a statement as follows:

      “Based on our analysis, we hypothesize that LEN binding hyperstabilizes the CA hexamer-hexamer interactions relative to CA hexamer-pentamer interactions.”

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating. 

      We have now changed “uncoating” to “rupture” throughout the manuscript.

      (o) The authors state: " LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      We have modified the statement as follows:

      “We hypothesize that LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC.”

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We have fixed the run-on sentence.

      “The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.”

      We disagree with the reviewer. The statement was intended to provide a comparison between free capsid and NPC-bound capsid.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      We disagree with the interpretation of the reviewer. Our interpretation, based on our results, is that LEN binding accelerates capsid rupture (from the pentamer-rich high curvature ends), and that the rest of the broken hexameric lattice is hyperstabilized. Ultimately, lattice rupture will lead to release of the RNP, and hence the intended goal of the drug is achieved.

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.

      The simulations were performed in LAMMPS, which is open source. This software is already stated in the Methods. Input data is provided upon request.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: In part B, it appears the middle panel was screenshotted from a ppt, given the red line underneath Lenacapavir. You can export it to an image instead.

      The figure is fixed.

      (2) Figure 6: In part A, the LEN_d in the graph is illegible. Also, in the panel next to it, it also appears to have been screenshotted from a ppt.

      The figure is fixed.

      (3) Page 6: There's an errant quotation mark at the end of a paragraph.

      The errant quotation mark has been removed.

      Reviewer #2 (Recommendations for the authors):

      The code used to perform bottom-up solvent-free CG modeling simulations is not made available.

      This is not true. LAMMPS was used as stated in Methods.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      We thank the reviewer for their time and effort in improving the quality of our manuscript.

      We would like to point out that the results presented in the previous Fig. S12 (now Fig. S13) are from mice of a different age and are restricted to the prefrontal cortex, whereas in the previous report (Florence Authier, et al., Dis Model Mech. 2023) we showed OGT and OGA mRNA/protein expression in total brain homogenates. In that study, we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased, in the brains of 3-month-old C921Y mutant mice compared to WT controls. In our current study (Figure S12, now S13), however, OGA and OGT mRNA/protein expression were a) measured only in the prefrontal cortex and b) obtained from 4-month-old male mice. Therefore, a direct comparison of findings from total brain vs. prefrontal cortex would be speculative. In our present work, OGT protein levels are not changed in the prefrontal cortex, while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly.

      It is plausible that the different levels of OGT protein expression in total brain (previous study) and prefrontal cortex (current study) potentially reflect regional differences in the regulation of OGT protein levels/stability, since OGT mRNA levels are increased in both cases. This notion is also supported by additional analyses in three other brain regions (hippocampus, striatum and cerebellum) and these data are now included in Figures S13 and S14.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript in the future.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      Following the reviewer’s suggestion, we have performed additional analyses to identify the cellular composition of the observed nodular dysplasia using neuronal and glial markers. These new analyses indicate that the nodular collections in layers II/III were predominantly neurons (see, for example, the cresyl violet staining in Fig. 6E). Moreover, we have also performed immunofluorescence imaging using NeuN and GFAP (Fig. 6G-H), which shows that the dystrophic collections are predominantly neurons. To further corroborate these findings, we have also performed multiplex IHC analyses, presented in Fig. S12, which indicate that the nodular cortical malformations i) were populated by neurons and oligodendrocytes and ii) predominantly affected layers II-V, as reflected by the distribution of the neuronal markers Reelin and POU class 3 homeobox 2 (POU3F2). Collectively, these data (Fig. 6 and Fig. S12) reflect neuronal disorganisation due to migration defects rather than differentiation defects. We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular features; however, tissue from defined stages of development is not available. 

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work. However, we respectfully point out that the inference that altered proteins must themselves be O-GlcNAc modified is not necessarily correct. For instance, O-GlcNAcylation of an unknown protein kinase X, E3 ligase/DUB Y, or transcription factor Z could indirectly affect these pathways/proteins. Nevertheless, we have performed further experiments to explore whether Wnt/β-catenin and mTOR signalling are functionally affected, as pointed out by the referee. In the qPCR analyses, we did not observe significant changes in expression of Wnt target genes (Cdkn1a, Ccnd1, Myc, Ramp3, Tfrc), nor in protein levels of key proteins involved in Wnt/β-catenin (non-phosphorylated β-catenin) and mTOR (phosphorylated rpS6) signalling by western blots (data not shown). These results suggest that neither pathway is functionally deregulated to a significant extent in the prefrontal cortex of adult OGT<sup>C921Y</sup> mice.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular dynamics; however, tissue from defined stages of development is not available. As stated above, we want to share our current findings with the scientific and patient communities in a timely manner, and the suggested experiments could form the foundation of a follow-up study in the future.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      Thank you for the comment. To avoid misleading the readers, we have removed panel A from the previous version of Figure 8 and updated the version of record.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript in the future.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Statistics including exact p-values have been included in the main text for all key questions where appropriate.

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1F, the y-axis labels and scale values are partially obscured by graphical elements, compromising accurate interpretation of the data range.

      Panel 1F has been adjusted to make the y-axis label visible.

      (2) Regarding the histological analyses in Figure 6, the current H&E staining and Luxol Fast Blue myelin staining results lack age-matched wild-type control samples processed in parallel, which undermines experimental comparability. To enhance methodological rigor, control group staining results should be displayed adjacent to each experimental group image.

      The original Figure 6 already contained comparison between WT and OGT<sup>C921Y</sup> tissues. The Figure has been updated with additional data from the WT and C921Y mutant groups shown side by side.

      Reviewer #2 (Recommendations for the authors):

      (1) I believe that Figures S1 and S2 were switched during the submission. The legends are correct, so the authors should just be careful with the order when they upload the final versions.

      Figures S1 and S2 have been re-ordered.

      (2) On page 18, the authors state, "Although no significant changes in the expression of OGT were observed in OGTC921Y cortex (Figure S12A, C), there was a significant increase in OGT/OGA protein ratio in OGTC921Y mice (Fig. S12D). As a functional consequence, global O-GlcNAcylation of proteins in the brain was drastically impaired in the OGTC921Y brain compared to WT (Figure S12E, F)."

      To me, this statement suggests that the incorrect ratio of OGT to OGA is responsible for the altered O-GlcNAc levels. I think this is missing important information. The authors are, I'm sure, aware that OGT and OGA expression is linked to O-GlcNAc levels. I think it would be better to describe the situation here as the tissue attempting to respond to lower OGT activity by lowering OGA levels. However, the tissue is not fully successful, resulting in lower overall O-GlcNAc levels as seen by RL2. If the difference were only driven by the OGT/OGA ratio, one would expect increased O-GlcNAc levels due to decreased OGA. I think it is important to point out more details here for non-expert readers.

      Thank you for the insightful comment. We have included these aspects in the revised text; please see page 20.

      (3) I am a little surprised that the authors did not explore differences in O-GlcNAc-modified proteins through a more targeted enrichment of these proteins for analysis of potential modification differences, in addition to just changes in protein abundance.

      We agree that these experiments would further strengthen the work. However, it is not yet known whether OGT-CDG is caused by loss of O-GlcNAc modification on specific proteins or by as-yet-undeciphered mechanisms (e.g. OGT interactome, HCF1 processing, feedback on OGA levels), which we are not able to confirm in the current manuscript. Therefore, as a starting point, we have performed whole proteome analysis to establish candidate hypotheses that could lead to discovering the cellular and molecular mechanisms underlying OGT-CDG. Lastly, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript in the future.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore, the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one, except that an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall, the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 is conserved in yeast, while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall, I found this to be an interesting and comprehensive study of the structural basis for the interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The work by Spokaite et al. describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and cryo-EM. Extensive mutational and sequence analysis define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how VPS34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high-resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased-affinity VPS34 mutant to determine the 3.2 Å structure of Rab5 bound to VPS34-CII. Clear density was seen for the original Rab5 interface, but an additional site was observed. Based on this structure, they mutated out the VPS34 interface, allowing for a high-resolution structure of the Rab5 bound at the VPS15 interface.

      They extensively validated the VPS15 interface in the yeast variant of VPS34, showing that the Vps15-Rab5 (Vps21) interface identified is critical in controlling complex II VPS34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously, and I have very few experimental suggestions.

      Here is what I recommend based on some very minor weaknesses I observed:

      (1) My main concern has to do a little bit with presentation. My main issue is how the authors use mutant description. They clearly indicate the mutant sequence in the human isoform (for example, see Figure 2A, VPS15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they shift to saying VPS15 mutant, but don't define the mutant (Figure 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species, when the exact mutant is included. This is particularly important in this paper, where we are jumping between different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      The reviewer has made an excellent point here. To clarify the yeast mutation, we have revised the manuscript main text to refer to the yeast mutant as SHLITY>DDLIEY, and we have added this to the legend for Figs. 2F,G.

      (2) The HDX data very clearly shows that Rab5 is likely able to bind at both sites, which backs up the cryo-EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." This seems to be confusing as to why they required multiple timepoints, and before they also describe that they required a p-value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p-value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended from the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.

      We have revised the HDX method description to clarify. All peptides were kept and fully processed. However, for the results displayed, we have illustrated only peptides meeting the criteria described.

      The Excel file for all peptides (as recommended by Masson et al) was deposited with PRIDE, with the dataset identifier PXD061277. In addition, we have included this Excel file in our supplementary material.
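
      The display criteria discussed in points (3) and (4) can be written out as a short filter. The following is only a minimal sketch, not the authors' processing pipeline: the record type `TimepointDiff`, the function names, and the reading that all three thresholds (0.25 Da, 5%, p < 0.05) must be met at the same timepoint are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TimepointDiff:
    """Hypothetical record: uptake difference for one peptide at one timepoint."""
    delta_da: float   # difference in deuterium uptake between states, in Da
    delta_pct: float  # the same difference as a percentage of maximal uptake
    p_value: float    # p-value from the replicate comparison (n=3 in the text)


def timepoint_passes(tp: TimepointDiff,
                     min_da: float = 0.25,
                     min_pct: float = 5.0,
                     alpha: float = 0.05) -> bool:
    """Assumed rule: all three thresholds must hold at the same timepoint."""
    return (abs(tp.delta_da) > min_da
            and abs(tp.delta_pct) > min_pct
            and tp.p_value < alpha)


def peptide_is_displayed(timepoints: Sequence[TimepointDiff],
                         min_timepoints: int = 2) -> bool:
    """A peptide is shown only if enough timepoints pass; it is still processed."""
    passing = sum(timepoint_passes(tp) for tp in timepoints)
    return passing >= min_timepoints
```

      Under this reading, every peptide is retained in the deposited table, and the filter only decides which peptides are illustrated in the figures.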

      Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete.

      Weaknesses:

      What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following PhoΔ60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex.

      One of the first noted features of the VPS34 complexes was that the ATG14-containing complex (VPS34-CI) is important for autophagy, while the VPS38 (yeast orthologue of UVRAG) subunit characteristic of VPS34-CII is important for endocytic sorting (PMID 11157979). However, the VPS34, VPS15 and BECLIN1 subunits are present in both complexes; as such, mutations of them may affect both processes.

      We agree with the reviewer that it is an important undertaking to examine the effect of the SHLITY>DDLIEY mutation in yeast Vps15 on autophagy. However, the focus of the current manuscript is VPS34-complex II and RAB5 interaction/activation. An autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      The reviewer has raised an excellent question, which was addressed briefly in the introduction to the manuscript. We have now somewhat expanded on these issues near the end of the discussion in the revised manuscript. In our previously published study, we found that soluble RAB5-GTP did not stimulate the complex II activity (supplementary figure 2b of PMID: 33692360). This is consistent with our finding in this manuscript showing that RAB5 did not cause large conformational changes in solution. However, our previous single-molecule study showed that once complex II is recruited to the membrane by RAB5, RAB5 increases the turnover rate on membranes, indicating an additional allosteric activation (Figure 7 of PMID: 33137306). This study indicated that the primary role of RAB5 is to anchor complex II on the membrane. Once the complex is anchored on the membrane by RAB5, the kinase domain is in the vicinity of its substrate, PI, leading to higher turnover.

      The Echelon Class III PI3K ELISA Kit (Echelon, K-3000) comes with a soluble PI (diC8) to measure the VPS34 activity, and the complex is certainly active with this soluble substrate. However, if the substrate is in membranes, the VPS34 activity is greatly dependent on the character of the membrane.

      I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

      The reviewer is correct that the high-resolution features of BECLIN1 at the base of the V-shaped complex that we observed are not related to RAB5 binding, but they are characteristic of VPS34-CII and likely to be important for the specific role of VPS34-CII. This is the first high-resolution structure of the VPS34-CII that has been reported, and we believe it would be irresponsible not to briefly describe them, since they are unique to VPS34-CII. For this reason, we have placed this section at the end of the results, and we now clarify that we do not see a relevance to RAB5 function, but we describe the arrangement of a region (the BH3) that has been functionally noted in many previous studies, in the absence of a structure.

      Reviewing Editor Comments:

      Please address the following suggestions for minor changes to the manuscript. Use your best scientific judgment in addressing the comments and describe the modifications together with your reasoning in a cover letter. We look forward to seeing the revised version of this very nice study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I found a portion of the description of the cryoEM complexes on the top of page 9 to be redundant with similar descriptions near the top of page 7, and it was not clear to me at first that these were describing the same structures. Part of my confusion was due to the redundancy, including the statement near the bottom of page 7: 'Models were built and refined for all RAB5-associated VPS34-CII assemblies', and then the similar statement on page 9: 'We fit and refined atomic models into both densities'. I believe these are describing the same models? To clarify for the reader, perhaps on page 9, the authors could begin this part with a statement such as "as described above", and eliminate the redundant descriptions.

      The reviewer is correct. Both sections describe the same set of cryo-EM classes from the same sample. The only difference is what we analysed in the two sections: number of RAB5s bound in the first section and the effect of RAB5 binding in the second section. We have revised the text to make this clear, and to make the second section more succinct.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors show nicely that a mutation in Vps15 disrupts binding to Vps21 in vivo, with defects in the endocytic pathway as analyzed by CPY sorting. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following Pho∆60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex. If the authors were to find evidence that this Vps15 mutant also affects autophagy, it would indicate that there is possibly also another Rab1 binding site in Vps15.

      As we stated above, an autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      (2) It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      As in our response to reviewer #3 above, this point was addressed in previous publications and was described in the introduction to our manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

      We thank all reviewers for their constructive feedback on our manuscript.

      We believe we have addressed all the questions in the rebuttal below in writing, including planned experiments we will perform to strengthen the conclusions of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whilst they differ at 24 hours. This is true for the DMSO-treated (control, wildtype-resembling) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12 respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated during these earlier timepoints, but remains comparable at later stages. This would still increase the overall bound parasite mass as parasites start to adhere earlier during or after a heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point, however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      New Results:

      We performed sorbitol sensitivity assays on >20 hours post-infection iRBCs following heat stress in the presence and absence of the PSAC inhibitor furosemide. These additional experiments were added to the supplementary figures (Supplementary Figure 3). Importantly, sorbitol-mediated lysis of iRBCs, with or without prior heat stress, was reduced when furosemide was present, demonstrating that the observed effect is likely PSAC-dependent. We also observed that uninfected RBCs did not lyse with sorbitol, regardless of heat stress, confirming that the effect is specific to infected cells.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These images include co-staining with the P. falciparum proteins KAHRP and SPB1 to assess possible co-localisations. Furthermore, following the reviewer’s suggestion, we have softened the statement regarding PF3D7_1039000-HA to better reflect the data, changing “...does not colocalise” to “...does not strongly colocalise”.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we identified a novel trafficking pathway, but rather that we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, either in small vesicles or possibly in Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect those most commonly found in the context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 °C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, here we focused on enhanced protein export using multiple complementary approaches, and have chosen to address rigidity questions in a different study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, a second time point in many of the assays (for example, 36 hrs or later) would be useful to determine if heat stress simply accelerates trafficking of proteins to the RBC or if instead it results in an overall increase in trafficking.

      As mentioned earlier, we did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs. This is true for the DMSO-treated (control, wildtype-resembling) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12 respectively). The end level of VAR2CSA is the same in both conditions, but at 24 hours post infection it is higher following heat stress, indicating that trafficking is accelerated.

      In the text, the authors frequently mention changes in the parasites' phenotype in response to heat stress; however, the way it is described is a bit ambiguous and can be confusing. For example, on page 3, they state that "Following heat stress, significantly more iRBCs (57.6% +/-19.4%) cytoadhered.....". From this sentence, it is not initially clear if the end result is cytoadherence of 57.6% of iRBCs or if this refers to an increase of 57.6%. This could be stated explicitly (e.g., "an increase of 57.6% +/- 19.4%") to avoid confusion. Similar descriptions of the results are found throughout the paper.

      We agree this is confusing and have altered the text accordingly.

      The authors might consider citing and discussing the paper from Andrade et al (Nat Med, 2020, 26:1929-1940), which describes longer circulation times (less cytoadherence) by parasites in the dry season (asymptomatic patients) than in febrile patients in the wet season (stronger cytoadhesion of younger stages). This would seem to be consistent with the data presented here.

      We are aware of the Andrade study, but chose not to cite it in this context since the reported differences in cytoadhesion appear more consistent with PfEMP1 expression levels, as hypothesized by the authors, than with altered trafficking.

      Reviewer #2 (Recommendations for the authors):

      General comments on the text:

      (1) "Approximately 10% of the proteins encoded by P. falciparum are predicted to be exported beyond the parasite plasma membrane (PPM) into the parasitophorous vacuole lumen (PVL) and subsequently across the parasitophorous vacuole membrane (PVM) into the RBC cytosol."

      To my knowledge, it has not been really demonstrated that all exported proteins take this route (transfer step in the PVL), and how transmembrane proteins transfer from the parasite to the erythrocyte is still poorly understood. I recommend that the authors rephrase this for precision.

      We agree with this reviewer and will change the statement.

      Changes:

      We have clarified these statements to accurately reflect the current understanding of protein export. “Approximately 10% of P. falciparum encoded proteins are predicted to be exported beyond the parasite plasma membrane, with many thought to pass through the parasitophorous vacuole lumen (PVL) and parasitophorous vacuole membrane (PVM) into the RBC cytosol, although the exact routes for transmembrane proteins are not fully understood.”

      (2) "Charnaud et al. 25, but not Cobb et al. 26, found HSP70x to be essential for normal PfEMP1 trafficking, although both studies concluded that HSP70x is dispensable for intraerythrocytic parasite growth at 37 °C."

      The trafficking block in Charnaud is likely due to a delay in parasite development and cannot thus really be directly related to PfEMP1 trafficking.

      Charnaud et al., report: “Microscopy of Giemsa stained IE indicated that ΔHsp70-x appeared similar to CS2 with no obvious abnormalities (Fig 2c). To more accurately quantify changes in maturation through the cell cycle, the DNA content of parasites stained with ethidium bromide was measured by flow cytometry (Fig 2d). This indicated that most parasites had the same DNA content at each timepoint and were maturing at the same rate.”

      Thus, we cannot conclude that the trafficking phenotype reported in the Charnaud study can be attributed to a growth delay. This is also supported by only minor changes in the transcriptome, which would likely be more widely perturbed if there was a significant growth delay. However, we will change the statement “Charnaud et al. found HSP70x to be essential for normal PfEMP1 trafficking” to “…important for PfEMP1 trafficking” to more precisely reflect the data.

      (3) "NanoLuciferase (NanoLuc) fusion proteins and compartment-specific isolation confirmed a greater abundance of PfEMP1 in the RBC cytosol following heat stress."

      Please see my comments about the differentiation between soluble and TM-containing proteins. One would expect that PfEMP1 is membrane-integrated, and thus should not be found in the cytosol (implying a soluble form).

      See our response above.

      (4) "Importantly, heat stress did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1)."

      The authors should constrain this statement to the time frame in which the heat-shock was given. Previous publications have shown a speeded-up development only in younger-stage parasites, which the authors did not study.

      We will re-phrase.

      Changes:

      We have rephrased the sentence to clarify the time window of heat stress: “Importantly, heat stress between 16-24 hours post-invasion did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1).” The supplementary figure title has also been updated to match.

      (5) I recommend that the authors include line numbers. This makes the reviewers' lives much easier.

      We agree and apologize for this oversight.

      We have now added line numbers.

      Reviewer #3 (Recommendations for the authors):

      (1) All the experiments have been performed to a very high standard, and I have no major questions about the results. However, the paper would go up to the next level if the effect of fever temperatures on the stiffness of the iRBCs had been investigated by measuring the passage of iRBCs through an artificial spleen where a bed of metal spheres mimics interendothelial splenic slits.

      See our comment from above.

      (2) With respect to Figures 5E, 6C, and 6E, why was there not a decrease in bioluminescence levels at 39 °C for Sap and NP40 to match the increase in EqtII?

      The assay is not performed as a sequence of permeabilisation steps. Instead, samples are split into three parallel treatments: one with EqtII, one with Saponin, and one with NP40. The protein measured in each case reflects the total released under that specific condition rather than being cumulative. Therefore, the NP40 fraction includes proteins from the Saponin-accessible compartment, the EqtII-accessible compartment, and the parasite cytosol.
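
      The difference between parallel and sequential treatments can be illustrated with a minimal Python sketch. All compartment names and signal values are hypothetical, and the assumed nesting (EqtII releasing only the RBC cytosol, saponin additionally releasing the parasitophorous vacuole, NP40 releasing everything) is an illustration of the reasoning above, not the authors' analysis code.

```python
# Hypothetical compartments released by each treatment of a separate aliquot.
# Assumption: each detergent releases everything the milder one does, plus more.
RELEASED_BY = {
    "EqtII": ["rbc_cytosol"],
    "Saponin": ["rbc_cytosol", "pv"],
    "NP40": ["rbc_cytosol", "pv", "parasite"],
}


def readout(compartments: dict[str, float]) -> dict[str, float]:
    """Total signal released by each treatment (not cumulative across steps)."""
    return {
        treatment: sum(compartments[name] for name in released)
        for treatment, released in RELEASED_BY.items()
    }


# Invented signals (arbitrary units): the total is identical at both
# temperatures; only the location of the reporter shifts.
control = {"rbc_cytosol": 10.0, "pv": 30.0, "parasite": 60.0}
heat = {"rbc_cytosol": 25.0, "pv": 15.0, "parasite": 60.0}

control_readout = readout(control)
heat_readout = readout(heat)
```

      Under these assumptions, a shift of reporter into the RBC cytosol raises the EqtII readout while the saponin and NP40 readouts stay the same, because both already include the EqtII-accessible compartment.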

      (3) In the Supplementary gene maps, I could not read the white text on the black gene boxes.

      We apologize: these have not converted well and will be altered with the revised version.

      Changes

      We have significantly increased the size of all fonts within the gene maps and improved the resolution of the figures to improve readability.

      (4) In Figure S6, why does HSP70-x look different between parts C and D IFAs, with the latter showing much more export?

      We agree these IFAs are not optimal and we will provide better images.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These figures now include multiple images of HA-tagged staining to more accurately represent the observed localisation and export patterns.

      (5) Would the authors care to comment on what kinase might be additionally phosphorylating at 39 °C?

      We presume these are Maurer’s clefts FIKK kinases as most of the hyperphosphorylated proteins are MC residents. However, without directly testing for this using conditional KO parasite lines, we cannot exclude that host kinases are also playing a role.

      (6) Could the additional assembly of PSAC at the iRBC membrane be important for survival at 39 °C?

      We have tested to see if nutrient uptake helps parasite survival during heat stress in the presence of furosemide and lower nutrient concentrations, but did not see a difference in growth following heat stress compared to control temperature conditions.

      New Results:

      We have added a new supplementary figure (Supplementary Figure 4) detailing experiments testing parasite growth under altered nutrient availability using two approaches (sub-lethal furosemide concentrations or reduced-nutrient RPMI) and with or without a 40°C heat stress applied between 16-24 hpi.

      The main text now references this data: “Culturing parasites in sub-lethal furosemide concentrations or in reduced nutrient media led to reduced parasitaemia (Supplementary Figure 4). However, the parasitaemia is not further reduced following heat stress. This shows that increased PSAC levels/activity do not enhance parasite survival under conditions of limited nutrient availability either from furosemide-induced nutrient deprivation or a reduced nutrient media composition.”

      These experiments show that nutrient uptake does not improve parasite survival during heat stress compared to control temperature conditions.

      (7) Would the authors like to speculate on how higher temperatures increase the transport of exported proteins with TMDs?

      There are many possible explanations, one of which is that unfolding of the hydrophobic TMDs is favoured at elevated temperatures. However, we have no data to support this hypothesis and therefore refrained from explicitly stating this possibility.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      In "Quantitative interactome mapping of skeletal muscle insulin resistance", Ng et al. present a series of proteomics/interactomics studies in skeletal muscle to identify insulin-regulated complexes/interactions and changes to these in insulin-resistant muscle. More mechanistically, the Authors focus on changes in interactions involving chaperones in the ER/SR, presenting interesting data on how PDIA6 overexpression alters insulin sensitivity in muscle ex vivo.

      Major Comments:

      The section entitled "Validating the regulation of PPIs with insulin resistance in C2C12 myotubes with quantitative XL-MS": this is not really a validation of the previous data as presented, but more an orthogonal assay that helped pinpoint the interest in the ER. Suggest adjusting the title.

      Figure 3B - the "decrease" in AS160 pS588 regulation appears to be due to increased basal phosphorylation, not decreased phosphorylation after insulin. This should be commented on or clarified.

      PDIA6 is down-regulated in muscle from people with T2D - so why did the authors decide to overexpress PDIA6? I note this rationale is explained in the discussion, and could be articulated better in the results.

      Figure 5J and K. The TA muscles are substantially larger from PDIA6 OE mice. Are the muscle fibres also larger? This relates to the normalisation of data in K, which appears to be normalised to g tissue. If so, is the difference between control and OE mice being driven by the increase in muscle mass, with uptake per muscle or per fibre the same?

      Minor Comments:

      For the PCP-MS data from C2C12 cells, the authors use an analysis of AUC to assess protein abundance, which, as they state, is important for chronic treatments if total protein is not separately quantified. However, the analysis of changes in protein distribution is less clear from the text in the results section. Intuitively, a profile that is normalised to total intensity in all fractions would provide a protein abundance-independent read-out for changes in protein distribution. Does the "local analysis" capture this same information? Could the Authors provide a little more information here?
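
      The distinction between abundance and distribution can be sketched in Python. The fraction intensities are invented, the plain sum is only a stand-in for an AUC, and the distance measure is one possible choice; none of this is claimed to be the authors' "local analysis".

```python
def auc(profile: list[float]) -> float:
    """Total intensity across fractions (simple stand-in for an AUC)."""
    return sum(profile)


def normalise(profile: list[float]) -> list[float]:
    """Scale a profile so its fractions sum to 1 (abundance-independent)."""
    total = auc(profile)
    if total == 0:
        raise ValueError("profile has no signal")
    return [value / total for value in profile]


def distribution_shift(a: list[float], b: list[float]) -> float:
    """Total variation distance between two normalised profiles (0 to 1)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(normalise(a), normalise(b)))


# Invented elution profiles over four fractions.
basal = [1.0, 4.0, 2.0, 1.0]
doubled = [2.0, 8.0, 4.0, 2.0]   # same shape, twice the abundance
shifted = [1.0, 2.0, 4.0, 1.0]   # same abundance, different elution
```

      Here `auc` separates `basal` from `doubled` but not from `shifted`, whereas `distribution_shift` does the opposite, which is the abundance-independent read-out the comment asks about.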

      Figure 1M - are the Authors sure that VPS41 should be in this panel. It doesn't seem to be insulin regulated, and the arrow appears to refer to movement between insulin sensitive and insulin resistant.

      Figure 1N - "This includes an array of TBC1 domain-containing proteins (TBC1D15, TBC1D17, TBC1D8B) that are consistently reduced with IR". Do the Authors mean the abundance was less, or that complex formation was reduced?

      Optional. In general, there is a lot of text discussing the literature around proteins highlighted in the analysis. This is useful to an extent, but the Authors might consider streamlining this a little (perhaps moving some of the information to supp tables?).

      Why do the Authors think the crosslinking MS was not able to capture acute PPI changes like the PCP-MS was?

      For the EDL crosslinking data. Are the Authors able to provide a comparison with C2C12 data - to highlight the differences and similarities between tissue and the cell model? This may be a challenge if the authors think most differences may be technical.

      Please check - "reduces free-glycerol levels essential for fatty acid synthesis". Glycerol does not directly contribute to FA synthesis, but is needed for triglyceride synthesis.

      Do the Authors think that the change in PDIA6 interactions may be a general/indirect indication of changes in ER redox and/or protein misfolding in insulin resistance?

      Is PDIA6 an ER luminal protein? If so, it being phosphorylated is interesting.

      Referees cross-commenting

      Similarly, reviewer #1 raises important points on the description of key parts of the analysis that will need to be addressed. I think we agree that the manuscript encompasses a great deal of data, and that it is somewhat difficult to follow why PDIA6 was selected for validation. Overall, the reviews pick up on different aspects of the manuscript that could be improved.

      Significance

      Overall, the strength of the paper is in the underlying proteomics workflows and analysis. The work presented is of very high technical quality, and I have no doubt the data presented will be of use to the field beyond the analysis in this current publication.

      However, a weakness is doubts over the relevance of the data on PDIA6 overexpression in muscle insulin resistance.

      This will be of interest to those in the proteomics, interactomics and metabolism fields.

      My expertise is in glucose metabolism, insulin signalling and insulin resistance.

    1. [[Aria Khodaverdi p]] on [[Martijn Aslander p]] lls. Compares it to [[Doug Engelbart Demo]] and [[Vannevar Bush As We May Think 20210304173014]] but light on examples that trigger his fascination

    1. Temporary characters must cease operation as soon as practicable and cannot be transferred to another person.

      What is the reasoning behind this? Example - with Amity changes, we have created a part-time Ambassador character that is written by one of our writers in our group, under my overall direction. Sometimes I may write for this character too. This helps the campaign region and overall direction for IC story lines. I think the wording on (d) could be improved, and I'd like to see this provision relaxed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on a comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, with the latter showing an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge on basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which significantly contributed to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and describe the revisions that were incorporated.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect the induction of immunosuppressive mechanisms during such a prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We addressed this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We included a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We incorporated these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections was revised accordingly, and the discussion was extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we clarified our conclusions where needed and ensured that interpretations were better aligned with the data shown.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Introduction, more details on the experimental model would be appreciated. A short summary of findings obtained with this model in previous works from the authors would help to better understand the context of the study.

      Basic information on the model was added in the Introduction section of the revised manuscript.

      (2) In Figure 1, the addition of more time points on the x-axes would help the interpretation of the figures.

      We agree and have added extra time points to the x-axes.

      (3) To better understand the results in Figure 2A, a figure showing cytokine levels post-Estonia infection of only challenged pigs would help, indicating protected and non-protected animals as in Figure 2C. This figure would be better linked to the corresponding dot plot (Figure 2B).

      Our statistical analyses in Figure 2A are based on using both challenged and non-challenged pigs to assess differences between SPF and farm pigs. We prefer not to remove the non-challenged pigs in order to avoid losing statistical power. Moreover, even when non-challenged and challenged pigs are displayed in the plots, upregulation of IFN-α and IL-8 can be visualized and remains consistent with the positive and negative correlates of protection shown in Figure 2C.

      (4) Dark red colour associated with SPF non-protected is difficult to differentiate from light red in some figures.

      We thank the reviewer for this remark. To preserve the color scheme across the paper, we changed the circle data points to squares for the non-protected SPF pig in the most crowded figures: Figures 1–3 and Supplementary Figures 2 and 8.

      (5) In Supplementary figures 12-16, grouping of the animal numbers (SPF vs farm) would facilitate the interpretation of the results.

      Information on the animal numbers for each group (SPF vs. farm) has been added to the figure captions.

      (6) Are the results shown in Figure 8 based on absolute scores as mentioned? Results from 0 dpc are not shown. Is that correct?

      That is correct. BTM expression values are absolute and could not be normalized, as RNA was not isolated either immediately before the challenge or on day 0 post-challenge. This information is now clarified in the figure captions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors use the words "predicted" and "predicts" although they haven't used any methods to show that this is true, such as a multivariate analysis. I don't think correlation coefficients are sufficient to indicate prediction. This needs to be fixed.

      We agree with this and have made changes in the text to avoid this impression.

      (2) "Lower baseline immune activation was linked to increased protective immunity." Presumably, the authors mean prior to challenge, not prior to "vaccination"?

      In this sentence written in the Abstract, we refer to baseline immune activation in the steady state, i.e., prior to any infection, as demonstrated in a previous study by Radulovic et al. (2022). The sentence was adapted accordingly. This concept is further explored in the Discussion section.

      (3) The abstract mentioned the comparison between farm and SPF pigs, but didn't provide any context for those findings. It could be added here.

      In the new version, we have added information on this model in the Introduction section.

      (4) Figure legends need N to be indicated. For example, the viral load figures don't appear to be representative of all 9 or 5 animals. Is there a reason why not all were challenged, and how were those 5 challenged selected?

      Numbers of animals in each group were added to the figure captions. We have also provided details regarding the animals sacrificed at different time points of the experiment in the ‘Animal experiment’ section of the Methods.

      (5) 1A doesn't have a legend to indicate whether dark or light color indicates sampling.

      Fair point. We have added the information to the figure.

      (6) For Figure 3C, it's not clear how the correlation is presented. The legend indicates in writing that the color indicates the outcome it correlates with, but the legend suggests that it is r.

      The method of presenting correlation data is consistent across all figures, including Figure 3C. The color reflects the direction and strength of the correlation, corresponding to the r coefficient obtained from correlating immunological parameters with clinical scores. We have clarified this description in the figure caption to improve readability.
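
      How a correlation coefficient can drive both the direction and the colour of a heat-map cell is sketched below in Python. The choice of Pearson's r, the colour names, and all data values are assumptions for illustration; the response does not state which coefficient or colour scale was used.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def colour_for(r: float) -> str:
    """Map the sign of r to a colour family (hypothetical scheme)."""
    if r > 0:
        return "red"    # parameter rises with the clinical score
    if r < 0:
        return "blue"   # parameter falls as the clinical score rises
    return "white"


clinical_score = [0.0, 1.0, 2.0, 3.0, 4.0]   # invented values
parameter = [9.0, 7.0, 6.0, 3.0, 1.0]        # invented values
r = pearson_r(parameter, clinical_score)
```

      With only a handful of animals per group, r is sensitive to single data points, which is why the accompanying scatter plots requested in comment (7) are a useful complement.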

      (7) For some of the correlation data in 2D and 3C, it would be nice to provide the plots in the supplemental. Also, are there enough data points for a robust interpretation of correlation curves?

      We agree that providing the plots will improve clarity and have included them in the supplementary material. While we acknowledge that the number of data points is modest, we believe it is sufficient to support a robust interpretation of the correlation curves. Corresponding p-value cutoffs are noted in the figure captions.

      (8) The figure 2C method of indicating significance is confusing. There must be a clearer way to present this figure.

      Analyzing statistical significance for the dataset shown in Figure 2C is challenging due to the small number of animals. We carefully considered alternative ways of presenting statistical significance; however, given the limited group sizes, we believe that the current approach provides the most transparent and informative representation of the data.

      For clarity, we divided the animals into SPF and farm groups, as well as into protected (4 SPF, 2 farm pigs) and non-protected (1 SPF, 3 farm pigs) categories, and performed both group-based (unpaired t-test) and time-based (mixed-effects analysis) comparisons. All significant differences were added to the plots so that readers could directly visualize the observed trends and compare them with the correlation analysis presented in Figure 2D.

      (9) Please note that "viremia" means the presence of a virus specifically in the blood. Other descriptions of viral load should be used if this was not measured.

      We have clarified this in the text. When referring to organs, we use the term “viral loads.”

      (10) The way of putting a square around boxes that are significant can be misleading when a box is surrounded by other significant comparisons. Like for Figure 6B - probably all of these are really significant, but I can't tell for sure.

      Good point. We changed rectangles to circles for better readability of the figures.

      (11) There is a potential argument that these correlates of protection might only be valid for this specific vaccine. It should be noted that comparisons of multiple vaccines would be needed before assuming the correlates are broadly relevant.

      We agree with this statement and address it in the Discussion section.

      (12) For the circled pathways in Figure 9, it is not clear from the diagram if there is a directionality to the involvement of those pathways. Modulated or induced?

      When discussing pathways identified by transcriptome analysis, we are always referring to their induction, as this is based on the normalized enrichment score (NES). We have now specified this in the figure caption.

      (13) The authors speculate about NK cells, but this is based on transcriptional pathways identified and the literature. Is there any indication from the flow cytometry data whether activated NK cells versus NKT cells are associated with protection? Also, the memory phenotype of those cells?

      Regarding NK cells, the BTM analysis was corroborated by the flow cytometry data shown in Supplementary Figure 8. NK cells were defined as CD3⁻CD8α⁺. Specific markers to distinguish NKT cells or to assess memory phenotypes were not included in our panel.

      (14) In the discussion, "Our study demonstrates that T cell activation represents a robust correlate of protection against ASFV" doesn't indicate whether they mean after vaccination or after challenge. Re-using the same time points throughout the manuscript compounds this confusion.

      In this case, we mean that T cell activation upon immunization/vaccination and challenge correlates with protection. This information has been added to the sentence. Although some time points overlap between the immunization and challenge phases, we consistently use “dpi” and “dpc” to clearly distinguish them.

      (15) Flow cytometry gating strategies should be provided in the supplemental, particularly since this species is less frequently studied using flow cytometry; it would be helpful to understand gating and expression levels of key markers.

      We have provided the gating strategy in Supplementary Figure 7, which is also referenced in the “Flow cytometry and hematology analysis” section of the Methods.

      (16) Some of the discussion is a bit long and repetitive - e.g. the parts on antibodies and the last paragraph with multiple other parts of the discussion and manuscript.

      While we agree that some sections are extensive, we think that this level of detail is necessary to integrate the different datasets and to place our findings in the context of previous literature.

    1. Author response:

      eLife Assessment

      This study uses a Bayesian framework to characterize latent brain state dynamics associated with memory encoding and performance in children, as measured with functional magnetic resonance imaging. The novelty of the approach offers valuable insights into memory-related brain activity, but the consideration of developmental changes in memory and brain dynamics, and the evidence to support the proposed mapping between specific states and distinct aspects of memory, are incomplete. This work will be of interest to researchers interested in cognitive neuroscience and the development of memory.

      We are grateful to the editor and reviewers for their positive feedback and constructive evaluation. Their comments have identified important areas where the manuscript can be strengthened. Below, we outline our planned revisions.

      Reviewer #1 (Public review):

      Zeng et al. characterized the dynamic brain states that emerged during episodic encoding and the reactivation of these states during the offline rest period in children aged 8-13. In the study, participants encoded scene images during fMRI and later performed a memory recognition test. The authors adopted the BSDS approach and identified four states during encoding, including an "active-encoding" state. The occupancy rate of, and the state transition rates towards, this active-encoding state positively predicted memory accuracy across participants. The authors then decoded the brain states during pre- and post-encoding rests with the model trained on the encoding data to examine state reactivation. They found that the state temporal profile and transition structure shifted from encoding to post-encoding rest. They also showed that the mean lifetime and stability (measured with self-transition probability) of the "default-mode" state during post-encoding rest predict memory performance. How brain dynamics during encoding and offline rest support long-term memory remains understudied, particularly in children. Thus, this study addresses an important question in the field. The authors implemented an advanced computational framework to identify latent brain states during encoding and carefully characterized their spatiotemporal features. The study also showed evidence for the behavioral relevance of these states, providing valuable insights into the link between state dynamics and successful encoding and consolidation.
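
      The state-dynamics measures named in this summary (occupancy rate, mean lifetime, and transition or self-transition probability) can be illustrated with a minimal Python sketch. It computes simple empirical counts from an invented sequence of state labels and is not the BSDS model, which estimates these quantities within a Bayesian framework.

```python
from itertools import groupby


def occupancy(states: list[int], k: int) -> float:
    """Fraction of time points spent in state k."""
    return states.count(k) / len(states)


def mean_lifetime(states: list[int], k: int) -> float:
    """Average length (in TRs) of uninterrupted runs of state k."""
    runs = [len(list(group)) for label, group in groupby(states) if label == k]
    return sum(runs) / len(runs) if runs else 0.0


def transition_prob(states: list[int], src: int, dst: int) -> float:
    """Empirical P(next = dst | current = src); src == dst measures stability."""
    successors = [b for a, b in zip(states, states[1:]) if a == src]
    return successors.count(dst) / len(successors) if successors else 0.0


# Invented state label per TR for one participant.
sequence = [0, 0, 1, 1, 1, 2, 1, 1, 0, 3]
```

      For this sequence, state 1 has an occupancy of 0.5, a mean lifetime of 2.5 TRs, and a self-transition probability of 0.6; per-participant values of this kind are what would be related to memory accuracy.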

      We thank Reviewer #1 for the positive and constructive feedback on our study. We plan to incorporate detailed methodological justifications and a thorough analysis of limitations. We also plan to improve the overall logical coherence of the manuscript to ensure a more robust and scientifically sound presentation.

      Weaknesses:

      (1) If applicable, please provide information on the decoding performance of states during pre- and post-encoding rests. The Methods noted that the authors applied a threshold of 0.1 z-scored likelihood, and based on Figure S2, it seems like most TRs were assigned a reinstated state during post-encoding rest. It would be useful to know, for the decodable TRs, how strong the evidence was in favor of one state over others. Further, was decoding performance better during post- vs. pre- encoding rest? This is critical for establishing that these states were indeed "reinstated" during rest. The authors showed individual-specific correlations between encoding and post-encoding state distribution, which is an important validation of the method, but this result alone is not sufficient to suggest that the states during encoding were the ones that occurred during rest. The authors found that the state dynamics vary substantially between encoding and rest, and it would be helpful to clarify whether these differences might be related to decoding performance. I am also curious whether, if the authors apply the BSDS approach to independently identify brain states during rest periods (instead of using the trained model from encoding), they find similar states during rest as those that emerged during encoding?

      We plan three additional analyses to strengthen the evidence for state reinstatement during rest. First, we will report quantitative decoding confidence metrics for each decoded time point, including the log-likelihood difference between the winning state and the next-best state. We will compare these distributions between pre- and post-encoding rest to test whether decoding quality differs between conditions, as the reviewer suggests. Second, we will provide a more detailed characterization of the decoding process, including the proportion of TRs that survive the log-likelihood threshold of 0.1 during pre- vs. post-encoding rest and whether this proportion relates to memory performance. Third, we will train an independent BSDS model directly on the rest data (rather than using the encoding-trained model) and assess the degree of correspondence between the independently discovered rest states and the encoding states in terms of amplitude profiles and covariance structures. Convergence between the two approaches would provide strong validation that the encoding-defined states genuinely re-emerge at rest. Together with the evidence from our existing analyses, these additions will strengthen our claims.

      (2) During post-encoding rest, the intermediate activation state (S1) became the dominant state. Overall, the paper did not focus too much on this state. For example, when examining the relationship between state transitions and memory performance, the authors also did not include this state as a part of the analyses presented in the paper (lines 203-211). Could the author report more information about this state and/or discuss how this state might be relevant to memory formation and consolidation?

      We thank the reviewer for this suggestion. During encoding, S1 had the lowest occupancy (~10%) and showed no significant relationship with memory performance, which led us to interpret it as a non-essential transient configuration. In the revision, we will provide a more thorough characterization of S1, and conduct correlation analyses to probe whether its dynamic properties during post-encoding rest correlate with individual memory performance.

      (3) Two outcome measures from the BSDS model were the occupancy rate and the mean lifetime. The authors found a significant association with behavior and occupancy rate in some analyses, and mean lifetime in others. The paper would benefit from a stronger theoretical framing explaining how and why these two different measures provide distinct information about the brain dynamics, which will help clarify the interpretation of results when association with behavior was specific to one measure.

      We thank the reviewer for this suggestion. Occupancy rate and mean lifetime, while related, capture fundamentally different aspects of brain state dynamics. Occupancy rate reflects the total proportion of time the brain spends in a given state, capturing the overall prevalence of that configuration across the scanning session. Mean lifetime, by contrast, measures the average uninterrupted duration of each state visit, indexing the temporal stability or persistence of a given network configuration once it is entered. Critically, two states could have identical occupancy rates but very different mean lifetimes (a state visited frequently but briefly versus one visited rarely but for sustained periods), implying distinct underlying neural dynamics. In the context of memory, high occupancy of the active-encoding state may reflect repeated engagement of encoding-optimal circuits, while a long mean lifetime of the default-mode state during rest may reflect sustained consolidation-related processing. We will expand the theoretical framework in the revised manuscript to articulate these distinctions and connect them to extant findings suggesting that temporal stability versus frequency of state visits may have dissociable behavioral correlates in working memory and episodic memory (He et al., 2023; Stevner et al., 2019).
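      The distinction between the two measures can be made concrete with a small sketch. The code below is illustrative only and is not the BSDS implementation: the function names and the toy state sequences are assumptions introduced here, and states are treated as integer labels assigned per TR.

```python
import numpy as np

def occupancy_rate(states, state):
    """Proportion of TRs assigned to `state`."""
    states = np.asarray(states)
    return float(np.mean(states == state))

def mean_lifetime(states, state):
    """Average length (in TRs) of uninterrupted runs of `state`."""
    runs, current = [], 0
    for s in states:
        if s == state:
            current += 1
        elif current:
            runs.append(current)  # a run of `state` just ended
            current = 0
    if current:
        runs.append(current)      # run that lasts until the final TR
    return float(np.mean(runs)) if runs else 0.0

# Hypothetical 12-TR sequences: state 3 fills half of the TRs in both.
frequent_brief = [3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1]
rare_sustained = [3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1]

for name, seq in [("frequent/brief", frequent_brief), ("rare/sustained", rare_sustained)]:
    print(name, occupancy_rate(seq, 3), mean_lifetime(seq, 3))
```

      Both toy sequences give state 3 the same occupancy, yet its mean lifetime differs sixfold, which is why a brain-behavior association can appear for one measure and not the other.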

      (4) For performance on a memory recognition test, d' is a more common metric in the literature as it isolates the memory signal for the old items from response bias. According to Methods (line 451), the authors have computed a different metric as their primary behavioral measure (hits + correct rejections - misses - false alarms). Please provide a rationale for choosing this measure instead. Have the authors considered computing d' as well and examining brain-behavior relationships using d'?

      Our primary memory recognition metric, computed as (hits + correct rejections − misses − false alarms) / total trials, provides an unbiased linear estimate of discrimination ability that is mathematically consistent with d' in directional effects. We selected this measure because it is particularly robust with limited trial counts per condition (Verde et al., 2006; Wickens, 2001). Nonetheless, we agree that reporting d' is important for comparability with the broader literature. In the revision, we will compute d' for each participant and conduct parallel brain–behavior correlation analyses to demonstrate that our findings are robust across both metrics.
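      As a hedged illustration of how the two metrics relate, the sketch below computes both from the same trial counts. The counts are invented for the example, the function names are hypothetical, and the log-linear correction (adding 0.5 to each count and 1 to each total) is one common way to avoid infinite z-scores at rates of 0 or 1; it is an assumption here, not necessarily the correction that will be used in the revision.

```python
from statistics import NormalDist

def corrected_accuracy(hits, misses, correct_rejections, false_alarms):
    """(hits + correct rejections - misses - false alarms) / total trials."""
    total = hits + misses + correct_rejections + false_alarms
    return (hits + correct_rejections - misses - false_alarms) / total

def d_prime(hits, misses, correct_rejections, false_alarms):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical participant: 100 old and 100 new images at test.
counts = dict(hits=80, misses=20, correct_rejections=70, false_alarms=30)
print(corrected_accuracy(**counts))  # equals hit rate minus false-alarm rate here
print(d_prime(**counts))
```

      With equal numbers of old and new items, the linear metric reduces to the hit rate minus the false-alarm rate, so it moves in the same direction as d' while remaining defined when a rate reaches 0 or 1.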

      (5) While this study examined brain state dynamics in children, there was no adult sample to compare with. Therefore, it is hard to conclude whether the findings are specific to children (or developing brains). It would be helpful to discuss this point in the paper.

      We thank the reviewer for raising this point. While several studies have documented memory-related replay and reinstatement in adults at both the regional and systems levels (Tambini et al., 2017; Wimmer et al., 2020), few have examined whether analogous state-level reinstatement occurs in children. Our study was motivated by this gap: we sought to test whether children show dynamic brain state reinstatement mechanisms similar to those described in adults. However, we acknowledge that without a direct adult comparison, we cannot determine whether the observed patterns are unique to children or reflect general principles of episodic memory organization. In the revised manuscript, we will: (a) frame the study more carefully as examining whether established state-level consolidation mechanisms also operate during childhood, (b) discuss findings in relation to adult studies, and (c) include exploratory analyses of age-related variability in both memory performance and BSDS dynamics within our sample, while acknowledging that the narrow age range (8–13) and small sample size limit the power of such developmental analyses. We will clearly identify the absence of an adult comparison as a limitation.

      Reviewer #2 (Public review):

      This paper investigates the latent dynamic brain states that emerge during memory encoding and predict later memory performance in children (N = 24, ages: 8-13 years). A novel computational approach (Bayesian Switching Dynamic Systems, BSDS) discovers latent brain states from fMRI data in an unsupervised and parameter-free manner that is agnostic to external stimuli, resulting in 4 states: an active-encoding state, a default-mode state, an inactive state, and an intermediate state. The key finding is that the percentage of time occupied in the active-encoding state (characterized by greater activity in hippocampal, visual, and frontoparietal regions), as well as greater transitions to this state, predicts memory accuracy. Memory accuracy was also predicted by the mean lifetime and transitions to the default-mode state (characterized by greater activity in medial prefrontal cortex and posterior cingulate cortex) during post-encoding rest. Together, the results provide insights into dynamic interactions between brain regions that may be optimal for encoding novel information and consolidating memories for long-term retention.

      We thank Reviewer #2 for recognizing the novelty and broader utility of our methodology and for noting that the manuscript is well-written and concise.

      Weaknesses:

      (1) The study focuses on middle childhood, but there is a lack of engagement in the Introduction or Discussion about what is known about memory development and the brain during this period. Many of the brain regions examined in this study, particularly frontoparietal regions, undergo developmental changes that could influence their involvement in memory encoding and consolidation. The paper would be strengthened by more directly linking the findings to what is already known about episodic memory development and the brain.

      We thank the reviewer for this suggestion. In response, we will substantially expand the Introduction and Discussion to situate our findings within the developmental cognitive neuroscience literature on episodic memory. In particular, we will address the protracted developmental trajectory of frontoparietal regions, the well-documented maturation of hippocampal–cortical connectivity during middle childhood, and how these developmental changes may influence the brain state configurations we observed (He et al., 2023; Ryali et al., 2016). This will provide the necessary developmental context for interpreting our state dynamics results.

      (2) A more thorough overview of the BSDS algorithm is needed, since this is likely a novel method for most readers. Although many of the nitty-gritty details can be referenced in prior work, it was unclear from the main text if the BSDS algorithm discovered latent states based on activation patterns, functional connectivity, or both. Figure 1F is not very informative (and is missing labels).

      We thank the reviewer for this suggestion. We agree that a more accessible overview of the BSDS algorithm (Lee et al., 2025; Taghia et al., 2018) is needed. In the revision, we will expand the Methods and provide a concise algorithmic overview in the main text that clarifies the following key points: (a) BSDS operates on multivariate time series from the ROIs and infers latent brain states defined jointly by their mean activation patterns (amplitude vectors) and inter-regional covariance matrices (functional connectivity); (b) it employs a hidden Markov model framework with Bayesian inference and automatic relevance determination to identify the number of states without manual specification; and (c) state assignments are made at each TR, yielding a temporal sequence that enables computation of occupancy rates, mean lifetimes, and transition probabilities. We will also revise Figure 1F to include appropriate labels and a clearer schematic of the model's inputs, latent structure, and outputs.
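      To illustrate point (c), the sketch below shows how transition probabilities could be estimated by simple counting once each TR has a state label. This is an assumption-laden simplification: BSDS infers its transition structure within the Bayesian model rather than by counting, and the function name and toy sequence are hypothetical.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Row-normalized counts of transitions between consecutive TRs.

    Entry [i, j] estimates P(state j at TR t+1 | state i at TR t);
    the diagonal holds the self-transition (stability) probabilities.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows of never-visited states at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical sequence of 0-indexed state labels, one per TR.
sequence = [0, 0, 1, 0, 2, 2, 2, 0]
P = transition_matrix(sequence, n_states=3)
print(P)
```

      In this toy sequence, state 2 has the highest diagonal entry, meaning it is the most likely to persist from one TR to the next.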

      (3) A further confusion about the BSDS algorithm was whether it necessarily had to work on the rest data. Figure 4A suggests that each TR was assigned one of the four states based on the maximum win from the log-likelihood estimation. Without more details about how this algorithm was applied to the rest data, it is difficult to evaluate the claim on page 14 about the spontaneous emergence of the states at rest.

      The key methodological point is that the BSDS model, once trained on encoding data, can be applied to new (rest) time series via log-likelihood estimation: for each TR during rest, the model computes the log-likelihood of each state given the observed multivariate signal, and the state with the maximum log-likelihood is assigned to that TR. This "decoding" approach tests whether the spatial configurations learned during encoding are present during rest, rather than fitting new states de novo. We applied a threshold to the log-likelihood values to exclude TRs where the evidence for any single state was weak, thus controlling for potential misassignment. We will substantially clarify this process in the revised Methods and main text, and as described in our response to Reviewer #1 point 1, we will also conduct additional analyses to address the concerns raised.
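      A minimal sketch of this decoding step follows, under stated assumptions. Each state is reduced to a multivariate Gaussian with a mean vector and covariance matrix, which omits the latent-space and temporal components of the full BSDS model. The rejection rule here thresholds the margin between the best and second-best state; that rule, the parameter name `min_margin`, and the toy data are illustrative assumptions and may differ from the thresholding actually used in the manuscript.

```python
import numpy as np

def gaussian_loglik(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at one TR."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def decode_states(data, means, covs, min_margin=0.1):
    """Assign each TR to its maximum log-likelihood state.

    TRs whose best state beats the runner-up by less than `min_margin`
    are labeled -1 (no state assigned).
    """
    ll = np.array([[gaussian_loglik(x, m, c) for m, c in zip(means, covs)]
                   for x in data])
    ranked = np.sort(ll, axis=1)
    margin = ranked[:, -1] - ranked[:, -2]
    labels = ll.argmax(axis=1)
    labels[margin < min_margin] = -1
    return labels, margin

# Two hypothetical "encoding" states defined over two ROIs.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
rest_data = np.array([[0.1, -0.1], [2.9, 3.2], [1.5, 1.5]])

labels, margin = decode_states(rest_data, means, covs)
print(labels, margin)
```

      The third toy TR lies equally far from both state means, so it is left unassigned; the per-TR margin is also the kind of decoding-confidence quantity that can be compared between pre- and post-encoding rest.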

      (4) Although the BSDS algorithm was validated in prior simulations and task-based fMRI using sustained block designs in adults, it is unclear whether it is appropriate for the kind of event-related design used in the current study. Figure 1G shows very rapid state changes, which is quantified in the low mean lifetime of the states (between 1-3 TRs on average) in Figure 4C. On the one hand, it is a strength of the algorithm that it is not necessarily tied to external stimuli. On the other hand, it would be helpful to see simulations validating that rapid transitions between states in fMRI data are meaningful and not due to noise.

      This is an important methodological question. The rapid state changes observed in our event-related design (mean lifetimes of 1–3 TRs) differ from the longer state durations typically observed with block designs (He et al., 2023; Zeng et al., 2024), where sustained cognitive demands stabilize brain configurations. We believe these rapid transitions are consistent with the inherent dynamics of event-related encoding, where each trial involves rapid shifts between sensory processing, memory binding, and attentional engagement. Several considerations support the meaningfulness of these transitions: (a) the identified states have interpretable amplitude profiles consistent with well-established memory-related brain systems; (b) state dynamics show statistically significant, directionally consistent correlations with subsequent memory performance; and (c) the transition structure during encoding is distinct from that observed during rest, indicating sensitivity to task demands. Nonetheless, we acknowledge the concern about noise and will conduct additional analyses in the revision to address the concerns raised.

      (5) The Methods section mentions that participants actively imagined themselves within the encoded scenes and were instructed to memorize the images for a later test during the post-encoding rest scan. This detail needs to be included in the main text and incorporated into the interpretation of the findings, as there are likely mechanistic differences between spontaneous memory replay/reinstatement vs. active rehearsal.

      We thank the reviewer for this suggestion. We will include these experimental details in the main text and incorporate them into the interpretation of our findings in the context of spontaneous memory replay/reinstatement vs. active rehearsal (Liu et al., 2019; Wimmer et al., 2020).

      (6) Information about the general linear model used to discover the 16 ROIs that showed a subsequent memory effect is missing, such as: covariates in the model (motion, etc.), group analysis approach (parametric or nonparametric), whether and how multiple-comparisons correction was performed, if clusters were overlapping at all or distinct, if the total number of clusters was 16 or if this was only a subset of regions that showed the effect.

      We apologize for the missing methodological details. In the revised manuscript, we will provide complete information on the general linear model used to identify the 16 ROIs, including: the event regressors and parametric modulators included in the model, nuisance covariates (motion parameters, white matter and CSF regressors), the group-level analysis approach and statistical thresholding, the method for multiple-comparisons correction, whether the 16 ROIs represent all significant clusters or a subset, and whether any clusters were spatially overlapping. We will also clarify how peak voxels were selected for ROI definition.

      Reviewer #3 (Public review):

      This paper uses a novel method to look at how stable brain states and the transitions between them promote memory formation during encoding and post-encoding rest in children. I think the paper has some weaknesses (detailed below) that mean that the authors fall short of achieving their aims. Although the paper has an interesting methodological approach, the authors need better logic, and are potentially "double dipping" in their results - meaning their logic is circular. I think the method that they are using could be useful to the broader neuroimaging community, although they need to make this argument clearer in the paper.

      We thank Reviewer #3 for recognizing the novelty of our approach and its potential utility for the broader neuroimaging community.

      (1) The authors use children as their study subjects but fail to reconcile why children are used, if the same phenomena are expected to be seen in adults (or only children), and if and how their findings change with age across an age range that spans middle childhood into early adolescence. They need to include more consideration for the development of their subject population. The authors should make it clear why and how memory was tested in children and not adults. Are adults expected to encode and consolidate in a similar manner to children? Do the findings here also apply to adults? How was the age range of 8-13-year-old children selected? Why didn't the authors look at change with age? Does memory performance change with age? Do the BSDS dynamics change with age in the authors' sample?

      Our study was motivated by the observation that while adult studies have documented memory replay and reinstatement, very little is known about whether these dynamic state-level mechanisms operate during middle childhood, a period characterized by substantial improvements in episodic memory ability and ongoing maturation of frontoparietal and hippocampal–cortical circuits. The age range of 8–13 was defined a priori based on typical developmental classifications of middle childhood through early adolescence, representing a period when episodic memory abilities are developing rapidly.

      In response to the reviewer's specific questions: (a) we will conduct exploratory analyses testing whether memory accuracy, BSDS state dynamics (occupancy, mean lifetime, transitions), and brain–behavior correlations vary as a function of age within our sample; (b) we will clearly discuss whether adults are expected to show similar patterns, drawing on the extant adult literature; and (c) we will acknowledge as a limitation that our sample size (N = 24) and narrow age range provide limited statistical power for detecting continuous age-related changes, and that a dedicated cross-sectional or longitudinal developmental design would be needed to draw firm conclusions about developmental trajectories. Please also see responses to Reviewer #1 point 5 and Reviewer #2 point 1.

      (2) The authors look for brain state dynamics within a preselected set of ROIs that are selected because they display a subsequent memory effect. This is problematic because the state that is most associated with subsequent memory (S3, or State 3) is also the one that shows most activity in these regions (that have already been a priori selected due to displaying a subsequent memory effect). This logic is circular. It would be helpful if they could look at brain state dynamics in a more ROI agnostic whole brain approach so that we can learn something beyond what a subsequent memory analysis tells us. I think the authors are "double dipping" in that they selected regions for further analysis based on a subsequent memory association (remembered > forgotten contrast) and then found states within those regions showing a subsequent memory effect to further analyze for being associated with subsequent memory. Would it be possible instead to do a whole-brain analysis (something a bit more agnostic to findings) using the BSDS framework, and then, from a whole-brain perspective, look for particular brain states associated with subsequent memory? As it stands, it looks like S3 (state 3) has greater overall activation in all brain regions associated with subsequent memory, so it makes sense that this brain state is also most associated with subsequent memory. The BSDS analysis is therefore not adding anything new beyond what the authors find with the simple subsequent memory contrast that they show in Figure 1C. 
      This particularly affects the following findings: (a) active-encoding state occupancy rate correlated positively with memory accuracy, (b) transitions to the active-encoding state were beneficial / Conversely, transitions toward the inactive state (S4) were detrimental, with incoming transitions showing negative correlations with memory accuracy / The active-encoding state serves as a "hub" configuration that facilitates memory formation, while pathways leading to this state enhance performance and transitions away from it impair encoding.

      We appreciate this critique, which raises an important concern about analytical circularity.

      a) Why BSDS adds information beyond the static subsequent memory contrast. The reviewer notes that S3 (the active-encoding state) shows high activation in the same regions selected by the subsequent memory contrast, and therefore questions whether BSDS provides new information. We respectfully argue that BSDS captures dimensions of neural organization that a static contrast cannot. Specifically: (i) the subsequent memory contrast identifies which regions are differentially active for remembered vs. forgotten items, averaged across the entire encoding session; it provides no temporal information about when or for how long these regions are co-active; (ii) BSDS reveals the moment-to-moment temporal evolution of brain states, including the duration and stability of each configuration (mean lifetime), which independently predicts behavior; (iii) BSDS uniquely captures transition dynamics, that is, the rates and patterns of switching between states, which we show are predictive of memory in ways not derivable from the contrast map (e.g., transitions from S2→S3 positively predict memory, transitions toward S4 negatively predict memory); and (iv) BSDS characterizes the full covariance structure among regions within each state, revealing distinct connectivity patterns (e.g., the high clustering coefficient and global efficiency of S3), which are not captured by univariate activation contrasts. Thus, while the ROI selection is informed by the subsequent memory effect, the information BSDS extracts from those regions (temporal dynamics, transition patterns, and multivariate covariance) is orthogonal to the information used for selection.

      b) Additional validation. To address the circularity concern empirically, we will conduct an additional analysis using independently defined ROIs, such as network templates from previous studies, meta-analyses, or Neurosynth (He et al., 2023; Meer et al., 2020; Taghia et al., 2018), so that ROI selection does not depend on the subsequent memory contrast.

      (3) The task used to test memory in children seems strange. Why should children remember arbitrary scenes? How this was chosen for encoding needs to be made clear. There needs to be more description of the memory task and why it was chosen. Why was scene encoding chosen? What does scene encoding have to do with the stated goal of (a) "Understanding how children's brains form lasting memories", (b) "optimizing education" and (c) "identifying learning disabilities"? What was the design of the recognition memory test? How many novel scenes were included in the test, and how were they chosen? How close were the "new" images to previously seen "old" images? Was this varied parametrically (i.e., was the similarity between new and old images assessed and quantified?)

      Scene encoding was chosen for several reasons: (a) scenes are rich, complex stimuli that engage the hippocampal–parahippocampal memory system, eliciting robust subsequent memory effects suitable for BSDS modeling; (b) scene encoding recruits distributed networks spanning visual cortex, MTL, and frontoparietal regions, enabling detection of multi-region brain states; and (c) scene encoding paradigms have been widely used in both adult and developmental studies of episodic memory and replay (Tambini et al., 2017; Tompary et al., 2017), facilitating comparison with prior work.

      Regarding the recognition test: participants viewed 200 images (100 old, 100 new), with novel scenes drawn from the same categories (buildings and natural scenes) but chosen to be perceptually distinct from studied images. Similarity between old and new images was not parametrically manipulated or quantified; we will note this as a limitation. We will also expand the main text to include full task details, and we have deleted claims about implications for educational optimization and learning disability identification (see also Reviewer #3 point 7).

      (4) They ultimately found four brain states during encoding. It would be helpful if they could make the logic and foundation for arriving at this number clear.

      The number of brain states is not predetermined by the user but is automatically determined by the BSDS algorithm through Bayesian automatic relevance determination (ARD). The model is initialized with a maximum number of possible states, and during inference, states that contribute minimally to explaining the data are effectively pruned because their associated parameters are driven to near-zero by the ARD prior. In our data, the model converged on four states. This is a key advantage of BSDS over conventional HMM approaches, which require the user to specify the state number a priori. We will clarify this process in the revised Methods and Results, referencing the original BSDS methodology paper (Taghia et al., 2018) for full mathematical details.

      (5) There is already extant work on whether brain states during post-encoding rest predict memory outcomes. This work needs to be cited and referred to. The present manuscript needs to be better situated within prior work. The authors should look at the work by Alexa Tompary and Lila Davachi. They have already addressed many of the questions that the authors seek to answer. The authors should read their papers (and the papers they cite and that cite them) and then situate their work within the prior literature.

      We agree that the manuscript must be better situated within the existing literature on post-encoding rest and memory consolidation. We will revise the Introduction and Discussion to engage more fully with the foundational work in adults by Tompary & Davachi (2017, Neuron; 2024, eLife) on consolidation-related hippocampal–mPFC representational overlap, as well as Tambini & Davachi (2013, PNAS; 2019, Trends in Cognitive Sciences) on hippocampal persistence during post-encoding rest and awake reactivation (Tambini et al., 2019; Tambini et al., 2017; Tompary et al., 2017). We will explicitly discuss how our BSDS-based approach to state-level reinstatement complements and extends these earlier findings, which largely focused on region-specific pattern similarity or hippocampal–cortical connectivity, by characterizing reinstatement at the level of dynamic, whole-network configurations.

      (6) The authors should back up the claim that "successful episodic memory formation critically depends on the temporal coordination between these systems. Brain regions must coordinate their activity through dynamic functional interactions, rapidly reconfiguring their activity and connectivity patterns in response to changing cognitive demands and stimulus characteristics." Do they have any specific evidence supporting this claim?

      The claim that episodic memory depends on temporal coordination and dynamic functional interactions is supported by several lines of evidence: (a) within our study, the significant correlations between state transition rates and memory performance directly demonstrate that dynamic inter-state communication predicts memory outcomes; (b) studies showing that hippocampal–prefrontal theta coherence during encoding predicts subsequent memory (e.g., Zielinski et al., 2020); and (c) recent work demonstrating that rapid reconfiguration of large-scale brain networks supports cognitive functions including working memory (Braun et al., 2015; Shine et al., 2018) and episodic encoding (Phan et al., 2024). We will revise this passage to include specific citations and to make clear that our own transition–behavior correlations constitute direct evidence for this claim.

      (7) These claims seem overstated: "this work has broad implications for understanding memory function in children, for developing educational interventions that enhance memory formation, and enabling early identification of children at risk for learning disabilities." Can the authors add citations that would support these claims, or if not, remove them?

      We thank the reviewer for raising this point. We agree that the current framing overstates the practical implications. We have removed these claims and instead note the future studies that would be needed to support them.

      References

      (1) Braun, U., Schafer, A., Walter, H., Erk, S., Romanczuk-Seiferth, N., Haddad, L., . . . Bassett, D. S. (2015). Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc Natl Acad Sci U S A, 112(37), 11678-11683.

      (2) He, Y., Liang, X., Chen, M., Tian, T., Zeng, Y., Liu, J., . . . Qin, S. (2023). Development of brain-state dynamics involved in working memory. Cerebral Cortex.

      (3) Lee, B., Young, C. B., Cai, W., Yuan, R., Ryman, S., Kim, J., . . . Menon, V. (2025). Dopaminergic modulation and dosage effects on brain state dynamics and working memory component processes in Parkinson’s disease. Nature Communications, 16(1), 2433.

      (4) Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e614.

      (5) Meer, J. N. v. d., Breakspear, M., Chang, L. J., Sonkusare, S., & Cocchi, L. (2020). Movie viewing elicits rich and reliable brain state dynamics. Nature Communications, 11(1), 5004.

      (6) Phan, A. T., Xie, W., Chapeton, J. I., Inati, S. K., & Zaghloul, K. A. (2024). Dynamic patterns of functional connectivity in the human brain underlie individual memory formation. Nature Communications, 15(1), 8969.

      (7) Ryali, S., Supekar, K., Chen, T., Kochalka, J., Cai, W., Nicholas, J., . . . Menon, V. (2016). Temporal Dynamics and Developmental Maturation of Salience, Default and Central-Executive Network Interactions Revealed by Variational Bayes Hidden Markov Modeling. PLoS Comput Biol, 12(12), e1005138.

      (8) Shine, J. M., & Poldrack, R. A. (2018). Principles of dynamic network reconfiguration across diverse brain states. Neuroimage, 180, 396-405.

      (9) Stevner, A. B. A., Vidaurre, D., Cabral, J., Rapuano, K., Nielsen, S. F. V., Tagliazucchi, E., . . . Kringelbach, M. L. (2019). Discovery of key whole-brain transitions and dynamics during human wakefulness and non-REM sleep. Nature Communications, 10(1), 1035.

      (10) Taghia, J., Cai, W., Ryali, S., Kochalka, J., Nicholas, J., Chen, T., & Menon, V. (2018). Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nature Communications, 9(1), 2505.

      (11) Tambini, A., & Davachi, L. (2019). Awake Reactivation of Prior Experiences Consolidates Memories and Biases Cognition. Trends in Cognitive Sciences, 23(10), 876-890.

      (12) Tambini, A., Rimmele, U., Phelps, E. A., & Davachi, L. (2017). Emotional brain states carry over and enhance future memory formation. Nature Neuroscience, 20(2), 271-278.

      (13) Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e225.

      (14) Verde, M. F., Macmillan, N. A., & Rotello, C. M. (2006). Measures of sensitivity based on a single hit rate and false alarm rate: The accuracy, precision, and robustness of d′, Az, and A′. Perception & Psychophysics, 68(4), 643-654.

      (15) Wickens, T. D. (2001). Elementary signal detection theory. Oxford University Press.

      (16) Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J., & Dolan, R. J. (2020). Episodic memory retrieval success is associated with rapid replay of episode content. Nature Neuroscience, 23(8), 1025-1033.

      (17) Zeng, Y., Xiong, B., Gao, H., Liu, C., Chen, C., Wu, J., & Qin, S. (2024). Cortisol awakening response prompts dynamic reconfiguration of brain networks in emotional and executive functioning. Proceedings of the National Academy of Sciences, 121(52), e2405850121.

      (18) Zielinski, M. C., Tang, W., & Jadhav, S. P. (2020). The role of replay and theta sequences in mediating hippocampal-prefrontal interactions for memory and cognition. Hippocampus, 30(1), 60-72.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      General Statements

      Our study identifies characteristics of secretory signal peptides in fungi, and how their sequence determines which alternative pathways proteins take to the endoplasmic reticulum. All 3 reviewers grasp this, and agree that the study is publishable. Reviewer 3 puts it well, that we "convincingly show that the length of the hydrophobic helix in a signal peptide is the main factor distinguishing [...] pathways. This simplifies a previous model [...] provides a modest but important advancement to the field of protein secretion. ... The study extends its computational analysis beyond the model yeast Saccharomyces cerevisiae to a diverse range of fungal species."

      Thank you to all the reviewers: we found the reviews fair and constructive, and have addressed them in full.

      In the process of responding to reviews, we softened the claim in the title to "Protein secretion routes in fungi are predicted by the length of the hydrophobic helix in the signal sequence". We also reorganised the manuscript to put the cross-fungal analysis first, followed by the more detailed mechanistic analysis. We feel that this leads a broader audience through the story more effectively. This reorganisation also moved some material from introduction to discussion. Also on larger-scale changes, we reformatted the materials and methods section as requested.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript the authors analyze characteristics of secretory signal peptides in fungi. They identify length of the hydrophobic core rather than overall hydrophobicity as the parameter that determines whether proteins use SRP-dependent cotranslational import through the Sec61 channel, or SRP-independent posttranslational translocation through the hetero-heptameric Sec complex to enter the ER.

      Major comments

      1. The authors need to adequately use the existing nomenclature in the field:

        There is no 'Sec63 translocon'. Proteins with more hydrophobic signal sequences are targeted to the ER by SRP and its receptor, and these proteins are translocated cotranslationally by the Sec61 channel (aka the translocon). Proteins with less hydrophobic signal sequences are imported into the ER posttranslationally by the Sec complex consisting of the Sec61 channel and hetero-tetrameric Sec63 complex (Sec62, Sec63, Sec71, Sec72).

        Sec63 on its own also contributes to co-translational import (Brodsky et al, PNAS, 1995), so the term 'Sec63 translocon' is really confusing and should be replaced by the standard nomenclature as above throughout the paper.

      We sincerely appreciate the advice in correctly navigating terminology in the secretion and translocation field. We now say "Sec complex", and not the incorrect "Sec63 translocon". In the same spirit, we have replaced the terminology "Sec63-dependent" with "Sec-dependent", which is a more accurate description of the overall role of the Sec complex. For example, Ast et al. primarily assayed dependence on the Sec complex using sec72∆ strains.

      The paper should contain a proper methods section.

      We have reformatted the manuscript with a separate materials and methods section in the main manuscript, per Genetics/G3 journal family guidelines.

      The authors should explain more explicitly the differences of the Phobius and DeepTMHMM algorithms. Why was that particular algorithm chosen for comparison to Phobius?

      We initially focused on algorithms that distinguish SPs and TM sequences in a single tool, which both Phobius and DeepTMHMM do. This differs from other algorithms such as the SignalP family, which do not also predict TM sequences; indeed, SignalP version 4.0 onwards was trained to exclude TM sequences from its predictions (PMID: 21959131).

      In response to this and the similar comment from reviewer 2, we expanded our analysis to compare with the SignalP6.0 algorithm as well as DeepTMHMM.

      Minor comments

      • p2, para 2: ER protein import has been studied for 50 years, and its complexity been obvious for well over a decade

      We corrected this to "However, detailed functional investigations of secretion mechanisms in eukaryotes have focused on a handful of model yeasts and mammalian cells, revealing unexpected complexity"

      • p2, para 3: ref for the signal sequence should be one of the original Blobel papers instead of [8]

      We added the citation to Blobel and Sabatini, 1971, and kept the 1979 citation as we find the additional context is helpful to readers.

      • p3, para 1: ref for SRP should be Walter, Ibrahimi, & Blobel, JCB 1981, instead of [11]

      We added the original citation, and again kept the more modern citation that summarizes the field in decades following initial discovery.

      • p3, para 1: NB: SRP and its receptor do NOT translocate anything, they TARGET proteins to the ER

      We have corrected this, thank you.

      Reviewer #1 (Significance (Required)):

      The authors report an interesting observation which is of interest to the field and sufficiently well documented in this manuscript to be convincing. The paper does extend our understanding of the critical characteristics of secretory signal peptides.

      A limitation of all signal peptide prediction by current algorithms is that they are trained on 'standard' signal peptides and tend to miss ones that do not sufficiently conform to the standard parameters.

      Thank you for this point. The "standard/non-standard" conceptualization is helpful and we now mention this in our expanded discussion. We agree that testing the limits of these models would involve experimental screening of non-standard or non-natural sequences.

      Reviewer's expertise: SRP and Sec61 channel structure/function analysis, cell-free assays for ER protein import, yeast genetics

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Review of manuscript of Sones-Dykes et al. entitled: 'Protein secretion routes in fungi are mostly determined by the length of the hydrophobic helix in the signal peptide'

      This manuscript deals with the important question of how different fungi exhibit variety in protein targeting to the secretory pathway mostly using bioinformatic sequence analysis. This is important for understanding the evolution of the diverse targeting routes within the early secretory pathway, but also for biotechnology since diverse fungi are used as "biofactories" in biotechnological production of secreted proteins. While the results of the current study mostly confirm the analyses already carried out in S.cerevisiae, the work is important and warrants publication in a suitable journal.

      We appreciate this positive and balanced appraisal.

      Major points:

      1. Could the authors elaborate what was the motivation to use Phobius and not some other signal peptide predictor? I am wondering because of the cited Ast et al. paper is already several years old and new improved prediction tools such as the latest SignalP iteration have been developed since that study.

      The main motivation to use Phobius, and check with DeepTMHMM, was that these tools simultaneously predict cleaved signal peptides and transmembrane helices, unlike other tools that predict only cleaved signal peptides and can give false positives with N-terminal transmembrane helices.

      To clarify this point, we also emailed Prof. Henrik Nielsen, the lead developer of SignalP. I asked: "Although we mostly used Phobius prediction and also compared to DeepTMHMM, reviewers have asked us to also compare to SignalP. A critical part of our argument is about predictions of the h-region length, so we would like to compare h-region lengths to SignalP4.1 HMM mode in addition to SignalP6.0."

      Prof. Nielsen replied:

      As for your question, I must tell you that SignalP 4.1 does not have an HMM mode at all. The last SignalP version to have an HMM mode was 3.0. Therefore, 4.0, 4.1, and 5.0 do not output signal peptide regions; this was first reintroduced with version 6.0. See also the FAQ tab at the website.

      You could try to install version 3.0, but for your purpose, I would not recommend it. The old HMM module had a strong preference for certain h-region lengths because of a specific kind of overtraining. This was, at least partially, solved in Phobius through regularization of the length distribution. Since h-region length is a crucial parameter in your analysis, I would not trust the region assignments by SignalP 3.0. You are welcome to cite me for that to the reviewers, if needed.

      But comparing the region assignments between Phobius and SignalP 6.0 will be interesting.

      Regarding SignalP3.0, we now cite Liaci et al., who analysed all experimentally verified eukaryotic signal peptides using SignalP 3.0, and Xue et al., who analysed S. cerevisiae signal peptides, and both arrived at similar conclusions that cleaved signal peptides have hydrophobic regions of length 8-14 amino acids.

      Also, we have expanded our analysis to also compare Phobius and SignalP6.0 predictions of entire signal peptides and of h-regions. The comparisons are now in Figures 4, S3, and S4.

      I am slightly puzzled by the analysis of the annotation of the Sec63- and SRP-dependent targeting sequences presented in Fig. 1. Could the "SRP-dependent" sequences with long hydrophobic sequences simply be called transmembrane helices? Based on structure of the SPC, it has been proposed that cleavable signal peptides with h-regions beyond 18 residues are extremely rare so I would imagine that majority of these sequences are longer transmembrane segments.

      The point of this figure is to compare lists of proteins that are experimentally verified to be Sec-dependent or SRP-dependent in their targeting, so that's the correct way to refer to them for the purpose of this analysis. Yes, the conclusion of this paper and other work (e.g. Ast et al.) is that these SRP-dependent sequences with long hydrophobic sequences are mostly transmembrane (TM) helices.

      I appreciate the analysis of protein targeting features in evolutionarily distinct fungal species, but since the authors highlight importance of fungi in heterologous industrial protein production, it would have been satisfying to see some of these fungi included in this analysis. In particular, Pichia pastoris and Trichoderma reesei are commonly used fungi with apparently a highly specialized secretory machinery capable of very high production levels of different secretory proteins. I would urge the authors to consider the aspect of selecting optimal secretion signals for these industrial fungi and perhaps include some discussion of it in this manuscript.

      We added Pichia pastoris (Komagataella phaffii) and Trichoderma reesei to the analysis. We appreciate the suggestion to discuss optimal secretion signals; however, our analysis doesn't directly address that, so we chose to leave that point out.

      Minor points:

      1. The authors state that both Sec63 and SRP pathways converge at the Sec61 translocon. However, we now know that targeting of proteins to Sec61 is even more complicated and for example the EMC is a complex that delivers some proteins to Sec61. It might be appropriate to cite some recent reviews on complexity of early protein targeting to Sec61 in the Introduction.

      As a review of the complexity of early protein targeting, we cite Aviram and Schuldiner 2017 (Targeting and translocation of proteins to the endoplasmic reticulum at a glance). We could add other citations if the reviewer considers this to be necessary.

      Page 5. The authors repeat the compound hydropathy analysis of Ast et al. and used the earlier reported 9-amino acid window for this. Is this analysis result robust with other window sizes?

      Ast et al. checked that this result is robust to window sizes of 9, 11, or 19 aa, in their Figure S1A, which we now specifically mention. In our manuscript, we instead check robustness to different hydropathy scales and prediction algorithms.
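      To make the window-size question concrete, the following is a minimal sketch of a sliding-window hydropathy scan using the Kyte-Doolittle scale. It is an illustration only and is not the code used by us or by Ast et al.; in particular it computes a plain maximum windowed mean rather than the compound hydropathy score, and the function name and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(sequence, window=9):
    """Return the highest mean hydropathy over all windows of the given size."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(scores) < window:
        raise ValueError("sequence is shorter than the window")
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


# Hypothetical N-terminal sequence, used only to exercise the function.
example = "MKFSTALLLSLAGALAAPAEEKRQTSDDWE"
for size in (9, 11, 19):  # the window sizes mentioned above
    print(size, round(max_window_hydropathy(example, size), 2))
```

      Running the same scan at each window size and comparing how sequences rank is one simple way to check whether a conclusion depends on the window choice.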

      Page 12. Authors state that "cleaved signal peptides do not need to span a membrane". A recent structure of the signal peptidase complex (PMID: 34388369) directly suggests that the signal peptide does span the membrane immediately before its final cleavage. Importantly, the SPC thins the membrane in this region to accommodate the shorter signal peptide h-region and this is proposed as a basis for SPC discriminating between signal peptides and longer transmembrane segments. It would be appropriate to cite this paper in the Discussion.

      Thank you for bringing this important paper to our attention. We have clarified our wording here and cited Liaci et al. (PMID: 34388369) in the updated manuscript, both for the detailed structural discussion and for similarly concluding that in mammals "Signal peptides possess short h-regions".

      Reviewer #2 (Significance (Required)):

      Protein targeting into the early secretory pathway is an important general concept, and recent years have revealed many new aspects into the diverse mechanisms that cells employ for targeting of proteins with diverse folding needs by use of protein-specific targeting sequences. Also, how proteins are targeted is an important biotechnological question as choice of e.g. the signal peptide can have a dramatic impact on quantity and quality of the produced protein.

      This work is generally interesting to cell biologists studying mechanisms of protein targeting, but the results are mostly confirmatory. Still, no-one has carried out such analysis and fungi are remarkably diverse with potential for new innovations in protein targeting and therefore, the work should be published in my opinion. The suitable audience in my view is quite specialized and could be cell biologists with high interest in fungal protein secretion or biotechnologists using fungi for heterologous expression. For the latter, I would request the authors to extend the data analysis to a few more most biotechnologically relevant fungi and add some discussion on choice of signal peptide in biotechnological protein production in fungi.

      We appreciate this fair perspective. Indeed, we have added analyses of the biotechnologically relevant fungi Komagataella phaffii (Pichia pastoris), and Trichoderma reesei.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript revisits the analysis of hydrophobic forces driving endoplasmic reticulum translocation in fungi. Sones-Dykes and Wallace convincingly show that the length of the hydrophobic helix in a signal peptide is the main factor distinguishing SRP-dependent and Sec63-dependent pathways. This simplifies a previous model that relied on a compound hydropathy score, which incorporated both length and hydrophobicity. The analysis, confirmed by Phobius and DeepTMHMM, indicates that length alone is an equally effective and simpler metric for predicting the translocation route in fungi. The study extends its computational analysis beyond the model yeast Saccharomyces cerevisiae to a diverse range of fungal species. It finds that the bimodal distribution of hydrophobic helix lengths-short for predicted Sec63-dependent and long for SRP-dependent proteins-is highly conserved. By broadly identifying proteins with short hydrophobic helixes, the research suggests that the Sec63 translocation route is crucial for cell wall biogenesis and secretion (likely encompassing and the secretion of virulence factors). This provides a functional and pathological context for the translocation pathway choice.

      The manuscript was well written, and its central messages were clear.

      We appreciate this, and are glad that the messages came across clearly.

      Major points:

      • Extension of analysis to human secretome: In Fig 4, the helix length analysis is extended to additional organisms, among them Homo sapiens. It is observed that 'h-region lengths in humans had a similar distribution'. However, as the authors themselves note in the introduction, the functional thresholds of signal peptides are dramatically different in mammalian cells. Without overlaying 'ground truth' data of Sec63-dependence in humans, it is difficult to draw any conclusions about the meaning of h region length on human translocation preferences. I would suggest either: (1) Performing an analysis similar to that done in Fig 1 for the human secretome (2) Removing the human outgroup from the analysis in Fig 4.

      We appreciate the reviewer's point, but decided to keep the human analysis as an outgroup in Fig 4 only. This manuscript focuses on fungi by extrapolating and testing results from S. cerevisiae on other fungi. A mechanistic interpretation of signal peptides in human cells is out of scope due to the mentioned differences in functional thresholds of signal peptides in human cells. However, including humans gives a context that we feel readers would ask for if we did not include it.

      If we wanted to analyse the human signal peptides thoroughly then it would be interesting to extend to a more diverse range of eukaryotes, and extend beyond signal peptide prediction algorithms to structural modeling of signal peptides into cognate translocon structures. That's a whole different project.

      • Incorporate additional cross-validation: Since the key findings from this paper stem from hydrophobic segment predictions, it would be beneficial to augment the conclusions with another independent analysis. The Hessa scale (PMID: 15674282) has the advantage of being a 'biological' hydrophobicity scale defined by transmembrane helix insertion. It would be important to show that the findings obtained with Phobius (e.g. no improvement in categorization with compound score) also hold with this scale.

      Thank you for this helpful and important point. We also performed the analysis with the Hessa scale, included in the updated manuscript as Figure S2. The Hessa scale looks like a better predictor than the Kyte-Doolittle or Rose scales in that the distributions are clearly different for SRP-dependent and Sec63-dependent proteins. However, there is no improvement in classification, both because the Hessa maximum hydrophobicity distributions for SP and TM groups overlap, and also because the 97.5% accuracy of the length-based prediction is already so good that there's no room to improve in classifying this set of S. cerevisiae sequences.
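      For readers unfamiliar with what a length-based prediction involves, the following is a minimal sketch of a single-threshold classifier on hydrophobic helix length. It is an illustration under stated assumptions, not our analysis code: the lengths and labels are toy values invented for the example (they are not the Ast et al. data), and the function names are hypothetical.

```python
def threshold_accuracy(lengths, labels, threshold):
    """Fraction of proteins classified correctly by a length cutoff.

    Helices of at least `threshold` residues are called "SRP", shorter ones "Sec".
    """
    predicted = ["SRP" if n >= threshold else "Sec" for n in lengths]
    hits = sum(p == y for p, y in zip(predicted, labels))
    return hits / len(labels)


def best_threshold(lengths, labels):
    """Scan every integer cutoff and return (threshold, accuracy) for the best one."""
    candidates = range(min(lengths), max(lengths) + 2)
    return max(
        ((t, threshold_accuracy(lengths, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy data for illustration only; not measured values.
toy_lengths = [8, 10, 12, 13, 14, 18, 19, 20, 22, 23]
toy_labels = ["Sec"] * 5 + ["SRP"] * 5
print(best_threshold(toy_lengths, toy_labels))
```

      When a one-parameter cutoff already separates the two groups almost perfectly, a second feature such as maximum hydrophobicity has very little remaining error to explain, which is the sense in which there is no room to improve.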

      Minor points:

      • Incorporate GO analysis in Fig 4: Visualization of the GO analysis referenced in the text (Fig 4) may be useful to drive home the point of .

      We have indicated the top enriched GO terms in the paper, and also provided the full GO results in the supplementary data at https://github.com/TristanSones-Dykes/TMSP_Pub. There's not really more information in these GO analyses that makes it worth plotting. For example, for predicted signal peptides in all annotated fungi, "extracellular region" and "cell wall" come up as very highly enriched with extremely low p-values.

      • Cite origin of 'ground truth' protein list: The authors cite 83 and 107 bona-fide Sec63-dependent and SRP-dependent proteins which were used to define the 'ground truth' lists. It would be informative to define how these lists were collected; for example, the Ast et al. paper referenced appears to validate ~40-50 proteins as Sec63-dependent.

      The 'ground truth' protein list was collected and curated in the paper by Ast et al., and thoroughly explained there. In our expanded methods section, we now explain their classification based on localisation/mislocalisation of GFP-tagged proteins in sec72∆ (Sec63 complex deficient) strains. After careful checking, we didn't find any flaws in their analysis or any better yeast datasets more recent than 2013. So, we think the approach of giving a brief description here and referring to Ast et al. for a thorough description is most helpful for readers.

      Reviewer #3 (Significance (Required)):

      This manuscript by Sones-Dykes and Wallace provides a modest but important advancement to the field of protein secretion. While previous work has already identified that Sec63-dependent proteins in baker's yeast have moderately hydrophobic signal peptides, this paper refines this concept and extends it for additional fungal species. It will be of interest to researchers studying protein translocation/secretion pathways and fungal biology.

      Thank you for supporting the main point of our paper. We agree with the assessment, and that this analysis needed to be done to discover if and how results from S. cerevisiae extend to other fungi. We hope that this paper will encourage new work on mechanisms of protein secretion in other fungi, especially of the role of the Sec63 complex.

    1. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
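      As a rough illustration of how a predicted force-extension curve of this kind can be generated, the sketch below evaluates the standard Marko-Siggia interpolation formula for a worm-like chain with a 50 nm persistence length. It is not the code behind Figure 1B, and the refinements of Bouchiat et al. are omitted. The contour length is a hypothetical placeholder (the actual DNA length is not restated here), and the thermal energy of 4.11 pN·nm assumes roughly room temperature.

```python
KT_PN_NM = 4.11  # assumed thermal energy k_B*T near 25 C, in pN*nm


def wlc_force(extension_nm, contour_length_nm, persistence_length_nm=50.0):
    """Marko-Siggia interpolation: force (pN) needed to hold a given extension."""
    x = extension_nm / contour_length_nm
    if not 0.0 <= x < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (KT_PN_NM / persistence_length_nm) * (
        0.25 / (1.0 - x) ** 2 - 0.25 + x
    )


def extension_at_force(force_pn, contour_length_nm, persistence_length_nm=50.0):
    """Invert wlc_force by bisection (the force rises monotonically with extension)."""
    lo, hi = 0.0, contour_length_nm * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_length_nm, persistence_length_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


CONTOUR_NM = 1000.0  # hypothetical contour length, for illustration only
x_stall = extension_at_force(6.0, CONTOUR_NM)
print(f"extension at 6 pN: {x_stall:.0f} nm of {CONTOUR_NM:.0f} nm")
```

      Because the curve is very steep near full extension, a force of a few pN corresponds to an extension close to the contour length, so modest uncertainty in the model parameters shifts the predicted stall position only slightly.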

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
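      The relationship between an exponential slip-distance distribution and a detection cutoff can be sketched as follows. This is an illustration of the general calculation, not our preliminary analysis itself: it assumes a single-exponential distribution starting at zero, and the function names are hypothetical.

```python
import math


def missed_fraction(cutoff_nm, mean_slip_nm):
    """Fraction of exponentially distributed slips shorter than the cutoff."""
    return 1.0 - math.exp(-cutoff_nm / mean_slip_nm)


def implied_mean_slip(cutoff_nm, missed):
    """Mean slip distance that would give the stated missed fraction."""
    return -cutoff_nm / math.log(1.0 - missed)


def corrected_event_count(observed_events, missed):
    """Estimated true number of slips given the number detected above the cutoff."""
    return observed_events / (1.0 - missed)


# Values from the text: 60 nm cutoff, roughly 20% of events missed.
mean_nm = implied_mean_slip(60.0, 0.20)
print(f"implied mean slip distance: {mean_nm:.0f} nm")
print(f"check: missed fraction = {missed_fraction(60.0, mean_nm):.2f}")
```

      Under these assumptions, a 20% loss at a 60 nm cutoff corresponds to a mean slip distance several times larger than the cutoff, which is why the correction to the measured durations is modest.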

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.

      In addition, in Fig. 3 we separate the ramps from the stalls and, using a statistical model, compute a separate duration parameter (the inverse of the off-rate) for the ramp and for the stall. We find that the relationship between ramp, stall, and unloaded durations differs among the three motors, which is interesting in itself.
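The history-independence argument in this response can be written out explicitly. As a sketch, assume that under a fixed load the detachment rate k is constant, so the time to detachment T is exponentially distributed (the symbols k, T, s, and t are introduced here for illustration and are not taken from the manuscript). Then, for any elapsed time s chosen as the new starting point:

```latex
P(T > s + t \mid T > s)
  = \frac{P(T > s + t)}{P(T > s)}
  = \frac{e^{-k(s+t)}}{e^{-ks}}
  = e^{-kt}
  = P(T > t)
```

Under that assumption, the remaining time to detachment has the same distribution regardless of where the clock is started, which is why measuring from the onset of stall would not bias the estimated duration.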

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We agree that force after slippage is much smaller than at stall, and we plan to clarify that section of text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after a slip a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although such non-parametric methods make no assumptions about the distribution, they require the user to know exactly where the start time is.

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we make two points. First, the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by a single exponential (the points above 6 s don’t affect the fit very much). Second, when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed with the single-exponential fits very well (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections due to finite microtubule lengths.
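To illustrate why a constant-rate model can tolerate right censoring, here is a minimal sketch. It is not the authors' Markov model or their Bayesian inference. It is the textbook maximum-likelihood estimator for an exponential duration with right-censored observations, and the simulated mean duration, the fixed censoring time, and all function names are assumptions chosen for illustration.

```python
import random


def exponential_mle_with_censoring(durations, observed):
    """Mean duration estimate: total observed time / number of true detachments."""
    n_events = sum(1 for flag in observed if flag)
    if n_events == 0:
        raise ValueError("at least one uncensored event is required")
    return sum(durations) / n_events


def simulate_censored_runs(true_tau, censor_time, n, seed=0):
    """Draw exponential run durations and truncate those exceeding censor_time."""
    rng = random.Random(seed)
    durations, observed = [], []
    for _ in range(n):
        t = rng.expovariate(1.0 / true_tau)
        if t > censor_time:
            durations.append(censor_time)  # run ended by the microtubule end
            observed.append(False)
        else:
            durations.append(t)            # run ended by a true detachment
            observed.append(True)
    return durations, observed


if __name__ == "__main__":
    # Hypothetical values: 2.75 s mean duration, runs cut off at 6 s.
    runs, flags = simulate_censored_runs(2.75, 6.0, 20000, seed=1)
    uncensored = [t for t, flag in zip(runs, flags) if flag]
    print("naive mean of uncensored runs:", sum(uncensored) / len(uncensored))
    print("censoring-aware estimate:", exponential_mle_with_censoring(runs, flags))
```

In this simplified setting the naive mean of the uncensored runs underestimates the true duration, whereas the censoring-aware estimate recovers it. A single fixed censoring time is a simplification, since in the experiment the cutoff would depend on where each motor lands on the microtubule.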

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, it is overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
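A rate constant that varies exponentially with load is commonly written in the Bell form, k(F) = k0 · exp(F·δ / kBT). The sketch below shows only that functional form. The unloaded rate, the distance parameter δ, and the function name are hypothetical, and none of the values are fitted parameters from the manuscript.

```python
import math

KBT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm


def load_dependent_rate(k0_per_s, force_pn, delta_nm):
    """Bell-type rate: k(F) = k0 * exp(F * delta / kBT)."""
    return k0_per_s * math.exp(force_pn * delta_nm / KBT_PN_NM)


if __name__ == "__main__":
    k0 = 1.0  # hypothetical unloaded rate, 1/s
    for delta in (1.0, -1.0):  # hypothetical distance parameters, nm
        rate = load_dependent_rate(k0, 6.0, delta)
        print(f"delta = {delta:+.1f} nm: rate at 6 pN = {rate:.2f} /s, "
              f"mean duration = {1.0 / rate:.2f} s")
```

With a positive distance parameter the transition accelerates under load, which is slip-bond-like behavior. A negative value slows the transition under load, so the sign and magnitude of δ for each transition determine how the overall dwell time responds to force.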

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm, where the force is 6 pN, the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness, an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but what this means is that even at stall, our dsDNA tether has a similar predicted compliance to the motor that is pulling on it. We will address this point in our revision.
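The stiffness values quoted above can be approximately reproduced from a worm-like chain force-extension curve. The sketch below uses the Marko-Siggia interpolation formula and takes its analytical slope. The response does not state which formula or parameters underlie Fig. 1B, so the persistence length (50 nm), contour length (1020 nm, about 3 kb of dsDNA), and thermal energy (4.11 pN·nm) are assumptions chosen because they give numbers close to those quoted.

```python
KBT_PN_NM = 4.11        # thermal energy near room temperature, pN*nm
PERSISTENCE_NM = 50.0   # assumed dsDNA persistence length
CONTOUR_NM = 1020.0     # assumed contour length (about 3 kb at 0.34 nm/bp)


def wlc_force(extension_nm, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) at a given extension (nm)."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    return (KBT_PN_NM / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def wlc_stiffness(extension_nm, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM):
    """Analytical slope dF/dx (pN/nm) of the interpolation formula."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    prefactor = KBT_PN_NM / (persistence_nm * contour_nm)
    return prefactor * (0.5 / (1.0 - z) ** 3 + 1.0)


if __name__ == "__main__":
    for x in (650.0, 855.0, 960.0):
        k = wlc_stiffness(x)
        print(f"x = {x:.0f} nm: F = {wlc_force(x):.2f} pN, "
              f"stiffness = {k:.4f} pN/nm, force jump per 8 nm step = {8.0 * k:.2f} pN")
```

With these assumed parameters, the stiffness stays below 0.001 pN/nm at 650 nm, is about 0.01 pN/nm at 855 nm, and is about 0.2 pN/nm at 960 nm, where the force is close to 6 pN.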

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schaffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schaffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that significant additional feasibility studies are required. As a comparison, the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003) achieved ~75% coverage of ~6,000 proteins directly from an established protocol without any prior significant feasibility studies, at least to our knowledge. While the C. elegans genome is about 3 times the size, we would argue that our tagging protocol may even be less labor intensive, as it does not involve any cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: “They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.”

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and, importantly, to test the following key parameters: (1) success rate of triple pools with previously untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets that we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798 and PMID: 24685391). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468 – using similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892 using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, the introduction of backcrossing should not reasonably be a routine part of the initial tagging.

      Lastly, if one wants to backcross at a later stage, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s suggestion that targeting each gene separately might be as useful as injecting three together. Why would one tag a single gene at a time, rather than three, if the whole point of the paper is to demonstrate the scalability of tagging, meaning that one can shorten the tagging of all genes by a factor of 3 through joint tagging? It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and in our view does not provide any disadvantages, as individual tags can be isolated independently if desired.

      Beyond the numerous technical advantages pooling provides (also lower cost and throughput for making injection mixes as well as imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C with different sets of mitochondria being marked by different mitochondrial proteins had we imaged them separately or even aligned to a pan-mitochondrial landmark. As we mentioned in the discussion, grouping proteins predicted to localize to the same compartment together can simultaneously test how uniform or differentiated such compartments are during the screen.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is very limiting. The theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed highly enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much longer. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I'd like to have known what exactly has been done with simultaneous injection targeting multiple loci more thoroughly, comparing what has been accomplished to date by various laboratories' advances to date.

      We are not sure what the reviewer is referring to when bemoaning that the Abstract and Introduction are too focused on our paper and not presenting the state of the field. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which provide examples of protein tags). We do not believe that this can be fairly called an unbalanced presentation of the state of the field.

      This being said, we will gladly expand our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.

      In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we will discuss some of these points in the first paragraph of the results section:

      “In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014).”

      “These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci.”

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?

      We consider this paper to have two purposes: (1) to motivate the community to come together to consider such a genome-wide tagging approach; (2) to provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel, interesting insights.

      As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, given the numbers in our original paper, it was not clear that pooling three novel large edits would work, or that it would be scalable.

      The dpy-10 coCRISPR marker previously used is a highly efficient single site, with close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed for many potential reasons, including both biological and technical. For instance, such a “marker free” approach necessitates that a significant number of targets in the genome should be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with current best fluorophores. The chance of disrupting gene function by tagging was also not explored in detail in C. elegans, nor whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:

      We are not convinced that expectations are mistaken. Below we respond to the reviewer’s specific examples and we are open to hear from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (Packer et al 2019, highest intestine and hyp; Ghaddar et al 2023 intestine, sheath and BWM, and even oocyte). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that “there are no published studies about this enzyme, so we really don't know for sure what it's doing” is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information of protein function could indicate that the involved pathway requires tissue specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      We note that the Ghaddar et al. and CeNGEN/Taylor et al. datasets do not. The scRNA paper cited by the referee (PMID: 38816550) also shows enrichment in neurons and pharynx, which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we will add the following to the discussion section:

      “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues require unique metabolic needs and indicate which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We will add this information to the table, including annotated expression levels in young adults from various datasets (but not larval datasets, as we did not image these). We note that each of these studies uses a different pipeline and reports a different metric (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We are grateful for the referee’s appreciation that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their close reading of the manuscript and detailed comments.

      Reviewer #1

      1. The idea that Xrp1 induction switches around 16 h post-IR, becomes RpS12-dependent, and subsequently engages cell competition is interesting and potentially important. However, the evidence supporting RpS12-dependence of Xrp1 induction is currently not sufficiently convincing. For example, based on the images in Figure 6F-supplement 1, the conclusion that Xrp1 is induced in an RpS12-dependent manner appears difficult to support. The authors should strengthen and quantify this result or provide the raw image data. In addition, because this point is central to the authors' model, they should move the key supporting data from the supplementary figures to the main figures to ensure that this critical claim is clearly supported and readily accessible to readers.

      We apologize for confusing all three reviewers with this figure. Actually, Figure 6F supplement 1 does not compare RpS12-dependent and -independent Xrp1-HA expression. Instead, it shows that the RpS12-independent Xrp1-HA expression is only mildly p53-dependent, which is consistent with our idea. We had not examined the RpS12-dependence of Xrp1 expression in this manuscript because we had published that previously and found a substantial dependency (Fig 1N-P of Ji et al 2021). Because that previous paper used an anti-Xrp1 antibody, and the present paper measures an HA-tagged Xrp1 protein, it is probably a good idea to include the RpS12-dependence of late Xrp1 expression again, using the Xrp1-HA reagent. We have this data, which shows ~75% dependence and is highly significant statistically. We will include this data in the revised manuscript, within one of the main figures.

      2. The authors suggest a model in which Xrp1 executes two qualitatively distinct "modes" (pro-repair/acute DDR and elimination of aneuploid cells), but this remains only partially convincing as currently presented. The authors should at least (i) provide quantitative evidence that could explain how Xrp1 might produce distinct outcomes across phases (e.g., comparing Xrp1-HA levels and/or the fraction of Xrp1-HA-positive cells at 2-4 h versus 16-24 h post-IR), and (ii) explicitly discuss plausible mechanisms in the Discussion. Even if the molecular "switch" is not fully resolved experimentally, a clearer, data-grounded discussion of how Xrp1 could mediate these temporally distinct functions is needed. In addition, since ISR signaling (e.g., eIF2α phosphorylation) has been implicated as a single feature associated with Xrp1-dependent loser elimination, the authors should consider assessing p-eIF2α levels in Xrp1-HA positive cells at early versus late time points after IR (e.g., 4 h vs 24 h).

      We thank the reviewer for highlighting the need for this discussion. We will clarify these issues in the revised manuscript but do not think further experiments are necessary.

      1. It was well established previously and confirmed here that little DNA damage remains ~24h after IR. This is sufficient to explain why there is little DDR at this stage. We will make this clear in the revision.
      2. We did not intend to claim that no cell competition happens during the acute DDR ~4h after IR. We are not aware of experiments showing the DDR is strictly cell autonomous and not influenced by neighboring cells. If the acute DDR is indeed cell autonomous, or mostly so, this could be due to the additional genes induced directly by p53 that are not induced by Xrp1 ~24h after IR. The cell death gene Rpr is one example reported in our paper. We will discuss this in the revision.
      3. The reference to ISR as the single feature inducing Xrp1 expression is referring to two Nature Cell Biology papers published in 2021 (Baumgartner et al 2021; Recasens-Alvarez et al 2021). This idea has not stood the test of time. The ISR reporter activities shown in these papers were later shown to be downstream of Xrp1, not upstream (Langton et al 2021; Kiparaki et al 2022). Langton et al argued that there could be an initial ISR that was too small to be detectable, but this is hypothetical. There are now multiple papers and preprints showing that it is the long isoforms of Xrp1 that are ISR responsive, but that short isoforms of Xrp1 initiate cell competition, and that RpS12-dependent alternative splicing produces the short isoform. The short Xrp1 isoforms lack the uORF that responds to ISR (Elife 2021 Oct 4:10:e74047; bioRxiv 06.15.659587; bioRxiv 2025.10.29.685279). This is not consistent with the idea that the ISR initiates cell competition. Because we and others have shown that it is Xrp1 activity that induces eIF2α phosphorylation (Ochi et al 2021, Langton et al 2021, Kiparaki et al 2022), eIF2α phosphorylation in Xrp1-expressing cells would not prove a role for ISR and we do not propose to make these measurements. We are undecided whether to include this discussion of the ISR in the paper. It would lengthen the paper and we do not think it is directly relevant.

      3. The idea that aneuploid cells, or cells with altered ribosomal gene dosage, could be removed via Xrp1-mediated cell competition is intriguing. However, the manuscript does not currently provide any evidence that such cells are, in fact, being eliminated. The authors should therefore (i) quantify cell-level overlap metrics, such as the fraction of γH2Av-positive cells that are Xrp1-HA-positive (and vice versa), as well as the fraction of γH2Av-positive cells that are cleaved Dcp-1-positive (and vice versa) at 24 h post-IR. These quantitative analyses would clarify whether the late Xrp1-HA-positive population corresponds to persistently damaged cells and whether it is enriched for cells undergoing apoptosis/clearance. The authors should also (ii) directly assess aneuploidy/segmental copy-number imbalance in the late Xrp1-HA-positive clusters (e.g., by DNA FISH targeting one or two chromosome arms/regions), and if these experiments cannot be completed within a reasonable revision timeframe, the authors should temper their wording and present aneuploidy and selective elimination as a plausible interpretation supported by RpS12 dependency and prior literature, rather than as a demonstrated conclusion in the current study.

      We agree that aneuploidy is not demonstrated in the current study. Elimination of aneuploid cells with altered Rp gene dose was already established by previous papers. We cited previous work in the manuscript but did not summarize the evidence explicitly, so we are not sure whether the referee was fully aware. Ji et al (2021) created 17 different segmental aneuploidies using Flp/FRT recombination including or abutting 10 different Rp genes, together covering >20% of the euploid genome. The results showed that segmental aneuploidies are largely removed by Rp gene dose-dependent cell competition using the RpS12 and Xrp1 genes. Others have since confirmed that aneuploidies are removed by cell competition and that the effects of Rp gene dose depend on Xrp1 (Fusari et al Cell Genomics 2025). Therefore, we consider it established that aneuploid cells with altered Rp gene dosage are removed by this mechanism. We will discuss this explicitly in the revised manuscript.

      The question of whether cells dying in a p53-independent manner ~24h after irradiation are aneuploid cells undergoing cell competition was also addressed previously. Ji et al 2021 already showed that most of these cells are eliminated by RpS12 and Xrp1, consistent with altered Rp gene dosage, and that preventing cell competition leads to persistence into adulthood of cells that can be recognized as Rp+/- from their bristle phenotype. Evidence was shown that most such cells are segmental aneuploids, consistent with earlier studies of DNA repair mutants (Baker, 1978). We will summarize this in the revised manuscript so that it is not necessary to read the cited references to appreciate the evidence. The only new observation being made in this paper about the ~24h cell death stage is that loss of p53 increases the number of these cells, which could be because inadequate DNA repair leads to more aneuploid cells.

      It is important to appreciate that we do not claim that cells labeled by the DNA damage marker γH2Av are aneuploid, or being removed by cell competition. On the contrary, γH2Av labels cells with unrepaired DNA damage, whereas segmental aneuploidy can only occur as a consequence of completed DNA repair. Thus γH2Av-labeled cells are not generally expected to be Xrp1 positive or undergoing cell competition. Some may be, if they are cells that have both unrepaired DNA damage and repaired DNA damage that led to aneuploidy. We cannot quantify overlap in the existing data, since mouse antibodies for γH2Av and HA-tag were used in separate experiments. Repeating the experiments with different antibodies to measure the overlap would not address any outstanding questions.

      We doubt FISH would be effective at measuring aneuploidy because only gene dose corresponding to the probes would be detected. Only small portions of the genome could be assessed at a time so the frequency at which aneuploidy could be detected would be low. We will make it clear in the revised manuscript that cell competition of aneuploid cells is not a new claim of this paper but something that has been studied before.

      4. Regarding the statistical analysis, revisions are warranted. In multiple panels, Student's t-tests are repeatedly performed against the same control, which inflates the family-wise error rate and increases the risk of false-positive findings. In such cases, an overall one-way ANOVA followed by an appropriate multiple-comparison procedure, such as Dunnett's test, would be more appropriate.

      This concern applies in particular to:

      Figure 1A- Supplement 1

      Figure 2M-R

      Figure 3Q, R

      Figure 5D

      Figure 5J- Supplement 1

      Figure 6G- Supplement 1

      Figure 6I- Supplement 2

      We agree and will apply ANOVA with multiple comparison procedures in the revised manuscript.
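
      The inflation of the family-wise error rate raised by the reviewer can be illustrated with a short calculation. This is a minimal sketch that assumes the individual tests are independent, which is only an approximation here because comparisons sharing one control are correlated. The function names are illustrative, and the Bonferroni threshold is shown as a simple stand-in; it is not Dunnett's test, which accounts for the shared control and is less conservative.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests tests,
    under the simplifying assumption that the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at or below alpha (a conservative correction)."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    for n_tests in (1, 3, 6):
        fwer = family_wise_error_rate(alpha, n_tests)
        per_test = bonferroni_threshold(alpha, n_tests)
        print(f"{n_tests} tests: FWER ~ {fwer:.3f}, "
              f"Bonferroni per-test threshold = {per_test:.4f}")
```

      With six uncorrected comparisons at a per-test threshold of 0.05, the chance of at least one false positive under this approximation is about 26%, which is why a single omnibus test followed by a multiple-comparison procedure is preferred.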

      Minor comments:

      1. Figure 2E is not cited in the text, and it is difficult to tell from the images as presented whether p53DN overexpression suppresses the Gstd-lacZ signal at 4 h post-IR.

      We will replace Fig 2E with a clearer example, and add a quantification of all our data, with statistics, as a supplemental figure. Note that the conclusion is already substantiated by qRT-PCR data (Figure 2M).

      In Figure 4, rpr150-lacZ does not appear to be upregulated by Xrp1 overexpression. Therefore, the authors should revise the figure title to avoid misleading readers, because rpr, a well-known p53-responsive pro-apoptotic gene, is not induced under this condition.

      We will change the Figure title. Failure to induce rpr150-LacZ here is a control to show that Xrp1 overexpression does not induce p53 activity.

      In Figure 6E, based on the data as presented, it is difficult to determine whether cleaved Dcp-1 (cDCP1)-positive cell counts are reduced upon Xrp1 knockdown. The authors should provide clearer representative images and/or include the underlying raw images as supplementary source data to support the conclusion.

      We will replace Fig 6E with a clearer example, and add a quantification of all the data.

      The authors should (i) show raw data points overlaid on summary plots (e.g., dot plots on top of bar graphs/box plots) to convey data distribution and (ii) include higher-magnification insets and/or quantitative localization/overlap analyses where colocalization is central to the interpretation (e.g., Xrp1-HA relative to γH2Av).

      We agree regarding the data display. As discussed later, colocalization is not relevant to the interpretation.

      Reviewer #2

      1. First, authors present evidence that Xrp1 is induced in wing discs exposed to ionizing radiation (IR, known to cause DSBs) and that this induction relies on p53 regulating Xrp1 transcription (Figure 1 and S1). Data are clear but there is a puzzling result. Xrp1-lacZ (a reporter of Xrp1 transcription) is induced by IR but independently of p53. These results need attention as they appear to be contradictory (why does Xrp1 mRNA, but not Xrp1-lacZ, rely on p53?). Nicely, authors show that Xrp1-lacZ induction relies on Xrp1/Irbp18 autoregulatory feedback. Is the lacZ insertion somehow interfering with the capacity of p53 to bind and regulate Xrp1 expression?

      We agree that it is a puzzling result. We have also noted elsewhere that Xrp1-LacZ does not always reflect Xrp1 mRNA and protein expression (Kumar and Baker 2022). We can add the reviewer's hypothesis to the manuscript, although it does not explain why Xrp1-LacZ is induced by IR.

      2. Second, authors use a collection of reporter genes and show that Xrp1 regulates most, but not all, Dp53 target genes. It is really unclear whether the reaper-lacZ used in Figure 3L-P recapitulates the induction of reaper by p53. I know this reporter was claimed by others to do so, but NOT in the wing disc. I would then remove it as mRNA data are clear.

      rpr150-lacZ was used as a p53 reporter in wing imaginal discs by Wells et al. 2011 (PMC3296280). We will cite this in the revised manuscript. We prefer not to remove it as we also use this reporter for the experiment shown in Fig 4.

      3. Third, authors show that Xrp1, as expected from the previous data in Figure 2 and 3, also mediates the role of Dp53 in inducing cell death, although only partially, and these differences are attributed to the gene reaper (a p53 but not Xrp1 target). Dcp1 should be cDcp1 and clones should be magnified in Fig 5E-G.

      We will follow this advice in the revised manuscript

      • First, the impact of Xrp1 on the levels of DNA damage and cell death after 24h of IR are shown in a p53 mutant background (6E1-6E3). Authors should present the data in a clean +/+ background. Quantification of 6F should also be done in the same background.

      This data was presented in the p53 mutant background to focus on the p53-independent removal of cells by cell competition. We can perform an experiment in the presence of wild type p53 for completeness if desired, but a mixture of DDR and cell competition effects may result.

      Second, hid-GFP is being induced by IR already at 4 h after IR and this induction relies on p53 and Xrp1 activities as shown in previous figures. Thus, the data presented in 6G-J could be a trivial consequence of the strong perdurance of the GFP protein.

      hid-GFP is not expressed at 4 hours in p53DN and Xrp1 K/D (Fig 3D,E), so the expression in 6G-J cannot be explained by GFP perdurance from the earlier timepoint.

      Third, the role of cell competition (driven by Minute aneuploids) is not demonstrated and relies simply on the potential role of Xrp1 in the late wave of cell death, proposal that has not been demonstrated in this paper either. Indeed, the no-role of RpS12 in the late induction (24 h wave) of Xrp1 (Figure 6 S1-F) reinforces my doubts. Authors should reflect in the introduction and discussion sections the most recent literature in the field.

      The role of Xrp1 in the late wave of p53-independent cell death is shown in Fig 6D-F. As discussed above (reviewer 1 point 1), Fig 6S1-F shows the limited role of p53 in rpS12-independent Xrp1 induction, not the role of RpS12. We will add a figure to the revised manuscript showing the strong RpS12 dependence of the late induction of Xrp1-HA and explain this more clearly. We did not include this in the first manuscript version because we had already published this result, albeit with an anti-Xrp1 antibody (Ji et al Fig 1 N-P). As also discussed above (reviewer 1 point 3), we agree that the role of cell competition in removing aneuploid cells is not demonstrated in the present manuscript, but we considered this had been demonstrated previously (Ji et al 2021), and parts of that study recently confirmed by others (Fusari 2025 Cell Genomics), so it is not necessary to add further experimental support here, although it will be useful to explain the published literature more fully.

      Reviewer #3

      1. Figure 2E. Based on the text, I think the authors are claiming that the expression of GStD-LacZ is reduced in the posterior compartment of panel 2E compared to 2D. This is unconvincing. If at all, the expression along the DV boundary in the posterior compartment is stronger in E than in D. Am I missing something?

      We will replace Fig 2E with a clearer example, and add a quantification of all our data, with statistics, as a supplemental figure. Note that the conclusion is already substantiated by qRT-PCR data (Figure 2M).

      Figure 3I - K. The expression in the posterior compartment is supposed to be reduced compared to the anterior compartment. Once again, these differences are not easily apparent to me. Perhaps these images need to be quantified to illustrate the supposed difference.

      We are sorry that the reviewer found the images unconvincing. We will replace these figures with other examples, and add quantifications of all data, with statistics, as a supplemental figure. Note that the conclusions are already substantiated by qRT-PCR data (Figure 3R).

      Line 286. The heading "Xrp1 is sufficient for the expression of p53-dependent DDR genes" is misleading. As stated in the final sentence of paragraph 2 of this section, the authors show that Xrp1 functions downstream of p53 and is sufficient for expressing a subset of p53-dependent DDR genes.

      We apologize for misleading the reviewer. We will change the heading to "Xrp1 is sufficient for the expression of many p53-dependent DDR genes", which is the meaning we intended.

      Figure 5, panels F and G could be made much easier for the reader to follow. The labels in these two panels are very difficult to see and understand. It might be better to show some high magnification regions (e.g. insets) that show the differences in the prevalence of cell death in regions with different genotypes. Also, why is Xrp1 +/- not quantified in panel H since the authors claim that cell death is reduced even in the heterozygous cells?

      It is a good idea to add enlarged figures, and we will do so. We can quantify the Xrp1+/- genotype as well.

      Line 363 and Figure 6D, E. The authors argue that the increase in H2Av in the posterior compartment implies that cells with damaged DNA are not being eliminated when Xrp1 function is reduced. An alternative explanation is that the p53 mutation together with the Xrp1 knockdown impairs the DDR even more resulting in increased H2Av staining. I don't know how the authors' data can exclude this possibility.

      We agree with the reviewer and did not intend to exclude this possibility. We will rewrite this text to make both explanations clear.

      Line 365. Is the resolution of the "double labeling" sufficient to conclude that some of the H2Av cells upregulate Xrp1-HA? A more conservative interpretation would be that in these regions that have increased H2Av, that there is more expression of Xrp1-HA.

      We apologize for a mistake in the submitted manuscript. In fact the anti-H2Av and anti-HA primary antibodies used were both raised in mouse, and Fig 6G,H show distinct wing discs, not double labels. We will replace line 365 with the sentence suggested by the reviewer.

      Figure 6 - supplement 1. The expression of Xrp1-HA is reduced in the p53DN cells when they are a loss mutant for rps12. Although statistically significant, this reduction is modest. If this induction were due to a cell competition like phenomenon, would you not expect the induction to be completely abolished since rpS12 mutations abolish cell competition completely? Please explain.

      We apologize for confusing all three reviewers with Figure 6F supplement 1. This figure does not compare RpS12-dependent and -independent Xrp1-HA expression. Instead, it shows that the rps12-independent Xrp1-HA expression is only mildly p53-dependent, which is consistent with our conclusions. We will add a figure to the revised manuscript showing the strong RpS12 dependence of the late induction of Xrp1-HA and explain this more clearly. We did not include this in the initial manuscript version because we had already published this result, albeit with an anti-Xrp1 antibody (Ji et al Fig 1 N-P).

    1. Since mm. 35–39 hold onto the dominant harmony from the end of TR, what we find is a blurred entry into S-space. As a result, commentators have differed about where the secondary theme begins. This problem can occur when S-themes start on or over the dominant, following an HC:MC in the key of S. Sonata Theory regards such an opening as one type of S0 (S-zero) or S1.0 theme: a new melodic idea, usually with a clear initiating function, but a theme that, at its opening, “retains the MC’s active dominant, which continues to ring through the succeeding music as momentarily fixed or immobile . . . [rather like] a prolongation of the caesura-dominant itself” (EST, 142–43). Emerging out of the low-register darkness and directed forward by the now diatonically inflected wobble in the viola, D3-C♮3, the cello opens the exposition’s part 2 in m. 35 with S0. It begins with a triadic climb on the sustained dominant, D2-F♯2-A2 (5̂-7̂-2̂), mm. 35–36, releasing the preceding G minor into G major with the B♮ upper-neighbor at the end of m. 35. At the same time, it reanimates the cello’s dotted-eighth-and-three-sixteenths rhythm from mm. 31–32 (traceable back to the P1.3 melody in mm. 13–17), the task of whose pulsations is always to flow into the succeeding bar: it will recur throughout much of S. Recalling Adorno’s suggestion that this movement may be heard “as the [unfolding] history of the opening fifth,” we may be invited to hear a relationship between the D-F♯-A opening of S0 and the blunt fifth-leap of P0. As we shall observe, other aspects of the subsequent S-theme also suggest back-references to P, continuing the sense of this music as enacting a process of ramification and becoming. As so often in Beethoven, it is possible to hear S as an imaginative recasting of several of P’s characteristic features: the principle, once again, of contrasting derivation.
If one wishes to underscore this point, it is possible, with due cautionary nuances, to suggest that a new subrotation begins at m. 35. But to claim, with Adorno, that our task must be to show the “mediated identity” of P and S (my italics) is an ideologically grounded step too far (1998, 13). The cello’s D2-F♯2-A2 is answered three octaves higher and in retrograde by the first violin, A5-F♯5-D5, mm. 36–37. Continuing the process of S-emergence in the manner of a question or proposal, the cello climbs higher on the rungs of the V7/III chord, F♯2-A2-C3, mm. 37–38. The first violin responds with a reply that floats upward into the highest available register, sweeping the fog away into a patch of momentarily confident serenity, gliding along with the now-rolling meter. Triggered by the I6 chord in m. 39 (reckoning now in G major), the seraphic mm. 39–40, with fluttering inner voices, sound a complete cadential progression and produce a seemingly trouble-free III:IAC on the second beat of m. 40. Mm. 35–40 can be grouped as a compressed, six-bar sentential phrase. Even while they prolong a V7 harmony, mm. 35–36 and 37–38 suggest the onset of a rhetorical presentation (2+2, αα′). In this case, Beethoven omits the usual continuation idea (β) and proceeds immediately to the S1.2 cadential unit (γ). Let’s call the presentation, mm. 35–38, S1.1 (S0==>S1.1) and attach the designator S1.2 to the cadence, mm. 39–40. Grasping the import of this six-bar phrase, mm. 35–40, is critical to understanding all that follows in the exposition. Recall the menacing E-minor threat from P, remembering also that no E-minor PAC had been sounded in that zone: that chilling seal of negativity had been pushed aside, repressed in m. 19. The point now, in S, is to secure a major-mode III:PAC with the hope of resolving it into a I:PAC in the parallel spot of the recapitulation, whereby the mechanics of the sonata process would overturn the initial E minor into E major.
While by no means providing terminal closure, sounding the serene, G-major IAC in m. 40 is the first step of this attempt. It could be understood, for instance, as a six-bar antecedent, naïvely hoping for a consequent. But no consequent follows it. Instead, mm. 41 backs up to sound a variant of m. 39, a phrase-extension seeking to replicate the III:IAC with the melody now in the second violin. Near the cadential moment, m. 42, the predicted cadence falls apart on an f♯o7 chord (viio7, with the cello also shifting momentarily into a higher register), slipping onto V65 at the end of the bar. Nonetheless, gliding along on the metrical rails, the sense of local serenity spins onward in mm. 43–45, S1.3, piano and dolce. These bars constitute another, similar cadential unit, I-ii6-V(7)-I, producing a second III:IAC at the downbeat of m. 45, again with B5 in the topmost voice. As before, the IAC is not allowed to settle, but is immediately subjected to a variant of S1.3', mm. 45–46 (= mm. 43–44). This time the potential IAC-effect in m. 47 is softened through melodic diminution, and instead the tonic chord on m. 47 starts the gentle push of yet another cadential progression, mm. 47–48, this time clearly headed for a desired III:PAC downbeat and the hoped-for structural closure in 49. More than that, the V65/V in the second half of m. 47 and, above all, the melodic descent in the first violin in m. 48 (6̂-1̂-3̂-2̂) recall and transpose m. 18 from P—the E-minor cadential moment whose seemingly inevitable i:PAC had been subverted. And similarly, Beethoven subverts the predicted G-major cadence in m. 49 with an unexpected forte, f#o42—enharmonically the same diminished seventh that had thwarted the E-minor cadence in m. 19. By now it has become clear that sounding that III:PAC (EEC) is not going to be an easy task. For all of its dolce serenity up to this point, S is now running the risk of being reduced to a string of failed cadential modules. 
The diminished-seventh bluster of mm. 49–50, S1.4, not only blocks the expected III:PAC but also assumes the role of a two-bar anacrusis: a new, energetic windup gathering up strength to throw off a hopefully more secure approach to the anticipated structural cadence. Once again, the procedure in play—backing up to restate or refashion an earlier, unsuccessful cadential module—is the familiar “one-more-time technique” (Schmalfeldt 1992). Its first release, with the viola now in the upper voice, is in mm. 51–52, an S1.3 variant now falling, with the viola’s 6̂-5̂-4̂-3̂-2̂-(1̂) descent, toward a promised III:PAC. But again the cadence is blocked by an even more emphatic intervention of the S1.4 anacrusis-windup, mm. 53–54, expanding outward in an aggressively strenuous wedge. This opens onto a climactic cadential in m. 55, with registral extremes in the outer voices. At this point the S zone’s “one-more-time” strategy changes. With the F♮6 in the first violin, m. 55, we abandon the quest for a straightforward cadential module. The three bars of mm. 55–57—at first a near-gravityless hovering, then a dolce, rapid plunging down to earth—close the wide-open wedge and signal a preparation for something new. They land on the downbeat of m. 58, where something different starts to generate. Call it S1.5: a more decisive buildup, begun in a hushed, secretive pianissimo: reculer pour mieux sauter. If the soaring mm. 55–57 had struck us as a metrical expansion, unpinning our entrainment with the previously smooth-flowing meter, the chromatic mm. 58–64 give us a different sense of metrical compression or disruption. The off-kilter rhythms and tied eighth notes set the notated meter into conflict with what soon locks into an implicit displaced from the barline by a half-beat: a metrically offset hemiola. While anticipated in m. 58, this becomes clearly apparent by m. 
59, where the “misaligned ” implications are more securely established with the second eighth note of the bar. Their metrical-clash tuggings, which Kerman characterized as “nervous . . . twitchy syncopation” (1966, 126), are unmistakable in the buildup occupying mm. 60–64. Reinforcing the edgy tension of mm. 58–64 are the chromatic bass-line windings around the ever-strengthening dominant (notice the potent augmented-sixth approach to the in mm. 62–63) and the inexorable homophonic crescendo. By m. 64 the now-supercharged V7 is sounded forte, with ringing double-stops in the upper three parts. The import of all this could not be clearer: the drawing-back of the tensest possible bowstring in preparation for a potent downbeat-release. The arrow is shot forth with the sforzando tonic chord in m. 65, elided with and setting off a new, decisive thematic module. Notice also how Beethoven enhances m. 65’s shooting-forth through a foreshortening of the last of the metrically displaced “” implications by an eighth note. Thus the ensemble’s final bow-stroke in m. 64, marked staccato, becomes the trigger-moment that snaps the off-kilter syncopations back into realignment with the notated barlines, restoring our entrainment with meter. We now confront the most analytically challenging moment of the exposition, one that will shape any larger interpretive reading that we have of the movement. M. 65 is certainly a point of strong tonic arrival: G major rings out with celebratory flourishes, and it is emphatically prepared by a preceding V7. But does it qualify as a structural cadence? For Sonata Theory the question matters, since one of its central concerns is to attend to the manner of attaining, or not attaining, the generically mandated, non-tonic PAC near the end of any exposition: the completion of the essential expositional trajectory with the cadential production of the EEC. For all of the sense of euphoric arrival at m. 
65, the notational evidence on behalf of an unassailably secured structural cadence is not complete, leaving open the possibility for two different understandings of this moment. In such cases Sonata Theory’s maxim is to explicate the ambiguities rather than to insist upon only one right way to understand the situation. Why might one hesitate before endorsing m. 65 as a structural cadence? What I’ll call Reading 1 draws attention to its cadential complications. Here at the downbeat of m. 65 we first notice that the topmost voice is on 5̂, D6, setting off an arpeggio cascade down to another 5̂, D4. From that perspective m. 65 might be heard as a III:IAC, not a III:PAC, and that accented high D6 continues to ring through mm. 65–68 as if sustained or frozen in that register. Moreover, at m. 65 Beethoven silences the second violin for two blank bars: its valenced leading-tone in m. 64, F♯5, is kept from its predicted resolution onto G5. Why? (As we shall see, in the parallel passage in the recapitulation this does not happen.) To be sure, the sforzando kickoff to the new thematic idea is forcefully accented, but the m. 65 reduction from the preceding double-stop thickness to a three-part texture is at least worthy of our notice. We might also observe that in m. 65 the downbeat G2 in the cello is of the briefest possible duration, and the vigorous G2-D2 alternation in the cello keeps the D2 dominant of mm. 63–64 in play through m. 68, albeit on metrically weak offbeats. This means that the thematic bolt shot forth in mm. 65–68 is registrally framed by a quasi-sustained D6 on the top and D2 on the bottom: the theme is encased within 5̂ above and 5̂ below. To what degree does all this undercut, or at least attenuate, the impression of a structural cadence? Or, in extreme versions of Reading 1, is it conceivable to hear m. 65 as anything other than a cadence? The alternative would be to hear S1.5, mm. 
58–64, less as a cadential-function module than as a broad anacrusis that lands squarely on the tonic at m. 65 to set free a fresh, resolute thematic idea. (As noted in chapter 4, the music preceding elided PACs or PAC-effects, particularly when the thematic material of the cadential downbeat is vectored determinedly forward, can often take on the additional, preparatory function of an extended anacrusis, released at the point of tonic arrival.) But what would such a reading suggest? M. 65 surely marks an attainment of some sort. But it may be that m. 65’s G major is insisted upon by a dogged force of will, not attained by a problem-free cadence: a hyper-strong downbeat prepared by a metrically conflicted, seven-bar anacrusis in mm. 58–64. “If G major cannot be secured with an unequivocal cadence—if there is no literal PAC—we will at least proclaim G major to be sufficiently attained by fiat. Plant the flag with fortitude even though the territory is not yet fully conquered.” This would mean that m. 65 falls short of being read as an EEC. And yet for all of these complications most listeners would probably find it more intuitive to hear an implicit cadential arrival at m. 65, especially in the immediate secondary-theme context of repeated cadential frustration through the several preceding “one-more-time” blockages, which are generically common toward the ends of secondary-theme zones. Those favoring a (quasi-) cadential understanding of m. 65—call it Reading 2—might suggest that the “PAC” resolution of the preceding V7 is something to be conceptually understood, even though upon examination it is not literally present: the forceful, sforzando elision of the newly released theme blots the implicit PAC out of audibility. Listeners, the argument might go, will hear a PAC-effect at m. 65 even though a check of the notation does not provide the written evidence for one. 
Such a PAC-effect, in turn, could be understood as providing at least a locally credible EEC-effect. Within the flexibilities afforded by Sonata Theory practice, the argument would be that, given the strength of the m. 65 arrival and the manner in which it is prepared, it could be considered a deformational EEC—a contextually practical substitute for it—seeking to ground the G-major tonic by assertion, that is, by means other than the prototypically normative cadence. In sum, Reading 1 (no structural cadence) argues that the generically expected III:PAC is so compromised at m. 65 that we should not conclude that the EEC has been satisfactorily accomplished. Reading 2 (implicit cadence-effect) allows for a sufficient EEC-effect via a cadentially attenuated but practicable stand-in for the EEC. Is it obligatory to choose either the one way or the other? Or might it be, in the reading that I prefer, that Beethoven has purposely composed these ambiguities into mm. 58–65 in order to unsettle our confidence in what, now mulling over the matter two centuries later, Sonata Theory regards as a normatively secured EEC? Perhaps the point is precisely that of its almost-ness, its combination of yes-and-no features, both of which play into the dramatic staging of the movement’s larger {– +} drama of modal reversal or non-reversal. Any such conclusion would have to be a central part of one’s hermeneutic reading of the movement. What then do we make of the theme that begins in m. 65? Should we think of it as a closing theme (post-EEC) or not? It may sound like a characteristic C theme, or a C theme that could have been, but, again, the confidence of its C-status can be called into question through the multiple attenuations of the PAC-effect at m. 65. How to resolve this question? 
As I have also noted in chapter 5’s discussion of the first movement of Haydn’s “Military” Symphony, Sonata Theory refers to such a thing as an SC  theme: “the presence of a theme literally in precedential, S-space that in other respects sounds as though it is more characteristically a closing theme.” This kind of theme seems “to bestride both the S- and C-concepts” (EST, 190–91). While regarding m. 65 as self-evidently precadential is a step too far, my preference is to call this an SC theme, if only to remind myself of the problems surrounding the m. 65 moment. If you are convinced by the EEC-effect at m. 65 and wish to regard the new theme as C, that’s also fine: substitute your C for my SC in what follows. In most cases SC themes will lead to a clearer production of an EEC (and C themes will normally confirm the EEC with one or more cadences). That’s not the case here. This SC (or C) theme starts out as a confident sentence, with presentation αα‎′ (mm. 65–66, 67–68), but the sentence is cut short in m. 69a. Its bluff bravado is redirected elsewhere; the theme is cut off at the knees. (The brutality of the truncation is not adequately captured by the benign connotation of the word “retransition,” RT.) Even if we have considered m. 65 to mark a sufficient EEC, that G-major confidence cannot be reaffirmed with closing material. This leaves the exposition cadentially open. Under these circumstances m. 65’s “EEC-effect” is at best left undersecured and uncertain. And with SC’s inadequacy now demonstrated, m. 70a brings back the malevolent E minor with a vengeance. We are thrown back to m. 1 and the repeat of the exposition. In sum, this {– +} exposition (E minor, G major, i-III) has produced at best a tenuous EEC-effect, one that has proved unable to be confirmed—and in fact is lost—in the brief music that follows, producing a non-closed exposition. Given m. 
65’s ambiguity, I suggest that this movement is at least in dialogue with the concept of what Sonata Theory calls a failed exposition, not at all in the sense that Beethoven has composed it poorly but rather in the sense that he has staged a musical drama of cadential ambiguity (an EEC almost but perhaps not quite attained) within an exposition that, by its end, is left open. The expositional tale told here is one in which the major mode (III), while very much present, has proven unable to produce and maintain an unequivocal, major-mode PAC close. In turn this means that the expositional hope of producing an unequivocal I:PAC/ESC in the recapitulation is cast into doubt. On the other hand, we should remember that there have also been no E-minor PACs in the exposition. A bitter struggle is brewing. But before getting to the recapitulation, we have to pass through the trials of the development. Development (mm. 70b–138) Rotation 1 (mm. 70b–107) In both the first and second endings Beethoven suppress

      We now enter S-space blurrily, starting over a dominant. Commentators differ on where the secondary theme starts because the theme begins on the dominant following an HC:MC; Sonata Theory calls this an S0 (or S1.0) theme. The S theme suggests back-references to P, and the book suggests one could argue that a new subrotation begins at m. 35. Mm. 35–40 seek to secure the major mode. The book calls mm. 35–38 S0==>S1.1 and assigns S1.2 to the cadence in mm. 39–40. The 6/8 gets disrupted around m. 58, giving the feeling of a 3/4 displacement. Mm. 60–64 are characterized as "nervous . . . twitchy syncopation." M. 65 is a point of tonic arrival in G major, where Sonata Theory looks for the EEC that completes the essential expositional trajectory, although whether or not this is a structural cadence is complicated. On one reading, m. 65 falls short of an EEC because there is no literal PAC, although a listener can readily hear it as a cadence; on the other reading, the book calls this a deformational EEC. The author suggests the movement is at least in dialogue with the idea of a failed exposition.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow-up analyses to characterize the function of these genes. Furthermore, the authors take the extra steps of in vivo determination of function with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses by pulling in data from GO, mouse knockouts, human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      We thank you for your kind words!

      The main area that I would suggest for improvement is in the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious that genetic manipulation of gorillas is not feasible. Beyond a reminder to the reader that this was a rationale for the Drosophila work, it isn't really adding much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      We understand these statements might seem trivial, but they are meant to fully acknowledge, particularly to non-evolutionary biologists, the fact that we can’t do the genetics to “prove” these putatively deleterious mutations really are so (hence the statement about forward/reverse genetic experiments), nor causation (since this mating system evolved once in the history of gorillas we cannot know directionality in this lineage, although we could infer it if we had species in which different stages were extant, for example).

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates including the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, and the authors should at least note it as a caveat. Third, and potentially related, is the consideration that these protein-coding genes may be functioning in other ways such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed sense or anti-sense in a regulatory purpose. I don't think there is a way to incorporate this into the authors' analyses but it would be nice to see it acknowledged as a caveat or limitation.

      We thank you for the helpful suggestion and have added a discussion of these issues in the reworked Caveats and limitations section (lines 639 - 710).

      Reviewer #1 (Recommendations for The Authors):

      This is an excellent paper with thorough and creative approaches to address an interesting connection between genotype and phenotype. Stylistically the paper is very well written.

      We thank you for your kind words.

      Page 3: I suggest deleting the word "vaginal" so the sentence reads "... the evolution of female traits such as anatomical features that allow female control...". Most of the well-documented examples of cryptic female choice are in animals that do not have vaginas like insects, fish, and birds, including the reference given at the end of the sentence (Brennan et al. 2007 on waterfowl).

      We agree and have made this edit.

      Page 3: I would delete the words "multimale-multifemale" when discussing gorillas, to make the sentence read "Most gorillas, for example, live in groups with age-graded...". The use of "multimale-multifemale" here is not exactly wrong, but can be confusing to the reader since the authors essentially use "multimale-multifemale" as a synonym for "polygamous" in the previous paragraph.

      We agree and have made this edit.

      The writing in the Materials and Methods fluctuates between present and past tense. The authors should pick a consistent style, probably past tense by convention.

      We have edited the Materials and Methods to use only the past tense.

      "Drosophila" is italicized sometimes, but sometimes not. Make consistent.

      To ensure consistency, italics were used only when genus and species were shown together (i.e., Drosophila melanogaster).

      In the main text, a few reference typos/confusions:

      Box 1, Figure 1B caption: I believe this "Dixson, n.d." reference should be Dixson (2009), if it refers to the book (Oxford Press).

      Yes, that is the case. Thank you for having spotted this. The reference has been corrected.

      Page 21: The authors use the term "false exons" and "fake exons" in the same paragraph. Are these the same thing? If so, just use "false exons" both times.

      These are the same; we have changed "fake" to "false".

      Page 22-23, maybe elsewhere: The Smith et al. reference includes Martin's first name.

      Thank you for bringing this issue to our attention. The reference has been corrected.

      Page 25: in the parenthetical listing of scientific species names, the word "and" should not be italicized. In this same section, there's really no reason to include "gorilla" as the subspecies. It isn't given for the other species.

      Corrected.

      Page 27: Missing period in the second paragraph after "(Guyonnet et al. 2012)".

      Corrected.

      Page 29: Should read "... available in gnomAD that would allow us to exclude..." (or possibly "... available in gnomAD that would allow the exclusion of ...").

      Corrected.

      Page 33, figure legend of Appendix Figure 1A: "gray line" not "gray liner".

      Corrected.

      Box 1, Figure 1A: This is confusing in a few ways. First, the gorilla red dot is labeled "Gorilla", but the chimpanzee and bonobo dots are not labeled. Perhaps in the legend the colors could be indicated, such as "... percentage of body mass for gorilla (red), common chimpanzee (dark blue), and bonobo (light blue)"? Secondly, the bar chart shows the testes/body mass ratio but it is not clear what they are scaled to. Should there be a second y-axis on the right side of the plot?

      The bar chart showed the testis weight/body weight ratio (log), but it is not really necessary. We have removed the bar chart and labeled chimpanzees and gorillas.

      Figure 1D: I found myself confused by the vertical label of "Percent of genes with w>1 in Gorilla". Because all genes are in the stacked histogram, my first thought was that ~99% of the genes have w>1 (gray). Would be more clear if the label was the same as 1G ("Percent of genes").

      We agree and have made this change.

      The text in the figures is extremely small. I don't know what it will look like once it is fully formatted for publication, so I'll leave those concerns to the editor/publisher.

      We will wait until the proofs to determine if this figure needs to be split into multiple figures with larger text.

      References in the reference section need a LOT of cleaning up. It does not appear that any manual editing was done. Please check for consistency in capitalization, italicization, abbreviations, missing information, etc. The level of neglect to this section is frankly unprofessional.

      I (VJL) apologize for this; it is entirely my fault. To explain but not justify, I have dyslexia, and the shifting combination of text, numbers, punctuation, fonts, and font styles makes it difficult to see the inconsistencies. To mitigate this, I use a reference manager to format references (like everyone else) and almost always have someone proofread the reference section, but I didn’t do that with this manuscript. I apologize for the oversight. My dedicated co-authors have cleaned the reference section.

      Reviewer #2 (Public Review):

      As outlined in the public review, this is a nicely executed molecular evolutionary study. The analyses and overall patterns described in gorillas appear rigorous and convincing. The fundamental limitation here is a lack of comparative context to specifically establish the connection to mating system or the uniqueness of these overall patterns to gorillas.

      We thank the reviewer for the compliments. However, there is some confusion about the hypothesis we tested. We hypothesized that genes involved in male reproductive biology would have relaxed selective constraints in gorillas because of their mating system, not that polygynous mating systems would lead to relaxed selection. While that may be true, it is not the hypothesis we tested, nor do we state that the overall pattern we observe is unique to gorillas. Our data, however, support our claims: 1) We performed an unbiased selection scan in gorillas and identified genes with K<1, an evolutionary signature of reduced selection intensity; 2) We found that those genes were enriched for male reproductive functions; and 3) Some of those genes had effects on male reproduction in both Drosophila screens and in infertile men. These are the results one would expect if our hypothesis were true.

      To partly address the concern that our results do not have a connection to mating systems or may be an overall pattern rather than a gorilla-specific one, we ran RELAX using the same dataset but in the elephant seal, another species with a highly polygynous mating system. Although elephant seals are a polygynous species, they differ from gorillas in that their spermatogenesis does not undergo persistent deterioration, but instead follows a seasonal pattern. According to the comprehensive study by Laws [The Elephant Seal (Mirounga Leonina Linn.): III. The physiology of reproduction; Scientific Reports, 15, Falkland Islands Dependencies Survey, 1956], male gamete production is upregulated during the mating season and is mostly inactive throughout the rest of the year. Of the 573 genes with K<1 in gorillas, only 14 also have K<1 in elephant seals, which had 350 genes with K<1. A GO analysis of the 350 elephant seal K<1 genes does not identify enrichment in spermatogenesis-related terms. In fact, the list of GO terms is quite broad. A potential, if admittedly speculative, interpretation of these findings is that, although elephant seals are polygynous, the selective pressure on their spermatogenesis is not relaxed (unlike in gorillas) because of the seasonal nature of their mating period. In other words, because seals have a temporally narrower window for reproductive success than gorillas, the selective constraint on male gametogenesis in seals is not weakened. Regardless, the low overlap in relaxed genes between the two tested polygynous species supports the view that this reproductive strategy is probably associated with different evolutionary signatures in the genome (depending on the species), a likely reflection of the complex, nuanced and multi-factorial aspects of such strategies. We include this analysis in the Appendix (lines 1112 - 1132).
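
      As a rough sanity check on the size of this overlap, the number of shared genes expected by chance can be sketched as follows. This is a minimal sketch, not part of the reported analysis. It assumes that both K<1 gene sets are drawn from the same 13,310-gene background (the response only states that the same dataset was used) and that membership in the two sets is independent. The function names are hypothetical.

```python
from math import comb

def expected_overlap(n_background, n_a, n_b):
    """Expected number of shared genes if two sets are drawn independently
    from the same background (mean of the hypergeometric distribution)."""
    return n_a * n_b / n_background

def prob_overlap_at_most(n_background, n_a, n_b, k):
    """P(overlap <= k) under the hypergeometric null of independent sets."""
    total = comb(n_background, n_b)
    favorable = sum(
        comb(n_a, i) * comb(n_background - n_a, n_b - i)
        for i in range(k + 1)
    )
    return favorable / total

# Counts taken from the text; the shared background size is an assumption.
N_BACKGROUND = 13310   # genes with HUGO symbols
N_GORILLA = 573        # gorilla genes with K<1
N_SEAL = 350           # elephant seal genes with K<1
N_SHARED = 14          # genes with K<1 in both species

chance_overlap = expected_overlap(N_BACKGROUND, N_GORILLA, N_SEAL)
lower_tail = prob_overlap_at_most(N_BACKGROUND, N_GORILLA, N_SEAL, N_SHARED)
print(f"expected by chance: {chance_overlap:.1f}, observed: {N_SHARED}")
print(f"P(overlap <= {N_SHARED}) = {lower_tail:.2f}")
```

      Under these assumptions, an observed overlap close to the chance expectation would be consistent with the two species not sharing a common set of relaxed genes.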

      While there is much that I like about the study and approach, this is a substantial shortcoming that really limits the significance of the study, especially given that lineage-specific patterns were also analyzed by Scally et al. (2012) over a decade ago.

      While Scally et al. (2012) reported the initial sequencing, assembly, and analyses of the gorilla genome, the method they used to characterize selective pressure on coding genes - the branch and branch-site model implemented in PAML - is misspecified to detect relaxed selection (PMID: 25540451). Under relaxed selection, the d<sub>N</sub>/d<sub>S</sub> of sites under purifying selection will move towards 1, the d<sub>N</sub>/d<sub>S</sub> of sites under positive selection will also move towards 1, and some sites will not experience a change in d<sub>N</sub>/d<sub>S</sub>. The PAML test used by Scally et al. (2012) averages d<sub>N</sub>/d<sub>S</sub> across all sites, rather than having distinct rate categories for each of the three selection classes. A change in d<sub>N</sub>/d<sub>S</sub> toward 1 under the PAML model can arise because the strength of positive selection is weaker in the foreground lineage than the background lineage, even if there is still positive selection acting on some sites. Averaging across all sites also means there is little power to detect relaxed selection, even when selection is in fact relaxed. Furthermore, the PAML test used by Scally et al. (2012) is underpowered to detect relaxed selection because it depends on selective regimes in background species. Scally et al. (2012) also used six species, which underpowers their test of relaxation, because if one or more of those species experience an increase in their d<sub>N</sub>/d<sub>S</sub> rate, the background rate will increase, giving the appearance of a decrease in the gorilla lineage even if its d<sub>N</sub>/d<sub>S</sub> rate has not changed. We elaborate on this in the Appendix section (lines 1036 - 1073).
      Finally, the method implemented in PAML does not allow for synonymous rate variation across sites or multi-nucleotide mutations per codon. Ignoring synonymous rate variation dramatically inflates the false positive rates in selection tests (PMID: 32068869), as does ignoring multi-nucleotide mutations (PMID: 29967485 and PMID: 37395787); we have added a discussion of these issues in our Caveats and limitations section (lines 683 - 710).
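
      The argument about site classes can be illustrated with a small numerical sketch. It assumes the RELAX parameterization described in PMID: 25540451, in which each background rate class ω is transformed to ω^K on the foreground branch. The three rate classes, their proportions, and the value of K below are made-up values chosen only for illustration, and the function names are hypothetical.

```python
def foreground_omegas(background_omegas, k):
    """Apply the RELAX-style transformation omega -> omega**k to each class."""
    return [w ** k for w in background_omegas]

def mean_omega(omegas, proportions):
    """Single averaged dN/dS across site classes, weighted by class proportion."""
    return sum(w * p for w, p in zip(omegas, proportions))

# Illustrative (hypothetical) site classes: purifying, neutral, positive.
background = [0.1, 1.0, 5.0]
proportions = [0.90, 0.08, 0.02]
k = 0.5  # K < 1 means relaxed selection intensity

foreground = foreground_omegas(background, k)

# Per-class view: the purifying class moves toward 1 (0.1 -> ~0.316),
# the neutral class is unchanged (1.0 -> 1.0), and the positive class
# also moves toward 1 (5.0 -> ~2.236).
for bg, fg in zip(background, foreground):
    print(f"background omega {bg:.3f} -> foreground omega {fg:.3f}")

# Averaged view: the three distinct shifts collapse into one number each.
print(f"mean background omega: {mean_omega(background, proportions):.3f}")
print(f"mean foreground omega: {mean_omega(foreground, proportions):.3f}")
```

      The per-class output shows the three behaviors described above, whereas the averaged values reduce them to a single shift that does not reveal which classes changed.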

      Reviewer #2 (Recommendations for The Authors):

      Specific comments

      Framing: Overall, the connection between mating system is referred in variable levels of certainty, some appropriate, others overstated. The paper title uses 'coincident' which is appropriate, but also at odds with the stronger conclusions that are emphasized throughout. Elsewhere the phrasing is much stronger (abstract, discussion) implying a direct statistical association with mating system variation that has not been established. Elsewhere the term 'association' is used in the same manner, but in instances where a statistical association is tested and demonstrated (tests of enrichment, etc).

      We are unsure why the Reviewer considers our claims overstatements. The patterns of molecular evolution we found are ‘associated,’ and 'coincident with,' and we believe our results are ‘compelling’. Our tests for relaxed and positive selection are statistically associated with a polygynous social system which we a priori hypothesized. We have taken care to ensure a more consistent framing of this connection throughout the manuscript to avoid potential misinterpretations of causality.

      Page 7, elsewhere- It is essential to compare the reported patterns (percentage of relaxed genes in gorilla, patterns of enrichment, etc) to other primate lineages to identify if this number is enriched due to mating system or if these patterns are unusual for sperm genes across mammals. The implication here and throughout is that the specific pattern reflects specific aspects of gorilla mating biology, but this is never established. Additionally, it would be interesting to know the relative number of genes under positive selection across species (or across great apes).

      We agree that these controls would be informative if we were using a PAML-like approach. With the RELAX method, however, the foreground K is compared to the background K, and K only becomes significantly less than one if there is a relaxation in the intensity of selection in the foreground. If these patterns were common to sperm genes across mammals, the background and foreground K would not be significantly different. Our a priori hypothesis was that genes related to male reproductive biology would show evidence of a decrease in the intensity of selection (both positive and purifying), which we tested and found to be true. In this regard, we can conclude that the gorilla mating system is associated with patterns of molecular evolution in the species’ genome.

      While we too would find it interesting to know the relative number of genes under positive selection across species (or across great apes), that is not the study we performed and is beyond the scope of this one (and we only identified 96 genes that were positively selected in gorilla suggesting that few genes are positively selected across species).

      Page 8, bottom, elsewhere- "13,491 background set" elsewhere this is 13,310 (abstract). The number of genes here is different, and the set seems to change across multiple parts of the paper without explanation. This could be a simple typo, however, it may affect statistical analysis if the problem is widespread, especially when assessing enrichment of (presumably) small sets of genes.

      This is partly true and partly a typo. We generated 13,491 alignments, 13,310 of which had HUGO gene symbols. These 13,310 genes were used in all subsequent studies. We have re-written the text to clarify this point, and have added a statement: “We thus generated a dataset of 13,491 orthologous coding gene alignments from the genomes of 261 Eutherian mammals, corresponding to 62.7% of all protein-coding genes in the gorilla genome. Of the 13,491 alignments, 13,310 had an identifiable HUGO gene symbol and were used in all subsequent analyses (lines 158 - 162).”

      Related to this, it is difficult to determine how many genes these GO associations are based on. Even small numbers of genes can result in very significant results with these tests. How many genes are these associations based on? This connection is a key component of the overall narrative that changes in sperm competition have a large effect on genome-wide shifts.

      All analyses are based on the 13,310 genes with identifiable HUGO gene symbols, including over-representation analyses (ORA). Our dataset submitted with this manuscript includes these 13,310 genes (as well as the genes with K<1 and K>1). The foreground consists of the 578 genes with K<1, which are given in Figure 1 – source data 3. The minimum number of genes annotated in a GO or pathway term was 3. It is unlikely that statistically significant GO term enrichments result from only a few genes annotating to each term; even if that scenario produced small P-values, the false discovery rate would be high, and readers can decide what false discovery rate they are willing to accept.

      How many of these 578 genes are plausibly related to reproduction? Apologies if I missed this detail, but Figure 3 does not convey this. Could you speak to this directly in the text and include a table or supplemental table of the GO terms to show the differences in enrichment between classes of genes, and counts per term?

      These data are included in Figure 3 – source data 1.

      One of the key results is the relative frequency of relaxed constraint versus positive selection. This is expected on some level as the form of recurrent positive directional selection detected with these models is usually relatively rare. However, it is not at all clear that it is rarer in gorillas versus other mammals, as implied.

      Our comparison of relaxed constraint to positive selection was intended to explore whether more genes experienced one pattern of molecular evolution or the other within gorillas; we do not imply that positive selection is rarer in gorillas than in other mammals.

      Likewise, I was wondering how the dataset itself may be biased toward this result. If I understand correctly, you are requiring very high levels of conservation (251/261 genes) for inclusion in the dataset, resulting in ~60% of all gorilla genes being included. Rapidly evolving genes that are targets of recurrent positive selection often also tend not to be highly conserved across such a deep phylogenetic sample. It would be good to acknowledge this potential bias when implying meaning to the differences in relative rates of the two forms of selection.

      Our results are unlikely to be subject to this bias. The RELAX test relies on accurately estimating K in background lineages, which requires that we include as many species as possible. The tradeoff is a reduction in the number of genes included in the dataset due to evolutionary dynamics across a wide range of species. However, it is not that 40% of the genes are excluded because they are evolving so rapidly that we cannot identify or align them; the exclusion mainly reflects the fact that we cannot identify the gene in 251 of the 261 species included in the dataset (due to gene loss, etc.).

      Page 9 - The results here (and in Figure 3D) shows that relaxed genes are enriched broadly across spermatogenesis cell types except for Sertoli cells. But the Sertoli cells and a few non-significant cell types are the only thing to compare to. Instead, it would be interesting to identify single cell expression patterns from other tissues- or even bulk RNA as sc-RNA may be limited in the species. This would show that these genes are enriched in testis compared to other tissues, as opposed to just being broadly expressed. Additionally, the authors could compare to the other primate testis sc-RNA available in Murat et al. Without such comparisons the interpretations here seem limited.

      We did not test whether genes with K<1 were enriched in other cell types because: 1) we had an a priori hypothesis that genes with K<1 would be enriched in cells involved in male reproduction, rather than enriched in cell types in the testis compared to any other cell type; and 2) the number of genes with K<1 is relatively small and the number of known cell types is very large; at least one estimate points to ~400 major cell types in a higher primate (PMID: 37722043). Using a P-value threshold of 0.05 from a hypergeometric or Fisher's exact test and a Bonferroni correction to control for multiple hypothesis testing, we would need the P-value for enrichment in any cell type to be below 0.000125 (0.05/400), which we are unlikely to achieve.
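
      The arithmetic behind this threshold, and the kind of test it would apply to, can be sketched as follows. This is a minimal sketch using only the Python standard library. The 0.05 level and the ~400 cell types come from the text above, while the function names and the toy counts in the usage lines are hypothetical.

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def hypergeom_upper_tail(n_background, n_in_category, n_foreground, n_overlap):
    """One-sided enrichment P-value: P(X >= n_overlap) when n_foreground genes
    are drawn without replacement from n_background genes, of which
    n_in_category belong to the category (e.g., a cell-type marker set)."""
    total = comb(n_background, n_foreground)
    upper = min(n_in_category, n_foreground)
    tail = sum(
        comb(n_in_category, i) * comb(n_background - n_in_category, n_foreground - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total

# Threshold for ~400 cell types at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 400)
print(f"per-cell-type threshold: {threshold:.6f}")

# Toy example with hypothetical counts: 20 background genes, 5 in the
# category, 5 in the foreground, 3 of which fall in the category.
p_value = hypergeom_upper_tail(20, 5, 5, 3)
print(f"toy enrichment P-value: {p_value:.4f}, significant: {p_value < threshold}")
```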

      More comprehensive functional comparisons could provide evidence that even though relaxed constraint is present in all lineages, perhaps relaxed constraints in the gorilla lineages are more related to sperm formation and function.

      The RELAX test is a relative one; while relaxed constraint may be present in other lineages, to observe a statistically significant K<1 in gorillas the degree of relaxation would have to have a greater effect size in gorilla than in other lineages.

      I was also a little unclear what to make of the interpretation of K<1 versus K >1 enrichment by cell type. The enrichment of K<1 is called out as noteworthy because this is when the spermatogenesis specific genes begin to be expressed, but then the K > 1 result is dismissed as occurring during pachytene which is a transcriptional permissive state of testis. To be clear, pachytene is also a critical checkpoint for fertility and enhanced purifying selection at this step could be reasonably interpreted as being at odds with the entire erosion of reproduction argument. This seems to be a selective interpretation for the overall narrative. Also, permissive transcription is not only limited to the pachytene stage and the relaxation of constraint concomitant with increased specificity and permissive expression during the later stages of spermatogenesis is a well-known result in mammals, and not anything that can be ascribed gorillas and their change in mating system.

      We agree with the Reviewer’s comment and have removed the K<1 versus K>1 interpretation from the manuscript.

      Page 13 - The LOF enrichment identified from this random sampling is borderline significant. An improved approach would be to perform permutations of random samplings and identify the range of significance based on 1000+ permutations.

      We have redone the burden test with population-matched groups to confirm the reliability of this association (lines 435 - 446). In addition, we now acknowledge in the Caveats and limitations section that our observations could benefit from a permutation analysis (lines 695 - 697).
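
      One way such a permutation analysis could be set up is sketched below. This is a generic label-permutation test on carrier status, not the burden test used in the manuscript, and it does not reproduce the reviewer's suggested resampling scheme exactly. The function name and the toy carrier data are hypothetical.

```python
import random

def permutation_p_value(case_flags, control_flags, n_permutations=1000, seed=0):
    """One-sided permutation P-value for a higher carrier fraction in cases.

    Each flag is 1 if the individual carries a qualifying variant, else 0.
    Case/control labels are shuffled to build the null distribution of the
    difference in carrier fractions."""
    rng = random.Random(seed)
    n_cases = len(case_flags)
    n_controls = len(control_flags)
    observed = sum(case_flags) / n_cases - sum(control_flags) / n_controls
    pooled = list(case_flags) + list(control_flags)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_cases]) / n_cases - sum(pooled[n_cases:]) / n_controls
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Toy data (hypothetical): 30 of 50 cases carry a variant, 0 of 50 controls do.
toy_cases = [1] * 30 + [0] * 20
toy_controls = [0] * 50
print(permutation_p_value(toy_cases, toy_controls))
```

      With 1,000 or more permutations, the smallest attainable P-value is roughly 1/1001, which sets the resolution of the test.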

      Page 17, bottom- Statements like these are overstating the correlation as the comparative analyses were not shown.

      We agree and have edited the text to avoid potential overstatements.

      This is good to include the role of female reproductive tract. Shouldn't the unbiased screen pull these out anyway? The authors did find some female GO terms enriched. What additional information or experiments would be needed to test the hypothesis of female compensation? The expectations for this should be made clearer.

      Given the nature of these putative female compensatory mechanisms (primarily acting on the oviduct and lower uterus, as speculated in lines 586 – 601), it is currently impossible to functionally test them in gorillas. The continued development of in vitro systems mimicking the female reproductive tract may allow such studies in the future.

      Page 18, middle- Pleiotropy is an important consideration and this paragraph discusses some valuable points. However, this is another section that could be improved by discussing the relaxed constraints in later spermatogenesis, which likely suggests that genes expressed in later stages are less pleiotropic and more testis- specific.

      We agree and have added a brief discussion of this in lines 619 - 622: “It is also possible that the negative consequences of deleterious pleiotropy become less pronounced at later stages of spermatogenesis as meiotic and post-meiotically expressed genes are enriched for testis-specific functions (PMID: 36544022).”

      Page 27, Bottom- The criteria for selection of genes to target here are interesting and disconnected from the claimed interpretation of the results. If you're targeting genes with reliable expression in Drosophila, it is not surprising that a percentage of them will lead to fertility loss. Shouldn't the background be a random set of testis-expressed genes? This test would show that relaxed constraint is a strong way to screen for fertility genes. Additionally, the authors previously showed that these genes were enriched in sc-RNA in gorilla, and likely other species. Suggesting that you identified genes 'lacking evidence' of a role in spermatogenesis in previous studies is misleading, when many of these genes are present in testis RNA datasets and enriched for sperm GO terms. I would argue that genes found to be expressed in testis and spermatogenesis-specific cell types certainly have evidence of being involved in spermatogenesis.

      We thank you for the helpful suggestion. We have generated a new background group composed of a random set of testis-expressed genes. More specifically, by looking at previously published Drosophila testis expression data (PMID: 30249207), we randomly selected 156 genes with TPM>1 (transcript per million) and determined the percentage of them with reported spermatogenic / male fertility defects in Drosophila. We observed that 18 (11.5%) had been previously demonstrated to be functionally required for male reproductive fitness. This percentage is slightly higher than what we had previously observed for a random selection of Drosophila genes (9.6% - an update, using the latest available data, to the 7.7% reported in the original version). Nevertheless, both figures are still well below the 27.6% hit rate we found for the Drosophila orthologs of the gorilla K<1 genes. We have added this new information to the manuscript (lines 380 - 386).

      Regarding the potential correlation between expression and function in spermatogenesis, we and others have shown that the majority of the protein-coding genome is expressed during spermatogenesis in both vertebrate and invertebrate species (PMID: 39388236). Although the reasons for such widespread transcription in the male germ line are not entirely clear, it advises a cautious approach in terms of correlating expression with function. Indeed, our recent analysis of 920 genes reliably expressed in insect and mammalian spermatogenesis revealed that only 27.2% of them caused male reproductive impairment when individually silenced in the Drosophila testis (PMID: 39388236). Since genetic redundancy is a factor that needs to be taken into consideration when dealing with such a central biological process for the survival of a species, we take the more stringent approach of only considering a gene to be functionally involved in spermatogenesis if there is phenotypical evidence (from our RNAi assay or from previous publications) that its disruption is associated with spermatogenic impairment and/or abnormal fertility. We have added this clarification to the manuscript (lines 349 - 363).

      Page 17 "Our data ... suggests that gorillas may be at the lowest limit of male reproductive function that can be maintained by natural selection (at least in mammals or vertebrates)." I realize this is the speculation section, but this is a massive overstatement. There is absolutely nothing in your data or results that support this statement, nor is this supported by the extensive comparative reproductive data in mammals. For example, there are many mammalian systems that show lower metrics of reproductive function than gorillas. For example, the sperm abnormality indices in Box 1F are nowhere near as severe as found in many species that still somehow manage to reproduce.

      We agree and have edited the text to avoid potential overstatements (see above).

      Reviewer #3 (Recommendations for The Authors):

      (1) More discussion is needed as to whether their results could be explained by a reduction in effective population size in gorillas.

      Thank you for raising this important point. As you know, reduced effective population size can lead to an increased load of deleterious mutations/relaxed selection intensity. However, we do not believe that it substantially affects our observations. Indeed, relatively few genes have K<1 and those are enriched in sperm biology. Given that a reduced effective population size will plausibly increase the load of deleterious mutations and relaxed selection across many genes, it is unlikely that such a broad phenomenon would result in a specific enrichment in genes related to male reproductive biology. We have added this reasoning to the Caveats and limitations section (lines 675 - 682).

      (2) Properly controlled genetic association testing when performing a burden test is essential, and methods that allow for some variants to be associated with increased fertility should be considered. Rare variants are much more likely to show population-specific differences, and selecting humans from two potentially very different cohorts and sample sizes can easily lead to confounding. I suggest performing a principal component analysis to ascertain the degree of genetic differentiation between these cohorts, and use this to guide the selection of a subset of the control cohort as well.

      We agree and have replicated this analysis using only individuals of European descent; our conclusions have not changed but the P-values have become lower (lines 435 - 446).

      (3) Citations should also be included in Table 1, for each relevant phenotype. You may also want to consider a more general comparison of p-values and effect sizes of genome-wide association studies for human male infertility to test for an enrichment in/nearby genes showing relaxed selection along the gorilla lineage. In other words, do the relaxed genes in the gorilla lineage have an enrichment of small p-values for being associated with male infertility.

      Citations have been included in Table 1, as suggested, and the table has been updated to include the latest reported phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We appreciate your comments on assessing the chromatin accessibility of HSCs after β-glucan training, as epigenetic reprogramming is known to be one of the underlying mechanisms of trained immunity, as suggested by many groups including our own. To delineate the genome-wide epigenetic reprogramming induced by β-glucan (BG), we reanalyzed publicly available chromatin profiling datasets in which ATAC-seq of HSCs from control and β-glucan-trained mice was performed (accession number: CRA014389). Comparative analysis revealed that HSCs from BG-trained mice demonstrated pronounced enrichment at promoters and distal intergenic regions, key regulatory loci governing transcriptional activity (Fig. S7A). This divergent genomic targeting was further corroborated by distinct signal distribution profiles (Fig. S7B), supporting pronounced upregulation-driven remodeling of the epigenomic landscape induced by BG treatment. Functional annotation of these epigenetically primed promoters via GO term analysis revealed significant enrichment of immune-relevant processes, including leukocyte migration, cell-cell adhesion, and chemotaxis (Fig. S7C). Consistently, KEGG pathway analysis highlighted the enrichment of signaling cascades such as chemokine signaling and cell adhesion molecules (Fig. S7D), reinforcing the involvement of BG-induced trained immunity in inflammatory and mucosal homing pathways.

      Furthermore, promoter-centric enrichment of terms related to “defense response to bacterium” (Fig. S7E) underscored the role of BG in priming antibacterial transcriptional programs, which is a crucial axis for maintaining intestinal homeostasis. Locus-specific examination of chromatin states further validated BG-induced epigenetic modifications in the upstream regions of selected target genes, including Gbp5, Gbp2, S100a8, and Nos2 (Fig. S7F). Collectively, our integrative reanalysis demonstrates that BG reshapes the epigenomic architecture at regulatory elements, thereby orchestrating immune gene expression programs directly relevant to IBD pathophysiology and mucosal immunity (Lines 201-211).

      Reviewer 1 (Recommendations for the authors):

      (1) It’s better to include a schematic summarizing the proposed mechanism for reader clarity.

      We appreciate your comment and have prepared a graphical abstract, shown in Author response image 1.

      Author response image 1.

      (2) Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target or side effects of β-glucan-induced trained immunity. As trained immunity is known to augment inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation-prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan-induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We have discussed this potential caveat in the discussion (Lines 299-302).

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1<sup>+</sup> macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for your positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate your valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. Previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice, primarily in the context of inflammatory genes. We acknowledge that these aspects are not directly addressed in the current manuscript, which aimed to build on the foundation of β-glucan-induced trained immunity established by many groups, including ours, and to assess its potential as a therapeutic approach in the colitis setting.

      That being said, we fully agree with your suggestion to analyze the epigenetic profile of key pathways. As in our response to the similar question raised by Reviewer 1, we reanalyzed the relevant public datasets and summarized the findings in Supplementary Figure S7. The ATAC-seq analysis provides an epigenetic basis for the enhanced inflammatory and antibacterial capacity of monocytes and indicates that this capacity is seeded at the level of the HSC compartment.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation. Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). These results indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1<sup>+</sup> macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      We thank the reviewer for this important point. We acknowledge that direct in vivo tracking of the adoptively transferred monocytes to confirm their homing to the colon and differentiation into specific macrophage subsets would strengthen the mechanistic link. However, due to technical limitations in reliably tracing the fate of transferred cells in our experimental setting, we were unable to provide this direct evidence. Instead, we present a strong correlative and functional evidence chain that supports the proposed model:

      (a) Following BG pretreatment, we observed a significant decrease in circulating Ly6Chi monocytes specifically at the peak of colitis (day 7, Fig. 5D), concurrent with a marked increase in monocytes/macrophages within the colonic lamina propria (Fig. 2D). This inverse relationship strongly suggests enhanced recruitment of monocytes from the blood into the inflamed colon upon BG training.

      (b) Using CX3CR1-GFP reporter mice, we found that BG pretreatment led to an increased proportion of colonic myeloid cells in an intermediate state (P5: Ly6C<sup>+</sup>MHCII<sup>+</sup>CX3CR1<sup>+</sup>, Fig. 5F). This population represents monocytes actively undergoing differentiation into intestinal macrophages, supporting the idea that BG accelerates the monocyte-to-macrophage transition in situ.

      (c) Our scRNA-seq analysis independently revealed an expansion of monocyte-derived macrophage clusters (e.g., Macro1, Macro2) in BG-treated mice, which express canonical tissue macrophage markers (including Cx3cr1) and genes associated with tissue repair (e.g., Vegfa, Fig. 4A, 5H, 5I).

      These data collectively indicate that BG-trained monocytes exhibit enhanced capacity for colonic recruitment and preferential differentiation toward reparative macrophage subsets, which aligns with the protective phenotype observed after adoptive transfer. We have explicitly noted the absence of direct fate-mapping data as a limitation in the revised Discussion and agree that future studies employing advanced tracing techniques would be valuable to definitively establish this cellular trajectory. (Line 378-380)

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3.), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived. Together, these results suggest at least that β-glucan pretreatment alters monocyte capacity, which directly contributes to the enhanced alleviation of colitis that we observed. However, because we failed to find a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined cell subclusters, we were not able to obtain direct causal evidence by specifically depleting the cells of a given subcluster. Nevertheless, the results of the monocyte adoptive transfer experiment in Ccr2 KO mice strongly suggest that the presence of monocytes is crucial for this protective effect. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are mainly based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence (Lines 400-404).

      (5) While Rag1<sup>-/-</sup> mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, in our current manuscript examining the BG-trained immunity effect, the contribution of ILCs was not evaluated. Because adoptive transfer of trained monocytes into CCR2 KO mice could recapitulate the colitis alleviation phenotype, we think that the β-glucan-enhanced protection depends, at least in part, on trained monocytes. We nevertheless acknowledge that we cannot rule out a possible role of ILCs in this process, and we discuss this limitation in the Discussion of the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded. (Line 322-326).

      Reviewer 2 (Recommendations for the authors):

      (1) The authors do not provide direct mechanistic evidence of TI (e.g., epigenetic and metabolic reprogramming). The absence of such data weakens the mechanistic strength of the TI claim. The authors should soften the terminology to "BG-induced myeloid reprogramming suggestive of trained immunity", and acknowledge and discuss this limitation.

      We appreciate your comment highlighting the lack of direct epigenetic and metabolic assessment in our current study. Previous work from our group (S.-C. Cheng) and others has extensively documented the epigenetic and metabolic profiles of monocytes from β-glucan–trained mice, focusing primarily on inflammatory-related genes. Based on this established foundation, our current manuscript focuses on exploring the translational potential of BG-induced trained immunity.

      That said, as mentioned in our response to the identified weakness, we performed reanalysis from the public epigenetic datasets with a focus on pathways related to reparative and antibacterial functions and integrated this part in the revised manuscript (Fig S7, Lines 201-211).

      (2) CX3CR1<sup>+</sup> macrophages' role is not functionally validated. The data relies solely on scRNA-seq and cluster annotations, which are insufficient to confirm functional roles in vivo. Depletion or in vitro studies would provide stronger causal evidence. The authors should acknowledge this limitation in the Discussion.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived. Together, these results suggest at least that β-glucan pretreatment alters monocyte capacity, which directly contributes to the enhanced alleviation of colitis that we observed. However, because we failed to find a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined cell subclusters, we were not able to obtain direct causal evidence. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are mainly based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence (Lines 395-404).

      (3) Rag1<sup>-/-</sup> mice retain innate lymphoid cells (ILCs), particularly ILC3, which are mucosal and produce IL-22, contributing to tissue repair (PMID: 21502992; PMID: 32187516). The potential for BG to activate ILCs remains unexplored in this study. This limits the interpretation of whether the observed protection arises from monocyte/macrophage reprogramming or is partially mediated by residual ILC activity. The authors should explicitly acknowledge this limitation and discuss the possible contribution of ILCs to the observed phenotype.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, in our current manuscript examining the BG-trained immunity effect, the contribution of ILCs was not evaluated. Because adoptive transfer of trained monocytes into CCR2 KO mice could recapitulate the colitis alleviation phenotype, we think that the β-glucan-enhanced protection depends, at least in part, on trained monocytes. We nevertheless acknowledge that we cannot rule out a possible role of ILCs in this process, and we discuss this limitation in the Discussion of the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded. (Line 322-327).

      (4) Figure 1-It would help to clarify whether a BG-only control group (without DSS) was included in the design. This would be critical to determine if BG alone alters the colon. If omitted, the authors should clearly state this and consider adding such a group in future experiments. This would help define the baseline effects of BG and support the claim that its benefits are dependent on TI (upon second challenge - DSS).

      We appreciate this valuable suggestion. While we did not perform qPCR to assess mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a dedicated BG-only control group at baseline, before DSS treatment (Colitis_d0). These data indicate that BG preconditioning alone does not alter the baseline expression of colon mucosal repair genes.

      (5) Figure 3 - It would strengthen the conclusions to include a vehicle-treated PBS BMT donor control group, or to state its absence. It is unclear whether the protective effect observed in recipients of BG-treated BM is due to trained immunity or to non-specific effects of transplantation, irradiation, or batch variation.

      We fully agree that it is critical to include a vehicle-treated PBS BMT control to rule out non-specific effects of transplantation, irradiation, or batch variation. We did in fact include a blank PBS transfer control every time mice received irradiation, to confirm that the irradiation had successfully ablated the recipients' bone marrow. Mice that receive PBS only die after 8 days, whereas only mice that receive bone marrow from either the PBS-control or the BG-treated group survive. We also performed flow cytometry to confirm successful bone marrow transplantation (Fig. S5C). We have added a description of the vehicle-treated BMT control to the Materials and Methods section for clarification (Lines 456-466).

      (6) No gene expression or phenotypic data is provided for monocytes/macrophages in BMT recipients; therefore, it cannot be confidently stated that these cells were reprogrammed. Expression/phenotypic data should be added or discussed.

      We thank the reviewer for raising this important point. We acknowledge that a detailed transcriptomic or phenotypic analysis of donor-derived tissue-resident myeloid cells in the BMT recipients would provide the most direct evidence for their reprogrammed state.

      While our BMT study focused primarily on assessing the transferability of the protective phenotype via endpoint disease parameters and circulating immune cell composition, we present a coherent and compelling line of evidence supporting the conclusion that BG's training effect is maintained within the hematopoietic system of recipients and mediated by reprogrammed myeloid cells:

      (a) A key finding is the significant increase in the proportion of donor-derived Ly6Chi monocytes in the peripheral blood of recipients receiving BG-trained bone marrow (Fig. 3J). This is not a bystander effect but direct evidence that BG-induced training of donor hematopoietic stem/progenitor cells instructs a biased differentiation program towards a specific effector precursor population within the new host, demonstrating the functional persistence of the trained state post-transplantation.

      (b) The core of reprogramming in trained immunity lies in persistent epigenetic and functional changes. Our new analysis of public datasets (Fig. S7) confirms that BG directly reshapes the chromatin accessibility landscape in hematopoietic stem cells (HSCs), particularly at loci regulating immune and antibacterial responses. This provides the fundamental mechanism explaining how the trained phenotype is both long-lasting and transplantable: the reprogramming occurs at the progenitor level.

      (c) The most causally compelling data in our study comes from the independent adoptive transfer experiment, where transfer of purified BG-trained monocytes alone was sufficient to ameliorate colitis in recipient mice (Fig. 3K, L). This definitively proves that the trained monocytes themselves carry the protective functional program. It strongly suggests that these reprogrammed monocytes/macrophages are the likely effectors mediating protection in the BMT model.

      (d) Our interpretation aligns with well-established paradigms in the field. Precedent studies confirm that the BG-trained phenotype (e.g., enhanced cytokine potential) can be transferred via BMT or monocyte adoption. For instance, Haacke et al. (PMID: 40020679) demonstrated that splenic monocytes from BG-trained donors, when transferred into arthritic recipient mice, led to elevated inflammatory cytokine (e.g., Tnf, Il6) expression in recipient joints, directly proving the maintained functional reprogramming of trained cells in a heterologous host environment. This provides a strong precedent supporting the functional activity of transferred trained cells in our model.

      (7) The study is consistent with emerging evidence that distinct TI programs may exist depending on the stimulus and context, including immunoregulatory and tissue-reparative responses (PMID: 35133977; PMID: 31732931; PMID: 32716363; PMID: 30555483). The authors should integrate this perspective into the Discussion to acknowledge that their findings may represent one example of such context-dependent, potentially reparative TI programs. This would place the study within the growing literature describing functional heterogeneity in innate immune training.

      We appreciate this suggestion and have incorporated it into the discussion. In the revised manuscript, we discussed how our findings of BG-induced protective myeloid reprogramming align with the concept of tissue-reparative or immunoregulatory TI, which is distinct from the pro-inflammatory TI phenotypes described in other contexts. By highlighting the functional heterogeneity of innate immune training, we position our work as an example of a stimulus-specific, reparative TI program. (Lines 356-379)

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1<sup>-/-</sup> mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2<sup>-/-</sup> mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      Thank you for the positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we did in fact use raw count data as input for DESeq2. FPKM values were used only for visualization purposes, such as in heatmaps and clustering analyses. We have corrected this description in the revised manuscript to accurately reflect our analysis workflow (Lines 488-499).

      Reviewer 3 (Recommendations for the authors):

      (1) Current best practices recommend working with raw count data when using DESeq2 to ensure statistically robust differential expression analysis between samples. However, for visualization and clustering, like heatmaps, FPKMs can be used. Could the authors explain why they have used FPKM for differential gene expression analysis?

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we did in fact use raw count data as input for DESeq2. FPKM values were used only for visualization purposes, such as in heatmaps and clustering analyses. We have corrected this description in the revised manuscript to accurately reflect our analysis workflow (Lines 488-499).
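      To illustrate the distinction drawn above, the following is a minimal sketch rather than the authors' pipeline. It uses a toy count matrix (all numbers are hypothetical) to contrast FPKM, which scales each sample by its own total, with median-of-ratios size factors, the style of between-sample scaling that DESeq2 estimates from raw counts. The function names `fpkm` and `size_factors` are illustrative and are not part of any library.

```python
from math import exp, log
from statistics import median


def fpkm(counts, lengths_bp):
    """FPKM for one sample: count * 1e9 / (gene length in bp * total counts)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix[g][s] is gene g, sample s."""
    n_samples = len(count_matrix[0])
    # Keep only genes with non-zero counts in every sample so the log is defined.
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]
    return [
        exp(median(log(row[s]) - m for row, m in zip(usable, log_geo_means)))
        for s in range(n_samples)
    ]


# Hypothetical toy data: sample B is sequenced twice as deeply as sample A,
# and only gene 4 is truly induced.
counts = [[100, 200], [50, 100], [30, 60], [20, 1000]]
lengths = [1000, 1000, 1000, 1000]

sf = size_factors(counts)
fpkm_a = fpkm([row[0] for row in counts], lengths)
fpkm_b = fpkm([row[1] for row in counts], lengths)

# Gene 1 is unchanged once depth is accounted for, yet its FPKM drops in
# sample B because gene 4 inflates that sample's total.
gene1_scaled = [counts[0][s] / sf[s] for s in range(2)]
```

      In this toy case the size factors recover the twofold depth difference, so gene 1 looks unchanged after scaling, whereas its FPKM appears lower in sample B. This is one commonly cited reason for using FPKM only for visualization.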

      Minor Comment

      (1) Line 92: remove extra word "that".

      We remove the extra word “that” from Line 92 in the revised manuscript.

      (2) Line 201: please state here what "GBP" stands for, as it appears first.

      We define “GBP” as “Guanylate-Binding Protein” at its first appearance in Line 201. (Line 213)

      (3) Line 235: consider rewriting "we analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid"; added spacing for "day 7", "which", and "the".

      We revise the sentence in Line 235 to read: “We analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid…” to improve readability. (Lines 246-247)

      (4) Line 290: consider rewriting " as seen in conditions such as rheumatoid arthritis and ...".

      We revise Line 290 to: “as observed in conditions such as rheumatoid arthritis and…” for clarity. (Lines 301-302)

      (5) Line 375-376: please check sentence starting lower case "with minor modifications, by assessing ".

      We correct the sentence to start with a capital letter: “With minor modifications, by assessing…” (Lines 422-423)

      (6) Line 399: kindly consider adding "was" after "cDNA".

      We revise Line 399 to include “was” as suggested: “cDNA was synthesized…” (Line 446)

      (7) Line 346-347: consider adding "which" after "monocytes": "We transferred BG-preconditioned monocytes which significantly alleviated clinical symptoms".

      We revise Line 346-347 to include “which” as suggested for grammatical clarity. (Lines 385-386)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
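      For readers who want to see the shape of such a predicted curve, the sketch below evaluates the Marko-Siggia interpolation formula for a worm-like chain, F = (k_BT / L_p) [1 / (4 (1 - x/L)^2) - 1/4 + x/L], using the 50 nm persistence length cited above. The contour length of 1000 nm and the thermal energy of 4.11 pN·nm are assumptions chosen for illustration. They are not taken from the manuscript, and the function names are hypothetical.

```python
KT_PN_NM = 4.11  # assumed thermal energy near room temperature, in pN*nm


def wlc_force(extension_nm, contour_nm, persistence_nm=50.0):
    """Marko-Siggia interpolation: force (pN) at a given end-to-end extension."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    return (KT_PN_NM / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def extension_at_force(force_pn, contour_nm, persistence_nm=50.0):
    """Invert wlc_force by bisection; the force increases monotonically with extension."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-9)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical contour length, for illustration only.
CONTOUR_NM = 1000.0
stall_extension = extension_at_force(6.0, CONTOUR_NM)
stall_fraction = stall_extension / CONTOUR_NM
```

      Under these assumptions the force stays low over most of the extension and then rises steeply, so a force of about 6 pN is reached only when the DNA is stretched to roughly 94% of its contour length. Because the curve is so steep in this regime, a small error in the extension barely shifts where the motor stalls, which is consistent with the argument that the stall duration does not depend on the precise details of the curve.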

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We softened this term in our revision to “nearly parallel to the microtubule” (Line 464). In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We acknowledge that our treatment of kinesin-3 was confusing. In response, we deleted any reference to kinesin-3 catch-bond behavior in the Results section and restricted it to the Discussion, where it is presented as interpretation. On Line 635 in the Discussion, we softened the statement of catch-bond activity to “…all three dominant kinesin transport families display catch-bond like behavior at stall…”. We acknowledge that, classically, the catch/slip bond nomenclature refers to simple protein-protein interactions and is easier to interpret there. However, the term ‘catch-bond’ has been used in the literature for myosin, dynein and kinesin, and thus we feel that it is sufficiently established to use it here.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution, and we calculated a corrected kinesin-3 stall duration due to these undetected slips. This data and analysis are included as a new Supplementary Figure S8. In the main text on Lines 283-293 we included the following text:

      “It was notable that the kinesin-3 stall durations at high load are longer than the ramp durations at low load, because this indicates that the kinesin-3 off-rate slows with increasing load. However, because kinesin-3 had the most slip events at stall, we were concerned that there may be undetected slip events below the 60 nm threshold of detection that led to an overestimation of the kinesin-3 stall duration. To test this hypothesis, we plotted the distribution of kinesin-3 slip distances at stall, fit an exponential, and calculated the fraction of missed slip events (Fig. S8). From this analysis, we calculated a correction factor of 1.42 that brought the kinesin-3 stall duration down to 1.33 s. Notably, this stall duration value is still well above the kinesin-3 ramp duration value of 0.75 s in Fig. 3C and thus does not qualitatively change our conclusions.”
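
      The logic of this correction can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the analysis in Fig. S8: it assumes slip distances are exponentially distributed, that only slips longer than the 60 nm threshold terminate a measured stall, and that the correction factor is the inverse of the detected fraction. The function names and the back-calculated values (implied mean slip distance, implied uncorrected duration) are hypothetical and follow only from the two numbers quoted above (1.42 and 1.33 s).

```python
import math

def detected_fraction(threshold_nm, mean_slip_nm):
    """Fraction of exponentially distributed slips that exceed the threshold."""
    return math.exp(-threshold_nm / mean_slip_nm)

def correction_factor(threshold_nm, mean_slip_nm):
    """Factor by which missed slips inflate the apparent stall duration
    (assumption: apparent off-rate = detected fraction * true off-rate)."""
    return 1.0 / detected_fraction(threshold_nm, mean_slip_nm)

THRESHOLD_NM = 60.0        # detection threshold quoted in the text
REPORTED_FACTOR = 1.42     # correction factor quoted in the text
CORRECTED_STALL_S = 1.33   # corrected kinesin-3 stall duration quoted in the text

# Back-calculated for illustration only; these values are not reported by the authors.
implied_mean_slip_nm = THRESHOLD_NM / math.log(REPORTED_FACTOR)
implied_uncorrected_s = CORRECTED_STALL_S * REPORTED_FACTOR

print(f"implied mean slip distance: {implied_mean_slip_nm:.0f} nm")
print(f"implied uncorrected stall duration: {implied_uncorrected_s:.2f} s")
```

      Under these assumptions, a larger share of short slips (a smaller mean slip distance relative to the threshold) gives a larger correction factor, which is why the shape of the slip-distance distribution matters for the stall-duration estimate.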

      We thank the reviewer for this suggestion.

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point. In addition, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, compute a separate duration parameter (which is the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.
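
      The history-independence argument is the memoryless property of the exponential distribution, and it can be checked numerically. The following is a minimal sketch on synthetic data, assuming a single constant off-rate; the 2.0 s mean and the 1.5 s clock offset are arbitrary illustrative values, not measurements from this study.

```python
import random

def mean(values):
    return sum(values) / len(values)

def residual_durations(durations, start_s):
    """Time remaining after start_s for every event that lasted beyond start_s."""
    return [d - start_s for d in durations if d > start_s]

rng = random.Random(0)      # fixed seed so the sketch is reproducible
TRUE_MEAN_S = 2.0           # hypothetical mean attachment duration
CLOCK_OFFSET_S = 1.5        # arbitrary point at which the clock is restarted

# Synthetic attachment durations with a constant detachment rate.
durations = [rng.expovariate(1.0 / TRUE_MEAN_S) for _ in range(200_000)]

full_mean = mean(durations)
late_mean = mean(residual_durations(durations, CLOCK_OFFSET_S))

print(f"mean duration measured from t = 0: {full_mean:.2f} s")
print(f"mean remaining duration measured from t = {CLOCK_OFFSET_S} s: {late_mean:.2f} s")
```

      If detachment were history dependent (for example, a multi-step process with a non-exponential lifetime), the two means would differ, so the argument holds only to the extent that the constant-rate assumption does.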

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We revised this sentence to the following: “In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to continue generating force after a small rearward displacement, rather than fully detaching and ‘resetting’ to zero load.” (Line 339-342)

      It should be noted that, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. To address this point, we added in the Discussion on lines 654-656:

      “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      To address the question of neutravidin acting as a roadblock, we did the following. Because of the sequence of injections used to assemble the tensiometer in the flow cell, there are often some residual GFP-kinesin motors that aren’t attached to DNA and thus serve as internal controls for unloaded motility on the neutravidin-functionalized Mt. We quantified the run durations of these free kinesin-GFP and found that their run duration was 0.92 s (95% CI: 0.79 to 1.04 by MEMLET). This is slightly lower but not statistically different from the 1.04 s [0.78, 1.31] on control microtubules in Fig 2A. This result is included in Figure S6 in the revised manuscript.

      We don’t have a precise estimate for the amount of neutravidin on the microtubules. Based on Fig. 3C of Korten and Diez (Korten and Diez, 2008), the reduction in the unloaded run duration that we see corresponds to a ~2% biotinylation ratio. We polymerize Mt with 10% biotinylated tubulin and add 8 nM neutravidin to the flow cell, so in principle the microtubules could be 10% biotin-streptavidin coated. However, there are a number of uncertainties that push this estimate lower – a) the precise degree of biotinylation, b) whether the %biotinylated tubulin in polymerized microtubules is lower than the mixing ratio due to unequal incorporation, and c) what fraction of the biotinylated tubulin is occupied by the neutravidin when using this neutravidin flow-in method. Thus, our best estimate is ~2% biotin-streptavidin functionalization.

      The ramp durations in Fig. 3 provide another argument that biotinylated microtubules are not affecting the motors. Compared to unloaded durations for each motor, the kinesin-1 ramps were longer, the kinesin-2 ramps were the same, and the kinesin-3 ramps were shorter duration. That argues against any systematic effect of biotinylation on motor run durations, with the caveat that family-dependent differences could in principle be masking an effect. The fact that ramp durations aren’t systematically longer or shorter than the unloaded run durations also argues that the stalls we see, which are at the expected extension length of the dsDNA, are not caused by neutravidin roadblocks.

      The final point the reviewer brings up is whether neutravidin may be contributing to the rescues from slip events that we observe. This is difficult to fully rule out. However, because the unloaded run durations aren’t significantly altered by the biotin-streptavidin on the microtubules, we don’t expect the rescue events following a slip to be significantly affected. In principle, we could systematically increase and decrease the biotinylation and see whether the slip rescues change, but we haven’t done this.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history-independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although non-parametric methods such as K-M make no assumptions for the distribution, they require the user to know exactly where the start time is.
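
      To illustrate why a constant-rate model tolerates censoring, the sketch below uses the standard maximum-likelihood estimator for right-censored exponential data (total observed time divided by the number of observed detachments). This is a deliberately simplified stand-in and not the Markov-Bayesian model described in the Supplementary section; the 1.0 s true mean and 0.8 s censoring time are hypothetical values chosen for the demonstration.

```python
import random

def censored_exponential_mean(observed_times, detached_flags):
    """MLE of the mean duration for exponential data with right censoring:
    total time at risk divided by the number of observed detachments."""
    n_detachments = sum(detached_flags)
    if n_detachments == 0:
        raise ValueError("at least one uncensored event is required")
    return sum(observed_times) / n_detachments

rng = random.Random(1)   # fixed seed so the sketch is reproducible
TRUE_MEAN_S = 1.0        # hypothetical ramp duration parameter
CENSOR_S = 0.8           # hypothetical time at which the ramp ends in a stall

observed_times, detached_flags = [], []
for _ in range(100_000):
    t = rng.expovariate(1.0 / TRUE_MEAN_S)
    if t < CENSOR_S:
        observed_times.append(t)          # detachment seen during the ramp
        detached_flags.append(True)
    else:
        observed_times.append(CENSOR_S)   # ramp cut short by reaching stall
        detached_flags.append(False)

naive_mean = sum(observed_times) / len(observed_times)
estimate = censored_exponential_mean(observed_times, detached_flags)

print(f"naive mean of observed times: {naive_mean:.2f} s")
print(f"censoring-aware estimate:     {estimate:.2f} s")
```

      Left censoring is handled by the same memoryless assumption: if the detachment rate is constant, missing the first part of a ramp does not change the distribution of the time remaining.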

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we make two points. The first point is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). The second point is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed with the single-exponential fits very well (Table S2). Specifically, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections to the kinesin-3 unloaded run durations due to finite microtubule lengths. To address this point in the revision, we added the following note in Table S2: “* Because the Markov-Bayesian model, which is unaffected by left and right censoring of data, gave the same unloaded run durations for kinesin-3 as the MEMLET fit, we did not correct the kinesin-3 unloaded run durations for any right censoring due to finite microtubule lengths.” We also added the following point in the legend of Fig. S1: “A fraction of kinesin-3 unloaded run durations were limited by the length of the microtubules, but fitting to a model that took into account missed events gave a similar mean duration as an exponential fit, and so no correction was made (Table S2).”

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6 kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. We clarified this in the revised Figure S6 legend. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      We addressed this point in lines 200-212 of the revised manuscript:

      “We carried out two additional control experiments. First, to confirm that the neutravidin used to link the DNA to the microtubule wasn’t affecting kinesin motility, we analyzed the run durations of kinesin-1 motors on neutravidin-coated microtubules and found no change compared to unlabeled microtubules (Fig. S6). Second, we measured the run duration of kinesin-1 linked to a DNA tether that was not bound to the microtubule and thus was being transported (Fig. S6). The kinesin-DNA run duration was 1.40 s, longer than the 1.04 s of motors alone (Fig. 2A). We interpret this longer duration to reflect the slower diffusion constant of the dsDNA relative to the motor alone, which enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event.”

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We changed this text (Lines 265-267) to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Essentially, yes, it is overdefined, but that is by design and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
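
      For readers unfamiliar with exponentially load-dependent rate constants, the general form can be sketched as follows. This is a generic Bell-type expression, k(F) = k0·exp(F·δ/kT), and is offered as an assumption about the functional form rather than the exact parameterization used in Fig. 5; the unloaded rate and distance parameters below are hypothetical, and kT is taken as approximately 4.11 pN·nm near room temperature.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm

def load_dependent_rate(k0_per_s, force_pn, delta_nm):
    """Bell-type rate: the unloaded rate k0 scaled exponentially by load.
    A positive delta_nm speeds the transition under load;
    a negative delta_nm slows it."""
    return k0_per_s * math.exp(force_pn * delta_nm / KT_PN_NM)

# Hypothetical parameters, chosen only to show the two possible trends.
K0_PER_S = 1.0
for force_pn in (0.0, 2.0, 4.0, 6.0):
    accelerated = load_dependent_rate(K0_PER_S, force_pn, delta_nm=0.5)
    slowed = load_dependent_rate(K0_PER_S, force_pn, delta_nm=-0.5)
    print(f"F = {force_pn:.0f} pN: delta > 0 gives {accelerated:.2f}/s, "
          f"delta < 0 gives {slowed:.2f}/s")
```

      With two such load-dependent transitions in series, different pairs of distance parameters can produce similar net durations, which is one way to see why the reviewer describes the model as over-defined.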

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the Discussion of the revision, we added text to note that this behavior is indicative of an ideal bond (not a catch-bond) on Lines 480-483: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics and instead characteristic of an ideal-bond.” We also added a sentence in the Introduction highlighting this work, Lines 84-87: “Fourth, when kinesin-1 was connected to a bead through a micron-long segment of DNA and hydrodynamic forces were imposed on the bead, motor interaction times were insensitive to hindering loads up to 3 pN, indicative of an ideal-bond.”

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. We added the following paragraph in Lines 101-111 in the Geometry Consideration section of the Supplementary Methods.

      “Another consideration when comparing the DNA tensiometer to optical trap measurements is the relative stiffness of the trap and dsDNA. Optical trap stiffnesses are generally in the range of 0.05 pN/nm [12,13]. To calculate the predicted stiffness of the dsDNA spring, we computed the slope of the theoretical force-extension curve in Fig. 1B. The stiffness is highly nonlinear and is <0.001 pN/nm below 650 nm extension. At the predicted stall force of 6 pN (960 nm extension), the dsDNA stiffness is ~0.2 pN/nm, which is stiffer than most optical traps, but it is similar to the estimated 0.3 pN/nm stiffness of kinesin motors themselves [12,13]. An 8 nm step at this stiffness leads to a 1.6 pN jump in force, so it is reasonable to expect that motors are dynamically stepping at stall. Therefore, there is no reason to expect that stiffness differences between optical traps and the dsDNA spring are affecting the motor detachment kinetics.”
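
      The stiffness values quoted above can be approximately reproduced from a worm-like-chain force-extension curve. The sketch below uses the Marko-Siggia interpolation formula and its analytical derivative. The contour length (1020 nm) and persistence length (50 nm) are assumptions chosen for illustration because they are not stated in this response, and the authors' curve in Fig. 1B may use a different worm-like-chain formulation.

```python
KT_PN_NM = 4.11         # thermal energy near room temperature, in pN*nm
PERSISTENCE_NM = 50.0   # assumed dsDNA persistence length (not from this response)
CONTOUR_NM = 1020.0     # assumed contour length (not from this response)

def wlc_force(extension_nm):
    """Marko-Siggia interpolation for worm-like-chain force, in pN."""
    rel = extension_nm / CONTOUR_NM
    return (KT_PN_NM / PERSISTENCE_NM) * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)

def wlc_stiffness(extension_nm):
    """Slope dF/dx of the interpolation formula, in pN/nm."""
    rel = extension_nm / CONTOUR_NM
    prefactor = KT_PN_NM / (PERSISTENCE_NM * CONTOUR_NM)
    return prefactor * (0.5 / (1.0 - rel) ** 3 + 1.0)

STEP_NM = 8.0  # kinesin step size quoted in the text
for extension_nm in (650.0, 960.0):
    stiffness = wlc_stiffness(extension_nm)
    print(f"x = {extension_nm:.0f} nm: F = {wlc_force(extension_nm):.2f} pN, "
          f"k = {stiffness:.4f} pN/nm, "
          f"force jump per step = {STEP_NM * stiffness:.2f} pN")
```

      The steep rise in stiffness near full extension is what makes the tether behave like a soft leash at short extensions and a stiff spring at stall.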

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. In response to the point from Reviewer #3, we added the following sentence on Lines 654-656: “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (e.g. ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      We agree that at first glance those jumps are puzzling. To investigate this question the first thing we did was to go back to our tensiometer dataset and look systematically at jumps for all three motors. We found roughly 4-6 large jumps like these for all three motors (kinesin-1: 250 +/- 99 nm (mean +/- SD; N=5); kinesin-2: 249 +/- 165 nm (N=6); kinesin-3: 490 +/- 231 nm (N=4)). Thus, although the apparent jumps may be more pronounced due to the specific rebinding kinetics of kinesin-2, this behavior is not unique to this motor. (Note that the motor binding position distribution in Fig. S2 is taken from initial binding positions that follow a clear period of detachment; thus, not all jumps are captured there.)

      Our interpretation is that these apparent jumps are simply a reflection of the long length and high compliance of the dsDNA tether. For instance, below 650 nm extension the stiffness is k < 0.001 pN/nm (see Reviewer #3, point #1 above). Thus, we expect large fluctuations of the tethered motor when not bound to the microtubule. One reason that these events look like ‘jumps’ is that the sub-ms fluctuations during detached periods are not captured by the ~25 fps movies (40 ms frame acquisition time). Instead, the fitted Qdot position represents the average position during the acquisition window. In fact, due to these rapid fluctuations (and the limited depth of the TIRF illumination field), the position often can’t be determined during these periods of fluctuation (e.g. see gaps at ~2.5 s, 11 s and 24 s in Fig. 1F).

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.
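
      The argument that a single exponential cannot describe a multi-component recovery-time distribution can be illustrated with synthetic data. In the sketch below, the three weights and time constants are hypothetical and are not the fitted values from Fig. 4D-F; the comparison simply evaluates the log-likelihood of the best single-exponential fit against that of the three-component mixture that generated the data.

```python
import math
import random

def single_exponential_loglik(times):
    """Log-likelihood at the single-exponential MLE (tau = sample mean)."""
    tau = sum(times) / len(times)
    return sum(-math.log(tau) - t / tau for t in times)

def mixture_loglik(times, weights, taus):
    """Log-likelihood of a weighted sum of exponential densities."""
    total = 0.0
    for t in times:
        density = sum(w / tau * math.exp(-t / tau) for w, tau in zip(weights, taus))
        total += math.log(density)
    return total

rng = random.Random(2)      # fixed seed so the sketch is reproducible
WEIGHTS = (0.5, 0.3, 0.2)   # hypothetical component weights
TAUS_S = (0.05, 0.5, 5.0)   # hypothetical time constants, in seconds

# Draw each recovery time from one of the three components.
times = []
for _ in range(5000):
    tau = rng.choices(TAUS_S, weights=WEIGHTS)[0]
    times.append(rng.expovariate(1.0 / tau))

single_ll = single_exponential_loglik(times)
mixture_ll = mixture_loglik(times, WEIGHTS, TAUS_S)
print(f"single-exponential log-likelihood: {single_ll:.0f}")
print(f"three-component log-likelihood:    {mixture_ll:.0f}")
```

      A formal model comparison would also penalize the extra parameters of the mixture, but when the time constants are well separated the gap in log-likelihood is typically far larger than any such penalty.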

      Recommendations for the authors: 

      Reviewing Editor Comments:

      The reviewers are in agreement with the motivation and approach of this study. The use of DNA tethers is an important advance in tethering motor proteins to gain insight into how motors respond to load. However, all 3 reviewers express reservations on how well the results support the claims. In particular, the use of the term catch bond was problematic, with Reviewer #2 suggesting some alternative nomenclature. Reviewer #1 expressed concern with experimental evidence for the predicted force-extension curve shown in Figure 1. I agree with the reviewers that additional experimental evidence would be required to conclude the catch-bond detachment kinetics of kinesin.

      Reviewer #2 (Recommendations for the authors):

      (1) By eye, the run lengths, e.g., of kin-1 look very long in Figure S1 ... certainly above the expected 1 µm. Please check and comment.

      We agree that the long runs do stick out by eye in this figure. To address this point, we analyzed the run lengths and run times from the kymograph shown in Fig. S1. Fitting the run duration distribution gave t = 1.31 s with a 95% CI of 0.96 to 1.67 s. This is slightly longer than the 1.04 s duration in Fig. 2A, but the 95% CI includes this population mean, and so the S1 data are not statistically significantly different. The run time distribution from the S1 kymograph is given in Author response image 1.

      Author response image 1.

      (2) The upper right kymograph in Figure 4A does not show a motor return to the baseline. Also, the scale bars, etc., are unreadable. Please modify.

      Our purpose for showing the kymographs in Fig. 4A was to show the specific features of slips and fast and slow reattachment. Because we enlarged the kymographs to show those specific features, we could not show the entire return to baseline. As suggested, we enlarged the scale bars and the labels on the kymographs to make them readable.

      Reviewer #3 (Recommendations for the authors):

      (1) The frequent references to 95% confidence intervals disrupt the flow of the text. Perhaps the confidence intervals could be listed in a table rather than in the body of the text.

      We deleted those from the text; they are shown in Fig. 2D and listed in Table S2.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Korten, T., and S. Diez. 2008. Setting up roadblocks for kinesin-1: mechanism for the selective speed control of cargo-carrying microtubules. Lab Chip. 8:1441-1447.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schäffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schäffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17: e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This important study functionally profiled ligands targeting the LXR nuclear receptors using biochemical assays in order to classify ligands according to pharmacological functions. Overall, the evidence is solid, but nuances in the reconstituted biochemical assays and cellular studies and terminology of ligand pharmacology limit the potential impact of the study. This work will be of interest to scientists interested in nuclear receptor pharmacology.

      Strengths:

      (1) The authors rigorously tested their ligand set in CRTs for several nuclear receptors that could display ligand-dependent cross-talk with LXR cellular signaling and found that all compounds display LXR selectivity when used at ~1 µM.

      (2) The authors tested the ligand set for selectivity against two LXR isoforms (alpha and beta). Most compounds were found to be LXRbeta-specific.

      The majority of ligands were found to be LXRβ-selective; however, examples of non-selective and LXRα-selective ligands were identified. It should be noted that this is a small compound set of literature ligands with reasonable structural diversity.

      (3) The authors performed extensive LXR CRTs, performed correlation analysis to cellular transcription and gene expression, and classification profiling using heatmap analysis, seeking to use relatively easy-to-collect biochemical assays with purified ligand-binding domain (LBD) protein to explain the complex activity of full-length LXR-mediated transcription.

      Weaknesses:

      (1) The descriptions of some observations lack detail, which limits understanding of some key concepts.

      Changes to the submitted manuscript hopefully add clarity. Several observations reinforce aspects of the literature and are a corollary of the observation that the majority of ligands with agonist activity more strongly stabilize/induce coactivator-bound complexes with LXRβ. This results in general LXRβ selectivity for agonists and also more variability in the response of LXRα to different ligand chemotypes. The most significant observations were for partial agonists that stabilize corepressor binding, in particular of the complex with LXRα.

      (2) The presence of endogenous NR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data.

      This is generally a confounding factor for ligands with apparent antagonist activity and is a source of ambiguity in designating inverse agonists across the nuclear receptor research field. Theoretically, this could also impact weak and partial agonists; however, this requires further study.

      (3) The normalization of biochemical assay data could confound the classification of graded activity ligands.

      Normalization to T0 (100%) and vehicle (0%) is applied to most data. It is not clear how this confounds data interpretation. T0 is a very reliable and reproducible agonist without significant bias towards LXR isoforms.
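
      The normalization described above amounts to a two-point linear rescaling, sketched here. This is an illustration only: the function name and the raw signal values are hypothetical, and the sketch assumes that the vehicle and reference-agonist means come from in-plate control wells.

```python
def normalize_response(raw, vehicle_mean, reference_mean):
    """Rescale a raw signal so vehicle maps to 0% and the reference agonist to 100%.

    Hypothetical helper; vehicle_mean and reference_mean are assumed to be
    the mean signals of in-plate control wells.
    """
    span = reference_mean - vehicle_mean
    if span == 0:
        raise ValueError("reference and vehicle signals are identical")
    return 100.0 * (raw - vehicle_mean) / span


# Made-up raw TR-FRET ratios for illustration.
vehicle, reference = 500.0, 1500.0
above_reference = normalize_response(2000.0, vehicle, reference)  # efficacy above the reference
below_vehicle = normalize_response(250.0, vehicle, reference)     # signal below vehicle
```

      Because the rescaling is linear, it preserves the rank order of ligands within a plate. Values above 100% or below 0% simply indicate a response beyond the reference agonist or below the vehicle.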

      (4) The presence of >1 coregulator peptide in the biplex (n=2 peptides) CRT (pCRT) format will bias the LBD conformation towards the peptide-bound form with the highest binding affinity, which will impact potency and interpretation of TR-FRET data.

      Multiplex assays must be optimized to balance binding affinity of the coregulator peptides (bear in mind these are somewhat-artificial small peptide constructs that are hoped to reflect binding of the much larger coregulator protein itself). Since the dominant theory of NR tissue-selectivity is based on the cellular availability (read concentration) of coregulators, this balance exists in a cellular context.

      (5) Correlation graphical plots lack sufficient statistical testing.

      Correlations are now supported by statistical data and we have added hierarchical clustering analysis.

      (6) Some of the proposed ligand pharmacology nomenclature is not clear and deviates from classifications used currently in the field (e.g., hard and soft antagonist; weak vs. partial agonist, definition of an inverse agonist that is not the opposite function to an agonist).

      Classifications used currently in the field vary from one NR to another and the use of partial and inverse agonist, in particular, is usually qualitative, unclear, and often misleading. We expand on these classifications with respect to our use of labels to classify pCRT response to LXR ligands. In agreement with the reviewer, we have replaced IA (inverse agonist) with RA (reverse agonist) as a label specifically associated with pCRT analysis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript by Laham and co-workers, the authors profiled structurally diverse LXR ligands via a coregulator TR-FRET (CRT) assay for their ability to recruit coactivators and kick off corepressors, while identifying coregulator preference and LXR isoform selectivity.

      The relative ligand potencies measured via CRT for the two LXR isoforms were correlated with ABCA1 induction or lipogenic activation of SRE, depending on cellular contexts (i.e., astrocytoma or hepatocarcinoma cells). While these correlations are interesting, there is some leeway to improve the quantitative presentation of these correlations. Finally, the CRT signatures were correlated with the structural stabilization of the LXR: coregulator complexes. In aggregate, this study curated a set of LXR ligands with disparate agonism signatures that may guide the design of future nonlipogenic LXR agonists with potential therapeutic applications for cardiovascular disease, Alzheimer's, and type 2 diabetes, without inducing mechanisms that promote fat/lipid production.

      Strengths:

      This study has many strengths, from curating an excellent LXR compound set to the thoughtful design of the CRT and cellular assays. The design of a multiplexed precision CRT (pCRT) assay that detects corepressor displacement as a function of ligand-induced coactivator recruitment is quite impressive, as it allows measurement of ligand potencies to displace corepressors in the presence of coactivators, which cannot be achieved in a regular CRT assay that looks at coactivator recruitment and corepressor dissociation in separate experiments.

      Weaknesses:

      I did not identify any major weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Page 2. "The endogenous ligands ... activate LXR via canonical or alternate mechanisms." What is an alternate mechanism?

      Small modifications to the Fig. 1 caption identify a mechanism alternative to the canonical mechanism: LXR transcriptional complexes are RXR heterodimers that can be activated by a canonical mechanism of coregulator recruitment or an alternative de-repression mechanism.

      (2) Page 5: "Notably, the 25 amino acid SRC-1 peptide is the only coactivator tested for LXR binding that has the fluorophore remote from the coactivator peptide." What does this mean, and could it influence the results?

      The sentence has been expanded to clarify the meaning. Notably, the 25 amino acid SRC-1 peptide is the only coactivator, amongst those tested for LXR binding, which has the fluorophore remote from the coactivator peptide: i.e., the only coactivator tested that uses a fluorophore-labeled anti-tag antibody to bind the tagged coactivator rather than a fluorophore-labeled coactivator. In methods based on fluorescent tags (CRT, TR-FRET, fluorescence polarization, etc.), a fluorophore that interacts directly with the receptor can generate a maximal signal that differs depending on this interaction: i.e., the identity of the coregulator used in CRT can influence the response. As seen in Figures 6 and S6, maximal response is dependent on ligand and coregulator.

      (3) Page 5: "The [CRT] assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." The dose-dependent activity in the CRT assays is more classically defined as a functional "potency", not "affinity".

      The text is changed to remove “measure of affinity”: The assay measures the ligand-dependent EC<sub>50</sub> for ligand-induced coactivator recruitment to LXR; the affinity of the ligand for the LXR:coregulator complex contributes to this potency.
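
      The distinction between potency and affinity can be made concrete with a dose-response model. The sketch below uses the four-parameter logistic (Hill) equation, a common choice for extracting EC50 from concentration-response data. It is an assumption that a model of this form applies here; the function name and all parameter values are illustrative.

```python
def hill_response(conc, bottom, top, ec50, hill=1.0):
    """Four-parameter logistic (Hill) model of a concentration-response curve.

    Assumed model, not taken from the manuscript. conc and ec50 must share
    units; bottom and top are the lower and upper response plateaus.
    """
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Illustrative parameters: 0-100% normalized response, EC50 of 0.1 µM.
half_max = hill_response(0.1, bottom=0.0, top=100.0, ec50=0.1)
near_top = hill_response(100.0, bottom=0.0, top=100.0, ec50=0.1)
```

      In this model EC50 is the concentration giving a half-maximal response, a functional readout of the whole assay system. It is distinct from an equilibrium dissociation constant, although ligand affinity contributes to it.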

      (4) Page 5: "Perhaps surprisingly, considering the description of multiple LXR ligands as partial agonists, most agonists studied gave maximal response at the same level as T0, behaving as full agonists." Can the authors speculate as to why partial agonist activity is not observed in their CRT assays when it has been observed in CRT assays for other nuclear receptors?

      This section has been reworded; please note the apparent partial agonist activity observed in CRT assays for multiple coactivators, as shown in Figures 6 and S6 (also see (2) above). Although many LXR ligands have been reported to display partial agonist activity, most agonists studied in this specific biotin-SRC-1 CRT assay gave maximal response at the same level as T0, behaving as full agonists.

      (5) Page 5: "Conformational cooperativity of LBD residues beyond these two amino acids leads to different conformations of Leu274 and Ala275 that generally favor ligand binding to LXRβ." Where are these residues located? Why are they important?

      We have simplified this paragraph, which introduces the interesting observations and interpretation of Ding et al., to illustrate potential contributions to isoform selectivity: The ligand binding pockets of the two LXR isoforms differ by only one amino acid located in helix-3 (H3: LXRα-Val263 and LXRβ-Ile277). Interestingly, correction of this difference by mutation of these residues to alanine (V263A and I277A) was observed to lower, but not to ablate, isoform selectivity in reporter assays.[108] Supported by modeling studies, this observation by Ding et al. led to the suggestion that conformational cooperativity of LBD residues beyond these two amino acids generally favors ligand binding to LXRβ. Therefore, most reported ligands, including those examined in the current work, are LXRβ-selective or non-selective.

      (6) Some correlation plots are described to show "poor" correlations without showing the underlying statistical fits. All correlation plots should show Pearson and Spearman correlation coefficients and p-values within the figures.

      This section of the manuscript has been completely reworked with full correlation analysis and statistics. There is no substantive change in data interpretation.
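
      For reference, the two coefficients requested by the reviewer can be computed as sketched below. This is a minimal standard-library sketch, not the authors' analysis pipeline: the function names are hypothetical, the pEC50 values are made up for illustration, and p-values are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up pEC50 values for five hypothetical ligands.
biochemical = [6.1, 6.8, 7.2, 7.9, 8.4]
cellular = [5.9, 6.5, 7.4, 7.6, 8.8]
r = pearson(biochemical, cellular)
rho = spearman(biochemical, cellular)
```

      Pearson's coefficient measures linear association, whereas Spearman's depends only on rank order. Reporting both, as the reviewer requested, helps to distinguish a monotonic but nonlinear relationship from a genuinely poor correlation.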

      (7) The normalization of TR-FRET data could introduce undesired bias when comparing activities. The methods section should provide more details about normalization of CRT data, including stating whether the control compounds' activity data were collected on the same CRT 384-well plate on the same day, or different plates, or different days, etc.

      This is now clarified in the SI Materials and Methods section. In-plate controls are always used.

      (8) The authors describe their pCRT assay as "multiplex", whereas "biplex" might be more accurate, as they only used two peptides.

      Biplex is commonly used referring to qPCR. Bio-Plex is a commercial version of an antibody assay. Duplex is obviously a term used in nucleic acid research. Therefore, multiplex is a simpler, more generic term that we feel is suitable and can be extended to add a third coregulator.

      (9) The pCRT assays use the same peptide concentrations (200 nM). However, the peptides will have different affinities for the LBD, which may bias ligand-dependent pCRT profiles. The peptide that binds with higher affinity in the absence of ligand will bias the LBD conformation and impact ligand affinity. Can the authors comment on any limitations of the pCRT approach vs. a normal CRT? Did the authors perform any optimization to see if increasing peptide concentrations (>200 nM) or having different concentrations (e.g., 400 nM SRC1 and 200 nM NCoR2) influences the pCRT data, extracted parameters, correlations, etc.?

      As we write in the Limitations section, our assays are focused on ligand-dependence, whereas other excellent studies focus more on coregulator-dependence. The length and affinity of peptide constructs vary and therefore it is important to “balance” corepressor and coactivator concentrations. The most important conclusions from our pCRT assays concern the ability of some ligands to stabilize corepressor binding in the monoplex CRT and the universal ability of coactivator complex stabilization to eject the corepressor in the multiplex assay. Furthermore, without measurements and correlations in “natural” cellular contexts, the CRT data obtained in cell-free conditions are somewhat artificial. We evaluated a range of peptide concentrations to assess signal-to-background and overall assay performance. Each new receptor added to the panel underwent rigorous optimization to establish robust and reliable assay conditions. This included identifying a suitable positive control for each receptor, determining the optimal coregulator selection and concentration, and refining other key parameters such as buffer composition and total well volume. The concentrations reported represent the optimized balance, producing a strong, reproducible signal without oversaturation or disproportionate contribution from any individual assay component.

      (10) Page 11. The authors introduce a few ligand classification terms that are not standard in the field and unclear: "soft" vs. "hard" antagonist, "weak" vs. "partial" agonist, and their definition of an inverse agonist that, in classical pharmacologic terms, should have an opposite (inverse) function to an agonist. Furthermore, the presence of endogenous LXR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data. See the following paper for an example of ligand-dependent classification and activation mechanisms when there are endogenous cellular ligands at play: https://elifesciences.org/articles/47172

      The paragraph discussing nomenclature went through many iterations of terminology and a further paragraph was removed that discussed problems with ligand classification in the broader field of NR pharmacology: this has now been added back. We apologise for not citing the excellent Strutzenberg et al. paper on RORɣ pharmacology, which is now included. In this paper, Griffin and co-workers also use terms that are not standard in the field, such as “silent agonist”, which covers, in part, ligands that we describe as “weak agonists”. A standard, definitive lexicon of terms across NRs is unfortunately problematic. We have added 2 paragraphs:

      The nomenclature for NR ligands often lacks precision and differs across NR classes. SERM (a subset of selective NR modulator) is used to describe varied families of ER ligands that show tissue-selective agonist and/or antagonist actions. Unfortunately, “partial agonist” is also widely used to describe SERMs, even though its use is usually pharmacologically incorrect and biased agonist may be a more accurate label.[124] The majority of reported ER ligands are SERMs, even some that cause ER degradation, because they are transcriptionally active. Consequently, the term “pure antagonist” (PA) has been used to differentiate transcriptionally null ligands[125]; although, pure antagonist/antiestrogen was originally introduced to describe antagonism of both AF1 and AF2 functions.[90]

      Elegant work by Griffin’s team on RAR-related orphan receptor C (RORɣ) is interesting, because it used a combination of HDX-MS and CRT and defined categories of RORɣ ligands.[126] In addition to full agonist, “silent agonist” was introduced to include endogenous and synthetic partial agonists; although, by definition, partial agonists should antagonize full agonists. On the antagonist side of the spectrum, “active antagonist” was used to describe ligands that reduce cellular activity to baseline; and “inverse agonist” for ligands that reduce cellular transcription below baseline and induce recruitment of corepressors. Curiously, inverse agonist has almost never been used to describe ER ligands and is used frequently for other NR ligands, mostly for ligands that reduce transcription below baseline, without any evidence for corepressor recruitment. GSK2033 and SR9238 show inverse agonist activity in cells (Figs 3, 5); however, neither is capable of recruiting SMRT2 or NCOR2 to LXR (Fig. 7).

      (11) Figure 9A and Figure S8. Could hierarchical clustering analysis be used to more rigorously compare the activities of the ligands?

      We have now added hierarchical clustering analysis (Figs 4 and S4). It should be noted that the value of such an analysis is much higher when the number of ligands is increased.

      (12) How does cellular potency correlate to pCRT vs. CRT potencies? Does pCRT better explain cellular potency?

      We have added this specific correlation (multiplex CRT vs. monoplex CRT).

      (13) The authors should provide an SI table of parameters (potency values) used for correlation and heatmap analyses.

      Tables have been added to SI accordingly.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has many strengths, but can still be improved by addressing the following critiques:

      (1) I am surprised the team did not find a ligand with a higher efficacy than T0. Please would you explain why T0 seems to have maxed out ligand efficacy for both LXRalpha and LXRbeta?

      Several ligands gave superior efficacy to T0 in cell-based reporter assays and in CRT assays shown in Figures 6 and S6: AZ876, BE1218, and MK9 gave maximal response higher than that of T0.

      (2) In the subsection, "Activity and isoform selectivity of LXR ligands", you mentioned that "The assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." This is incorrect. EC50 is a measure of ligand potency, not affinity.

      See Reviewer-1 (3)

      (3) In Figure 3 it is unclear what was used to normalize the antagonist responses in Panel F. Also, I recommend changing the y-axis of Panel F to -100 to 50 to get a better view of the response.

      This has been clarified: zero is vehicle control. Change to y-axis is made.

      (4) In Figure 4, the correlation R-squared values should be presented as a Table to have a better qualitative assessment of the correlations. It is challenging to judge which correlations are better by relying only on visual inspection. I also recommend moving the two panels from Figure S3 to Figure 4 as panels E and F.

      Extensive changes to Figure 4 have been made in response to this comment and that of Reviewer 1, who wanted these values in the figures: Reviewer-1 points (6) and (12).

      (5) In Figure 5, the fold changes in panels G, H, and I could better be presented as a bar graph. Also, the cytotoxicity of ligands needs to be assessed. For instance, in BE1218, there is a sharp decrease in fold change going from ~1 µM to ~10 µM. This will also confirm if the downward trends for SR9238 and GSK2033 are "real" and not as a result of cells dying off at higher ligand concentrations.

      Across our many studies on potent NR ligands, cell growth inhibition is observed at concentrations above 3 µM. This is true for ER ligands, such as tamoxifen, with explanations in the literature including membrane disruption and low-affinity cytoplasmic binding proteins. We include cell viability measurements in the Supplemental Information as a specific response to the reviewer’s query. There is no loss of cell viability in HepG2 cells.

      (6) Several ligands induce recruitment of coactivators but with minimal ability to displace corepressors. Physiologically, what would be the expected effect of these ligands on LXR activity?

      We have defined such ligands from pCRT analysis as weak agonists (WA); however, pCRT shows WA ligands induce corepressor loss in the presence of coactivator. Depending on coregulator balance and isoform expression and the importance of the derepression mechanism in a specific cell context, WA ligands might be expected to be differentiated from SA (strong agonist) ligands.

      (7) In the subsection, "synchronous coregulator recruitment by multiplex, precision CRT" you mentioned that "For LXRbeta, the correlation between SRC1 recruitment in monoplex and multiplexed CRT is good," but the data is not shown. I think it would be better to show this data for transparency.

      See query (4) and Reviewer-1. Done.

      (8) In Figure 9, Panel A, the heat map is quantitated as 0-150. Is this fold change? If so, add this label to the figure legend.

      It is Normalized Response as %, which is now added.

      (9) In Figure 9, Panel B, please explain why in all cases, CoA-bound LXR resides at a higher energy level than the CoR-bound, and the apo LXR is at a lower energy level than the CoA-bound protein. A coregulator-bound (holo) protein structure is generally a lower energy (more stable) structure than the unbound (apo) protein. The binding of a coregulator stabilizes the protein's conformation and shifts the equilibrium towards a more thermodynamically favorable state. Using the same argument, it does not make sense to me that the CoR-bound LXR is on the same energy level as the apo LXR.

      This schema reflects our observations in pCRT. No signal was observed for coactivator-bound (holo) protein in the absence of ligand, whereas a signal was observed for corepressor-bound (holo) protein in the absence of ligand. Therefore, the CoA-bound LXR is higher energy than apo-LXR (+ unbound CoA). Conversely, the signal for CoR-bound LXR can be reduced or increased by ligands, requiring the CoR-bound LXR to be of similar energy to apo-LXR (+ unbound CoR).

      (10) In the Figure 9b caption, "measured at 1uM" pertains to the concentration of ligand or coregulator? This is unclear. You should report the concentration of both ligand and coregulator.

      Clarified in caption.

      (11) In Figure S4, the signal for SR9238 shoots up to ~300 units for ligand concentrations >3 µM. Please explain what could have contributed to this anomalous activation and why this was moved to the Supplementary File and not shown in the main figure (Figure 5).

      The HepG2-SRE assay is a nano-luc reporter assay, unlike the CCF-ABCA1 assay, which is a firefly luciferase assay. There is substantial anecdotal evidence that furimazine/nano-luc is susceptible to stabilization enhancement. The RT-PCR data presented in Fig. 5 confirm that this is an artifact for some biphenyl sulfones.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete. It could be strengthened by the use of sensitive RNA in situ hybridization approaches.

      Thank you for your valuable assessment. RNA in situ hybridization evidence has been added to the revised manuscript (Figure 5A-D) to support that GSC tumors produce BMP ligands.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that block in differentiation in SGC is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells.

      (3) Appropriate use of quantification and statistics.

      We greatly appreciate your valuable comments.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc, or in a few germaria?

      This is a good question. Because the SGC phenotype depends on the presence of both germline tumor clones and out-of-niche wild-type germ cells, our quantification was restricted to germaria containing both. In 14-day-old fly ovaries, 70% of germaria (432/618) met this criterion (Line 103). Each of them contained an average of 1.5 SGCs (Figure 1K).

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      Our attempts to induce ovarian hs-FLP germline clones by heat-shocking adult flies were unsuccessful, with very few clones being observed. Therefore, we shifted our approach to an earlier developmental stage. Successful induction was achieved by subjecting late-L3/early-pupal animals to a twice-daily heat shock at 37°C for 6 consecutive days (2 hours per session with a 6-hour interval, see Lines 331-335) (Zhao et al., 2018).

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      These 20-25% of SGCs are bamP-GFP<sup>+</sup> dad-lacZ<sup>-</sup>, not bam<sup>+</sup> dad-lacZ<sup>+</sup> (see Figure 2C and 3D). They would be cystoblast-like cells that may have initiated a differentiation program toward forming germline cysts (see Lines 122-130). The 70-75% of SGCs that have low BMP signaling exhibit GSC-like properties, including: 1) dot-like spectrosomes; 2) dad-lacZ positivity; 3) absence of bamP-GFP expression. While additional markers would be beneficial, we think that this combination of properties is sufficient to classify these cells as GSC-like.

      (4) All experiments except Figure 1I (which shows a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      Yes, we initially identified the SGC phenotype through hs-FLP-mediated mosaic analysis of bam or bgcn mutants in ovaries. However, as noted in our response to Weakness (2), this approach was very labor-intensive. Therefore, we switched to using the more convenient nos>FLP system for subsequent experiments. In our observations, the two approaches did not differ in inducing the SGC phenotype.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at a young female (like 2-day-old). I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      These are very good questions. The SGC phenotype was consistent over the 14-day analysis period (Figure 1J) and was specifically dependent on the presence of germline tumor clones. In 14-day-old fly ovaries, these clones were both larger and more frequent than in younger flies. This age-dependent enhancement in clone size and frequency significantly improved our quantification efficiency (see Lines 101-112).

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      Thank you for this valuable comment. It was also difficult for us to distinguish 1 and 2 copies of GFP in the Drosophila ovary. In Figure 4A-F, to resolve this problem, we used a triple-color system, in which red germ cells (RFP<sup>+/+</sup> GFP<sup>-/-</sup>) are bam mutant, yellow germ cells (RFP<sup>+/-</sup> GFP<sup>+/-</sup>) are wild-type, and green germ cells (RFP<sup>-/-</sup> GFP<sup>+/+</sup>) are punt or med mutant. In Figure 4G-J, we quantified the SGC phenotype only in black germ cells (GFP<sup>-/-</sup>), which are wild-type (control) or mad mutant. In Figure 6, we quantified the SGC phenotype only in green germ cells (both GFP<sup>+/+</sup> and GFP<sup>+/-</sup>), all of which are wild-type.
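
      The triple-color scoring described in this response can be sketched as a small lookup. This is only an illustrative sketch, not the authors' analysis code: the function name `classify_germ_cell` and its boolean inputs are hypothetical. The point it illustrates is that each genotype is called from the presence or absence of RFP and GFP, so one and two copies of a marker never need to be told apart.

```python
def classify_germ_cell(has_rfp: bool, has_gfp: bool) -> str:
    """Map marker presence to genotype for the Figure 4A-F triple-color scheme.

    Hypothetical helper: inputs are presence/absence calls per cell,
    not fluorescence intensities.
    """
    if has_rfp and not has_gfp:
        return "bam mutant"          # red: RFP+/+ GFP-/-
    if has_rfp and has_gfp:
        return "wild-type"           # yellow: RFP+/- GFP+/-
    if has_gfp and not has_rfp:
        return "punt or med mutant"  # green: RFP-/- GFP+/+
    # No marker at all is not a class expected in this scheme.
    return "unclassified"


print(classify_germ_cell(True, True))
```

      Because the three classes differ in which markers are present, saturated GFP signal would not change the call in such a scheme.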

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      Thank you for this critical comment. The settings of immunofluorescent staining and confocal parameters in the original Figure 5A were the same as those in 5B. In our observation, the levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results from the revised Figure 5. Instead, the HCR-FISH data have been added (Figure 5A-D) to support that bam mutant germline tumors secrete BMP ligands.

      (8) In Figure 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      No. Given that bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in inducing the SGC phenotype (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would be redundant and would not alter the key conclusion of our study. Thank you for your understanding!

      Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      We apologize for any confusion. In our manuscript, the term "SGC" is reserved specifically for wild-type germ cells that maintain a GSC-like morphology outside the niche. bam or bgcn mutant germ cells are referred to as GSC-like tumor cells (Lines 89-90), not SGCs.

      (a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wild-type cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      The SGCs analyzed in Figure 2A-C are wild-type, GSC-like cells located outside the niche. They were generated using the same genetic strategy depicted in Figures 1C and 1E (with the schematic in Figure 1B). The complete genotypes for all experiments are available in Source data 1.

      (b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note, this is completely different from their earlier definition shown in line 68.

      The strategy to generate SGCs in Figure 4B-F (with the schematic in Figure 4A) is different from that in Figure 1C-F, H, and I (with the schematic in Figure 1B). In Figure 4B-F, we needed to distinguish punt<sup>-/-</sup> (or med<sup>-/-</sup>) from punt<sup>+/-</sup> (or med<sup>+/-</sup>) germ cells. As noted in our response to Reviewer #1’s Weakness (6), it was difficult for us to distinguish 1 and 2 copies of GFP in the Drosophila ovary. Therefore, we chose to use the triple-color system to distinguish these germ cells in Figure 4B-F (see genotypes in Source data 1).

      (c) Additionally, bam<sup>+/-</sup> GSCs (the first bar in Figure 4E) should appear GFP<sup>+</sup> and Red<sup>+</sup> (i.e., yellow). It would be helpful if the authors could indicate these bam<sup>+/-</sup> germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam<sup>+/-</sup> cells alone. Figure 4F has the same issue, and in this graph, the color does not match Figure 4A.

      The color-to-genotype relationships for the schematics in Figures 2A and 4E are provided in Figures 1B and 4A, respectively. Due to the high density of germ cells, it is impractical to label each genotype directly in the images. In contrast to Figure 4E, the colors in Figure 4F do not represent genotypes; instead, blue denotes the percentage of SGCs, and red denotes the percentage of germline cysts, as indicated below the bar chart.

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      Thank you for this constructive suggestion. These quantification data have been added to the revised Figure 1 (Figure 1J, K).

      (3) Without the information mentioned in point 2, it causes problems when reading through the section regarding [wild-type] SGCs induced by impairment of differentiation or dedifferentiation. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      Unlike wild-type cystocytes, which undergo incomplete cytokinesis and lack midbodies, those with USP8 loss undergo complete cell division, with the presence of midbodies (white arrow, Figure 1F’ from Mathieu et al., 2022) as a marker of the late cytokinesis stage (Mathieu et al., 2022).

      (a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      (b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      We appreciate your critical comment. bam mutant mosaic germaria indeed contained wild-type germline cysts, as evidenced by an SGC frequency of ~70%, rather than 100% (see Figures 2H, 4F, 4J, 6F, 6I, and Figure 6-figure supplement 3C). Since the SGC phenotype depends on the presence of bam or bgcn mutant germline tumors, we quantified it as “the percentage of SGCs relative to the total number of SGCs and germline cysts that are surrounded by germline tumors” (see Lines 103-108). Quantifying the SGC phenotype as "the percentage of germaria with SGCs" would be imprecise. This is because the presence and number of SGCs were variable among germaria with bam or bgcn mutant germline clones, and a small number of germaria entirely lacked these clones. The data of "SGCs per germarium with both germline clones and out-of-niche wild-type germ cells" have been added to the revised Figure 1 (Figure 1K).

      (c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      Such germaria could be found in Figure 2G, 3B, 3C, 6D, 6E, and 6H. The percentage of germline cysts can be calculated by “100% - SGC%”.
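
      The quantification used in these responses (SGCs as a percentage of all SGCs plus germline cysts surrounded by germline tumors, with the cyst percentage as the remainder) can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical and the counts in the usage line are made-up illustration values, not data from the study.

```python
def sgc_percentage(n_sgcs: int, n_cysts: int) -> float:
    """Percentage of SGCs among tumor-surrounded wild-type SGCs and germline cysts."""
    total = n_sgcs + n_cysts
    if total == 0:
        raise ValueError("no tumor-surrounded wild-type germ cells were counted")
    return 100.0 * n_sgcs / total


def cyst_percentage(n_sgcs: int, n_cysts: int) -> float:
    """Percentage of germline cysts, computed as 100% minus the SGC percentage."""
    return 100.0 - sgc_percentage(n_sgcs, n_cysts)


# Illustrative (hypothetical) counts: 70 SGCs and 30 germline cysts.
print(sgc_percentage(70, 30), cyst_percentage(70, 30))
```

      Counting at the level of tumor-surrounded germ cells, rather than per germarium, keeps germaria that lack tumor clones from diluting the measure.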

      (d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      As noted in our response to points a) and b) above, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for analyzing the phenotype. For this experiment, we examined >50 such germline cysts via confocal microscopy. As the analysis was performed on a defined cellular population, this sample size should be sufficient to support our conclusion.

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      We assayed for the presence or absence of midbodies specifically within the wild-type germline cysts surrounded by bam or bgcn mutant tumors, not within the tumors themselves (Lines 96-97). As detailed in Lines 90-100, the absence of midbodies was used as a key criterion to exclude the possibility of dedifferentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      Thank you for your constructive comment. RNA in situ hybridization data have been added to support that bam or bgcn mutant germline tumors secrete BMP ligands (Figure 5A-D).

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] SGCs and [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      Schematic cartoons in Figure 6A and 6B demonstrate that these analyses were performed in a dpp<sup>+/-</sup> background. Figure 6-figure supplement 1 indicates that dpp<sup>+/-</sup> or gbb<sup>+/-</sup> does not affect GSC maintenance, germ cell differentiation, or female fly fertility. Figure 6C is the control for 6D and 6E, and 6G is the control for 6H, with quantification in 6F and 6I. We used nos>FLP, not the heat shock method, to induce germline clones in these experiments (see genotypes in Source data 1).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      Thank you for teaching us! We have introduced these two papers in the revised manuscript (Lines 197-199).

      (8) Since fusome morphology is an important readout of SGCs vs differentiation, all the clonal analyses should have fusome staining.

      SGCs are readily distinguishable from multicellular germline cysts based on morphology. In some clonal-analysis experiments, fusome staining was not feasible due to technical limitations such as channel saturation or antibody incompatibility. Thank you for your understanding!

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      The figure panels were arranged to optimize space while ensuring that related panels are grouped in close proximity for logical comparison. We would be happy to consider any specific suggestions for an alternative layout that could improve clarity.

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript-ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

      The detailed quantification information is labeled directly in figures or described in figure legends, and all raw quantification data are provided in Source data 2.

      Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      We greatly appreciate your valuable comments.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions leads to arrest of germline cell differentiation in a status with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here. Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche." Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be solved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones with a few tumor cells can suppress the differentiation of neighboring WT GSCs with only a few tumor cells present. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster or probably even a single tumor cell generated at an early stage might prevent the differentiation of their neighboring germ cells.

      Thank you for your critical comment. The revised manuscript now includes a time-course analysis of the SGC phenotype (Figure 1J). Our data in Figure 6 demonstrate that BMP ligands from germline tumors are required to inhibit SGC differentiation. Furthermore, we have incorporated into the manuscript the possibility that disruption of the differentiation niche may also contribute to the SGC phenotype (Lines 197-199).

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution. In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ) would be essential.

      We greatly appreciate your critical comment. In our data, the expression levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results in the revised Figure 5. RNA in situ hybridization data have been added to visualize the expression of BMP ligands within bam mutant germline tumor cells (Figure 5A-D).

      Most tumor data present in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutant leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located at the right side of the white arrowhead). In Figure 6D, fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      Although a structure resembling a "branched fusome" is visible in Figure 6E (right of the white arrowhead), it is an artifact resulting from the cytoplasm of GFP-positive follicle cells, which also stain for α-Spectrin, projecting between germ cells of different clones (see the merged image). In both our previous (Zhang et al., 2023) and current studies, bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in its ability to block GSC differentiation and induce the SGC phenotype (Figure 1J). Given this, we believe that repeating the extensive experiments in Figure 6 with the bam<sup>Δ86</sup> allele would be scientifically redundant and would not change the key conclusion of our study.

      It is well established that the stem niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function - some promoting stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired the niche-like property. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      Dpp and Gbb are the key niche signals from cap cells for maintaining GSC stemness. Our work demonstrates that germline tumors can specifically mimic this signaling function, not the full suite of cap cell properties, to create a non-cell-autonomous differentiation block. The current title “Tumors mimic the niche to inhibit neighboring stem cell differentiation” reflects this precise concept: a partial, functional mimicry of the niche's most relevant activity in this context. We feel it is an appropriate and compelling summary of our main conclusion.

      In the Method section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

      Because of the highly variable expression levels in terminal filament and cap cells, we have omitted the dpp-lacZ results in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Not all readers may be familiar with the nos>FLP/FRT or hs-FLP/FRT systems. It would be helpful if the authors could briefly introduce these genetic mosaic systems and explain how they were used in this study before presenting the results.

      Thank you for this constructive suggestion. A brief introduction to these systems has been added to the revised manuscript (Lines 64-70).

      (2) Line 68-70: "Surprisingly, ...outside the niche retained a GSC-like single-germ-cell (SGC) morphology, even when encapsulated within egg chambers (Figure 1C, D, Figure 1-figure supplement 1)."

      (3) The figure citation is not appropriate, as Figures 1C and 1D do not show "single germ cells (SGCs) encapsulated within egg chambers." To improve clarity, the authors could revise the sentence as follows: "Surprisingly, wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology (Figures 1C and D), even when encapsulated within egg chambers (Figure 1-figure supplement 1)." This modification would make the description consistent with the figure content and easier for readers to follow.

      Thank you for teaching us! The manuscript has been revised following this suggestion (Lines 70-73).

      (4) Line 106-110. The description is confusing. The authors state, "Under normal conditions... Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C)." However, Figure 2B shows the bam mutant mosaic germaria, and Figure 2C does not specify the genotypes of the germaria used for the analysis of GSCs, CBs, and SGCs. The authors should clarify the experimental conditions and genotypes corresponding to each panel. In addition, it would be more informative to indicate how many germaria these quantified GSCs, CBs, and SGCs were derived from.

      (5) Throughout the manuscript, the authors report the number of SGCs analyzed (e.g., Lines 149-151). However, it would be more informative to also indicate how many germaria these quantified SGCs were derived from. Providing this information would help readers assess the sampling size and variability across biological replicates.

      Thank you for your suggestion. As shown in Figure 2B, these wild-type (RFP-positive) GSCs and CBs were also derived from bam mutant mosaic germaria. The phrase "under normal conditions" has been deleted from the revised manuscript to prevent any potential ambiguity. Given the specificity of the SGC phenotype, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for its quantification (Lines 103-108). The data of “SGCs per germarium with both germline clones and out-of-niche wild-type germ cells” have been added to the revised Figure 1K.

      Reviewer #3 (Recommendations for the authors):

      (1) Additionally, the authors should clarify what the "red dot" signal in the GFP-positive cap cell in Figure 3F (left panel) represents.

      The “red dot” is an asterisk that is used to mark a cap cell (Line 620).

      (2) Finally, on line 266, "bamP-GFP-positive" should be corrected to "bamP-GFP-negative."

      It should be “bamP-GFP-positive”, not “bamP-GFP-negative” (see Figure 2B).

      Reference:

      Mathieu, J., Michel-Hissier, P., Boucherit, V., and Huynh, J.R. (2022). The deubiquitinase USP8 targets ESCRT-III to promote incomplete cell division. Science 376, 818-823.

      Zhang, Q., Zhang, Y., Zhang, Q., Li, L., and Zhao, S. (2023). Division promotes adult stem cells to perform active niche competition. Genetics 224.

      Zhao, S., Fortier, T.M., and Baehrecke, E.H. (2018). Autophagy Promotes Tumor-like Stem Cell Niche Occupancy. Curr Biol 28, 3056-3064.e3053.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Lymphatic vessels drain tissue fluid, absorb lipids, and traffic immune cells. Recent studies on adaptive immunity have identified lymphatics as a potential key target to treat inflammation-associated diseases. In this context, studies on lymphatic sprouting, i.e. the process by which lymphatics expand, are timely. Although Zebrafish lymphatics are somewhat different from mammalian lymphatics, still, the zebrafish has been a useful model for the identification of the key players regulating lymphatic vessel growth, thus, presenting potential targets for pre-clinical studies.

      Woutersen et al. have studied the shp2a and shp2b double mutant zebrafish and identified a requirement for shp2 in lymphatic vessel formation 3-5 days post fertilization. The authors state that the shp2 is required for migration and differentiation of the future lymphatic vessels but not the formation of the venous intersegmental vessels (in contrast to other relevant genes, such as vegfr3). The phenotype is rescued by the expression of wild-type but not mutant shp2.

      Major comments:

      The authors use shp2 deleted strains, live imaging and mRNA rescue experiments. The results, as such, are convincing and the reporting is accurate, allowing reproduction of the experiments. Still, some of the conclusions are not fully backed up by the presented results and would need further experimentation as outlined below:

      1. The other "lymphatic vessel mutants", such as vegfr3, vegfc, and grb2, also cause blood vessel phenotypes, i.e. have an effect on venous intersegmental vessels. The authors state that the shp2 mutants are the first ones to have a lymphatic vessel-specific phenotype. Authors should discuss whether this is due to maternal contribution, i.e. long maternal shp2 mRNA or protein half-life? To back up the statement, authors should investigate later angiogenesis events (developmental or induced) to show that shp2 is not required.

      * We cannot exclude the possibility that maternally contributed Shp2 is responsible for normal venous intersegmental formation. However, this is unlikely, because at the same time, we did observe defects in lymphangiogenesis. It is unlikely that the half-life of Shp2 is regulated differentially in endothelial cells that contribute to future vISVs compared to future ISLVs.

      To show that shp2 has a lymphatic endothelium autonomous role, the authors show that the vegfc mRNA expression is not altered. Authors should quantify the in situ signals (vegfc and vegfr3) and use non-specific probes to show the level of non-specific staining. It is still possible that shp2 would have a lymphatic endothelium-independent role, for example, in Vegf-c processing. Authors should discuss this or delete shp2 in an endothelium-specific manner. Authors should also stain, use in situ hybridization or qPCR (of extracted flt4 reporter-expressing cells) to show that shp2 is expressed in lymphatic endothelial cells.

      * Expression of vegfc was assessed to establish whether loss of Shp2 affected its expression, not to show that Shp2 has a lymphatic endothelium autonomous role. In situ hybridization is semi-quantitative at best. The vegfc in situ hybridizations are similar between wild type and knock-out and do not provide an indication that vegfc expression is altered, warranting further investigation by qPCR. On the other hand, the flt4 in situ hybridizations show a clear reduction in signal in Shp2 double knockout embryos, which was confirmed by qPCR experiments (Fig. 3g). We cannot exclude the possibility that Shp2 has a role in Vegfc processing as suggested by the reviewer and we have included a statement to this effect in the Discussion of the revised version (line 411, 412). In situ hybridization patterns are not very informative for Shp2, because Shp2 is expressed in most, if not all cells, which results in rather indiscriminate expression patterns (Bonetti et al. 2014, PLoS ONE 9, e94884. doi:10.1371/journal.pone.0094884).

      Authors highlight lymphatic endothelial cells and precursors with flt4 (vegfr3) reporter. Furthermore, authors write "a pivotal role for Shp2 signaling in the migration and differentiation of lymphatic endothelial" but do not provide any evidence for the differentiation except the presence of flt4 (vegfr3) reporter expressing cells. To use a second method for detecting lymphatic vessels and to investigate the differentiation, the authors should show and quantify Prox1 expression in PCV endothelial cells prior to sprouting and in migrating future lymphatic endothelial cells.

      * We changed “differentiation” in the title and in the abstract to “formation”, because we do not provide formal proof that Shp2 is involved in differentiation of lymphatic endothelial cells. We routinely use Tg(flt4:mCitrine; flt1:tdTomato) reporters to highlight lymphatic endothelial cells. We have also used Tg(fli1a:GFP; kdrl:mCherry) to highlight lymphatic endothelial cells. Because the signals were more robust, we mainly used the former transgenic line. We have included representative images of the Tg(fli1a:GFP; kdrl:mCherry) line in Supplementary Figure 1 as a second method for detecting lymphatic vessels. We included a statement to this effect in the text (line 182-188).

      SHP2 has not been linked to VEGFR3 earlier, but has been shown to control VEGFR2. However, it is not obvious whether SHP2 is a positive or a negative regulator of VEGFR2. Here, authors should try to stain pErk in sprouting control and shp2 deleted cells, similar to their previous study (Mauri et al. 2021), to show the effect of shp2 loss on the growth factor receptor downstream signaling.

      * We have considered staining pErk using whole mount immunohistochemistry. However, subsequent imaging of the target cells is extremely difficult, because we would be interested in a subset of endothelial cells, the ones that are sprouting. Timing is also an issue, because we would be interested to image these cells around the time they are sprouting. Only a small number of endothelial cells sprouts and these cells will be hard to discern from surrounding endothelial cells. Some of the surrounding endothelial and non-endothelial cells may express high levels of pErk as well. Hence, interpretation of the pErk immunohistochemistry data is extremely difficult. It would be interesting to use a reporter line for MAPK activation, which might allow for imaging specifically of the target cells in double or triple transgenic backgrounds, but this is beyond the scope of this paper.

      Reporting the sample numbers: In most of the experiments/figures, the authors do not have sufficient information. The number of independent experiments and biological replicates should be shown for each, even representative, experiment. Data should always be derived from more than one independent experiment.

      * We have included the number of experiments for the different experiments and we have increased the number of embryos for the different conditions to include the data of at least 8 samples for each experiment.

      Minor comments:

      P.13 rows 269-271: "In addition, we observed normal perfusion and blood flow in the established vISV connections of the ptpn11a-/-ptpn11b-/- embryos and their siblings, suggesting that Shp2 is dispensable for the formation of vISVs.". The authors should show all the data mentioned in the manuscript. If this is shown in a provided movie, please, indicate which one.

      * In the revised version, we refer to Figure 7d, where perfusion of vISVs is evident (line 278).

      Figure legend 6: change "arrow" to "arrowhead".

      * This has been corrected.

      **Referee cross-commenting** No further comments

      Reviewer #1 (Significance (Required)):

      The current manuscript is focused on the characterization of the shp2 mutant embryo phenotype and the rescue experiments. Upon completion of the above-mentioned experiments, the manuscript presents shp2 as a novel regulator of lymphatic vessel formation/lymphatic endothelial cell survival. As such, this notion is quite isolated, since there is no biochemical evidence of, for example, VEGFR3-SHP2 interaction. Broader impact (and audience) would be reached if the authors could show the molecular mechanisms governed by Shp2. Now, in the absence of this data, the impact is moderate. Still, lymphangiogenesis researchers would find the results interesting, thus potentially opening new avenues.

      Reviewer's field of expertise: Lymphatic endothelium. No expertise in zebrafish.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Woutersen et al. describe the effect of single and double knockouts of the zebrafish SHP2 orthologs ptpn11a and ptpn11b. Although some effects of single deletion of ptpn11a are observed, compound deletion results in profound ablation of VEGFR3 (flt4 in zebrafish)-dependent but interestingly, not Tie1-dependent lymphangiogenesis. Rescue experiments with genes encoding WT and mutant forms of SHP2 indicate that intact SH2 domains, PTP activity, and C-terminal tyrosines are required. They also observe differential rescue by the zebrafish analogs of Noonan syndrome (NS) and Noonan syndrome with multiple lentigines (NS-ML) mutants.

      Overall, this is a comprehensive analysis of the effects of WT and mutant SHP2 in lymphatic development in zebrafish. I support its publication with minimal revisions addressing the points below.

      1) For the general reader, it would be helpful to include (in the Supplementary Materials or in Fig. 1) a diagram showing the steps in lymphatic development described in the Introduction that shows the position of the various structures that are subsequently referred to only by abbreviations.

      * In the introduction, we refer to Hogan and Schulte-Merker 2017 Dev Cell 46, 567-583, a review that shows schematics and all the abbreviations we use in our manuscript.

      2) For several figures, there is no statement of what the arrowheads and asterisks point to either in the text or figure legends (e.g. Fig. 2, Fig. 5, Fig. 7). Also, Fig. 6 has "arrowheads", not "arrows". Please check all figure legends carefully to ensure that they fully describe the results shown.

      * We have included statements of what the arrowheads and asterisks in all figures indicate in the revised version.

      3) In the legend to Fig. 1, the authors state that ptpn11a-/- embryos have a "slim" phenotype. How was this assessed-and can it be quantified?

      * We have not systematically quantified this trait of ptpn11a-/- fish and we have not studied the functional consequences, if any. This is a qualitative characteristic that is obvious when analyzing the embryos. We do not want to put much emphasis on the slim phenotype and we have removed the statement from the legend of Fig. 1 in the revised version (line 738).

      4) In the experiments shown in Fig. 6 (and Supplemental movie 1), the authors show that initial sprouting occurs in double mutant embryos, but the sprouts are unable to connect to an aiSV. There are clearly sprouts in the double mutant embryos shown, but there appear to be fewer of them. Do normal numbers of initial sprouts form?

      * Close analysis of the imaging data indicates that normal numbers of initial sprouts form in the double mutant, one sprout for each intersegmental vessel.

      5) If possible, the authors should show immunoblots for all the rescue experiments to convince the reader that each construct was expressed appropriately.

      * Whereas this is an interesting suggestion, this is technically not feasible, because the amount of material from individual embryos is not sufficient for detection of microinjected Shp2 protein by immunoblotting. In fact, only part of the embryo would be available, because a part is needed for genotyping, as we use incrosses of heterozygous fish to generate embryos for the injections. Instead, we expressed constructs encoding GFP and the autoproteolytic peptide 2A linker to the N-terminal side of Shp2a and variants. In line 121, we provide a reference to the paper where we first used this construct, which includes a schematic representation of the construct (Bonetti et al., 2014, Development 141, 1961-1970, DOI: 10.1242/dev.106310). We assessed GFP fluorescence at 1 dpf and discarded embryos that did not express GFP, thus selecting for embryos that did express Shp2 (variants).

      6) The finding of incomplete, or in the case of ptpn11D61G, lack of rescue of lymphangiogenesis by RASopathy-associated mutants is particularly interesting. Have the authors looked at why this is so-i.e., does sprouting occur in D61G-reconstituted embryos? Is migration then blocked or accelerated? Is fusion to aiSVs defective? Although not necessary for the current publication, such information would certainly strengthen the paper. Also, I am not sure that I agree with the authors' statement that the two NS-ML mutants rescue equally to WT; A462T, in particular, is at least nominally less effective and if the n was higher, it might well show statistically lower rescue. The authors should consider tempering this statement.

      * We are planning to investigate in-depth the effects of Shp2-D61G and other NS-associated genes on lymphangiogenesis, but this is beyond the scope of this paper. Here we demonstrate that Shp2 variants rescue or not, upon expression of synthetic mRNA encoding Shp2 variants by microinjection at the one-cell stage. We have tempered our statement about the NS-ML mutants in the text (line 369-372): “Both NSML variants rescued the lymphangiogenesis defects in ptpn11a-/-ptpn11b-/- embryos to the extent that there was no significant difference with their wild type and heterozygous siblings anymore (Figure 10b).”

      7) In the Discussion, the authors reference recent papers on lymphatic defects in NS patients. Although there is no harm in citing these papers, lymphatic abnormalities have been noted in NS patients since the initial descriptions of the syndrome. Either those papers or a review should be cited as well.

      * We have included a reference (line 486) to the review by Roberts et al. 2013 Lancet 381,333-342, https://doi.org/10.1016/S0140-6736(12)61023-X in addition to the recent papers we cited that report lymphatic anomalies in human NS patients, based on lymphangiograms.

      8) The authors might want to note that peripheral edema has been universally associated with SHP2 inhibitor treatment in patients.

      * It is an interesting notion that peripheral edema is the second most frequently occurring side effect in response to SHP2 inhibitor treatment in human subjects (Johnson ML et al. 2024 Mol Cancer Ther 2025;24:384–91 doi: 10.1158/1535-7163.MCT-24-0466). We have included a statement to this effect in the Discussion of the manuscript (line 423-430).

      9) Also, why do the authors think that Tie1 signaling does not require SHP2? It would be interesting to note for the reader that SHP2 has been reported to bind to activated Tie1 and discuss anything known about SHP2 requirements for Tie1 action in mammalian systems.

      * SHP2 interacts with many RTKs that are involved in many developmental processes. Zebrafish embryos lacking functional Tie1 display reduced endothelial and endocardial cell numbers and reduced heart size (Carlantoni et al. 2021 Dev Biol. 469:54-67. doi: 10.1016/j.ydbio.2020.09.008). Whereas we have not investigated this in detail, we have not observed obvious defects in cardiac development. Yet, Tie1 signaling has been implicated in lymphangiogenesis and we cannot exclude involvement of defective Tie1 signaling due to lack of functional Shp2 in the Shp2 double knockouts.

      **Referee cross-commenting** No further comments

      Reviewer #2 (Significance (Required)):

      This is a comprehensive study of the role of SHP2 in lymphatic development, using zebrafish as a model. Although descriptive, this paper is important because mutations in SHP2 are associated with lymphatic abnormalities and SHP2 inhibitors cause lymphedema. Also, the unique features of the zebrafish system allow the authors to define the steps and signaling pathways defective in these models.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      SHP2 is an adaptor protein that plays an important role in the RAS/MAPK pathway. Abnormal activity in this pathway has been implicated in various cancers as well as developmental disorders like Noonan Syndrome. Here, the authors show the important role of Shp2 in physiological lymphatic development in zebrafish using various Shp2 mutants. This promising manuscript, however, needs some adjustments and further clarifications.

      Results section:

      • Transmitted light images of ptpn11a-/- ptpn11b-/- embryos are not consistent throughout the figures. The larva shown in figure 1 is particularly severe compared to images of the same line at 5dpf in the rest of the article (ex. Supp fig1 c, Supp fig4 c&l). Authors should use consistent representative images. Was there a range of phenotype severity in this model? Additional phenotype details and quantifications should be included about this double knockout model.

      * We consistently observed a range of phenotypes in the double mutant embryo since the first description of the phenotype (Bonetti et al. 2014, PLoS ONE 9, e94884. doi:10.1371/journal.pone.0094884). The variation depends on the families that are being used to generate the embryos. This is why we include non-injected controls for all injection experiments. Whereas not all double homozygous embryos show edemas, edemas are representative of the phenotype.

      • Line 165-167 : "Loss of functional Shp2a in ptpn11a-/- ptpn11b+/+ embryos induced a pleiotropic phenotype from 4 days post fertilization (dpf) onwards (Figure 1a-d) and was previously shown to be embryonic lethal". Line 178 : "Wild-type siblings and single mutants showed normal lymphatic vasculature...". There is a discrepancy between these 2 sections because one of the single mutant is embryonically lethal. What was the cause of lethality in this model and was it vascular-related ? Could the authors provide more detail about that ?

      * In our view, there is no discrepancy between these sections. The ptpn11a-/-ptpn11b+/+ embryos start to show a morphological phenotype at 4 dpf, but lymphangiogenesis is normal in these embryos. The embryos lacking functional Shp2a do not survive long after reaching 5 dpf and we have never obtained adult ptpn11a-/- fish. Hence, Shp2a is required for normal zebrafish embryogenesis, but lymphangiogenesis is only impaired in embryos lacking all Shp2. We have not investigated lethality of ptpn11a-/-ptpn11b+/+ embryos or larvae in detail, but the absence of a functional swim bladder (Fig. 2c) is likely causing lethality. We have no indication that lethality was vascular related.

      • Authors managed to create various mutant zebrafish model crossed with the double transgenic flt4:mCitrine;flt1:tdTomato. In the double mutant, it is surprising to see an important decrease in the tdTomato arterial expression. Please choose a more representative image or add further explanations.

      * The tdTomato signal in this particular experiment is reduced in the double mutant compared to the other genotypes we show here. We believe that by coincidence the embryo in Figure 2d is heterozygous for tdTomato, whereas the other embryos are homozygous. The conclusion of this experiment is not affected by this apparent difference in expression: double homozygous embryos lack the lymphatic vasculature.

      • Authors had shown clear defects in the zebrafish model in figure 1. It is confusing since zebrafish were imaged at 4dpf (line 176) but figure 2 shows images at 4dpf whereas the TD is fully visible and developed at 5dpf. Authors should correct that or show both set of images at 4 and 5 dpf (one can be placed in supplementary). Also, text refers the presence of TD at 5 dpf (line 184-185) and correlated quantification (figure 2e) whereas images from figure 2 are from 4dpf fish.

      * The thoracic duct is detectable in all segments of zebrafish embryos at 4 dpf (Fig. 2a). Morphological defects do not necessarily correlate with defective development of the thoracic duct. However, severe edemas in the double knockouts distort the vasculature and/or interfere with imaging of the thoracic duct and therefore we assessed the presence of the thoracic duct at 4 dpf. Line 193 – the quantifications were done using embryos at 4 dpf. We have corrected this mistake in the text of the revised version.

      • Line 167 & 173: authors mentioned embryonically lethal model without explaining how old the larvae were, could you please add the information.

      * The term “embryonic lethal” is technically not correct, because the embryos do not die in significant numbers before they reach 5 dpf. We have rephrased this to “lethal after the embryonic stage” (line 168 and 174) to be more accurate. We have not established exactly when the larvae died. Most embryos survive until 5 dpf, and we never obtained adult ptpn11a-/- fish. Establishing when the larvae die is considered an animal experiment under European law. We have chosen not to sacrifice larvae just to establish when they died.

      • Authors claim that no significant lymphatic deficiencies were observed in the single Shp2a or Shp2b alone. Is this result due to compensatory mechanisms from one isoform to the other ? Further molecular quantifications such as qPCR or Western blot could be performed in both single mutant to characterize this phenomenon.

      * Indeed, we believe that redundancy between Shp2a and Shp2b is the reason that there are no lymphatic deficiencies in the single mutants. Previously, we have shown that Shp2a and Shp2b are both functional, that both Shp2a and Shp2b rescue developmental defects and that Shp2a and Shp2b are both expressed in zebrafish embryos (Bonetti et al., 2014 PLoS ONE 9: e94884, doi:10.1371/journal.pone.0094884). Moreover, expression of either Shp2a or Shp2b rescued defects in the lymphatic vasculature in double knockout embryos (Fig. 4), which is consistent with Shp2a and Shp2b having compensatory roles.

      • Figure 3 - the authors show differential development of the head vasculature. It would be consistent with the rest of the figures to keep the same labelling and colors rather than black and white images. Authors nicely added figure 3c and 3f as great schematic, it would be helpful to highlight all of them in the zebrafish images (ex. BLEC) and add different colors of arrows for each structure. Adding single mutant images as supplementary figures would be important to confirm that there are no significant defects.

      Measurements and quantification should be performed to validate the authors claim of missing and impaired lymphatic structures. Could the authors provide details about the vascular vessels of the head, is there any consequence in the blood vasculature ?

      Additionally, using a nuclear line or a nuclear staining is essential before making any conclusion about lymphatic cell population abnormality.

      * We provide the representation as shown in Figure 3, because the contrast of the flt4:mCitrine signal is superior in this black and white representation compared to the green signal on black background representation. We have included differently colored arrowheads to indicate the different lymphatic structures and we have included representative images of the single mutants in Supplementary Figure 2.

      Our conclusions regarding the lymphatic vasculature in the head are qualitative. Most lymphatic structures are missing altogether in the double mutant, which does not allow meaningful quantification. We have not observed obvious defects in the blood vasculature in the double mutant.

      We conclude that lymphatic vasculature does not develop normally. A nuclear reporter line would be required to conclude that the number of lymphatic cells is aberrant in the double mutant, which is interesting, but is not what we conclude from these experiments.

      • Figure 4 - Authors performed rescue experiments with injection of mRNA to demonstrate that the lymphatic KO phenotype was due to the lack of functional Shp2. Successful mRNA injection and so Shp2a/Shp2b increased expression should be confirmed using qPCR to validate the experiment in the first place. Representative images correlating with quantifications should be added in the figure to support the authors results.

      * The constructs we used for the rescue experiments contain GFP fused to the autoproteolytic peptide 2A and Shp2 (variant) (Bonetti et al., 2014, Development 141, 1961-1970, DOI: 10.1242/dev.106310). These constructs drive expression of the fusion protein, which is cleaved into GFP and the Shp2 variant. Hence, expression of GFP is indicative of expression of Shp2. We routinely discarded embryos that did not express GFP at 1 dpf, thus selecting embryos that express the Shp2 (variants).

      • Figure 5 - Authors should perform experiment with a nuclear line or a nuclear staining in the fish lines before making any conclusion about the number of PL cells. Additional clarifications about the methods of quantification should be included. The authors should count the number of segments/missing segments instead. Individual values with standard deviation should be shown in the graph instead of the total mean value and standard variation and should be specified in the figure legend.

      * We agree with the reviewer that counting cells with a nuclear reporter would be superior to the way we quantified the number of PL cells in the transgenic flt4:mCitrine reporter line. It is possible that if two PL cells are very close together, they will be counted as one and hence that the numbers we provide are an underestimate of the total number of PL cells. We feel that this potential intrinsic error in counting would be the same for all conditions/ genotypes. The point of Figure 5 is that the double mutants have no PL cells and the other genotypes have similar numbers of PL cells. The potential intrinsic error would not alter the conclusion of this figure. We have included how we counted the number of PL cells in the legend to Fig. 5 and we included the standard deviation in Fig. 5e.

      • Figure 6 - Time-lapse imaging shows aberrant sprouting in the double mutant compared to control larvae. However, it is not clear if that process is just delayed or completely impaired in the mutant : time-lapses experiment should be performed in later stages. It seems that the chosen time-points images are different from the wild-type and the mutant groups, it would be best to have the same time-point to highlight the difference between the two groups. Authors affirm that vISV formation is unaffected in the double mutant larvae, however, it is hard to confirm that statement with black and white images and supplementary movies. Raw confocal images and movies should be included instead to distinguish lympho-venous and arterial structures.

      * The supplementary movies and Fig. 6, which is derived from these movies, show lack of PL cell formation in the double mutant (Fig. 6B). PL cell formation is clearly visible in wild type embryos (Fig. 6A). The sprouts that (are supposed to) give rise to PL cells are indicated with arrowheads. In both embryos, vISV formation is evident in the ISVs next to the ones where PL cells start to form, i.e. the ISVs next to the ones indicated with arrowheads. Sprouting of the endothelial cells is best observed in the time lapse movies. Whereas the exact timing may be different due to the exact conditions, the developmental timing of the sequence of images is similar between the wild type and the double mutant. The black and white representation gives higher contrast than the original fluorescent movies/ pictures, which is why we prefer this representation.

      • Figure 7 - Figure 7d does not correlate with previous imaging included in figure 2, in fact, fluorescent expressions appear inverted between the two figures. Please standardize this as they are not comparable. Quantification of the percentage of veins may not be the best parameter to investigate the normality of the vISV. Measurements of the diameter of the vISV would be more relevant. Individual values with standard deviation should be shown in the graph instead of the total mean value and standard variation and should be specified in the figure legend.

      * We believe the intensities of the signals in Figure 7d and Figure 2d may be different, because the embryo in Figure 2d may be heterozygous for the flt1:tdTomato transgene, whereas the embryo in Figure 7d is homozygous. Whereas the intensities of tdTomato are different, we clearly observe the absence of the lymphatic vasculature in Figure 2d and normal formation of vISVs in Figure 7d. We have indicated in the legend of the figure that the percentage of vISVs was determined in the number of embryos indicated and that the average percentage is plotted in the graph with the error bars indicating the standard deviation (lines 787-789).

      • Figure 8 - Authors have analyzed flt4 and vegfc expression in the mutant embryos to further characterize Lymphangiogenesis processes in the model. Fold change expression of flt4 appears to be decreased in the double mutant compared to control. It would be useful to also quantify it in uninjected and ptpn11a+/- ptpn11b-/- groups as additional appropriate control groups. Images of ptpn11a+/+ ptpn11b+/+ embryos should be added. Lack of consistency between images and quantification are confusing.

      Considering that quantifications in other figures were performed in a high number of larvae and only 3 were included in this figure in the double mutant group, it would be important to increase the number of ptpn11a-/- ptpn11b-/- embryos for this experiment. To confirm that vegfc expression is normal, fold change expression should be included as performed for flt4 expression.

      Figure number is missing.

      * QPCR was done with ptpn11a+/+ptpn11b-/- and ptpn11a-/-ptpn11b-/- embryos, correlating to the genotypes that were used for in situ hybridization. There were no injections performed in the framework of this experiment. Because ptpn11a+/+ptpn11b-/- embryos formed lymphatic vasculature like wild type embryos (Figure 2), we focused on embryos derived from an incross of ptpn11a+/-ptpn11b-/- fish, generating ptpn11a-/-ptpn11b-/- double mutant embryos as well as ptpn11a+/+ptpn11b-/- and ptpn11a+/-ptpn11b-/- siblings. In situ hybridization indicated that flt4 expression was reduced, which was confirmed by QPCR. We have not included vegfc in the QPCR experiments, because the in situ hybridization experiments did not suggest a difference in expression between the genotypes. The Figure number was added.

      • Figure 9: A different background line was used for this figure (fli1a:eGFP;kdrl:mCherry vs flt4:mCitrine;flt1:tdTomato), could the authors explain the purpose of this change and add a brief experiment to confirm the findings and phenotype do not change from one line to another. The overall purpose of this set of experiment is not very clear, maybe one or two sentences of transition as well as rephrasing parts of this section could help better understand the objective and results.

      * A different transgenic background was used for this figure. Like Tg(flt4:mCitrine;flt1:tdTomato), the Tg(fli1a:eGFP;kdrl:mCherry) line allows analysis of the lymphatic vasculature (all lymphatic vessels are labeled with eGFP, not mCherry). The results were the same between the two transgenic lines. The flt4:mCitrine signal is more robust than the fli1a:eGFP signal, which is why we showed images of the former in most of the figures. Representative images of the Tg(fli1a:eGFP;kdrl:mCherry) line are shown in Supplementary Figure 1. We have included a statement to explain the objective of this part (line 311-312): “We used mutants of Shp2a to assess which signaling functions of Shp2 are required for normal lymphangiogenesis.”

      • Figure 10 - Correlating zebrafish data with human disease is very interesting and highlight the importance of this work. The authors characterize the effect of NS and NSML variants on morphological and lymphatic defects in zebrafish embryos and find that these variants significantly rescued anomalies in double mutant larvae. Since these variants have opposite effects (increase signaling activity in NS and decreasing activity in NSML), authors should add a few words about how two opposite variants could have the same outcome on the zebrafish model. It may also be helpful to include information about these diseases in the introduction, including the lymphatic complications.

      * In the discussion, we included a paragraph where we discuss the effects of the NS and NSML variants and why both variants may rescue the phenotype in Shp2 double knockout embryos (lines 458-488).

      • On supplementary figure 4, the double mutant fish expressing Shp2a A462T seems to develop edema. As in figure 8, data in all supplementary figures were collected from only 3 larvae per group in some groups (2 in supplementary fig 2l), which is weak considering that this in vivo model allows the generation of a very high number of embryos. Authors should increase the number of larvae per group to reach at least N=10/group to be more robust.

      Line 357 "... was observed more frequently in Shp2a-D61G injected double mutant embryos" this statement should be supported by the appropriate quantifications and statistical analysis.

      * We increased the number of embryos that we evaluated for each condition of the injection experiments to at least 9.

      Line 361-362 " (cf. Figure 4, 10b)" incorrect typo?

      * We have altered the statement (line 369-372) to: “Both NSML variants rescued the lymphangiogenesis defects in ptpn11a-/-ptpn11b-/- embryos to the extent that there was no significant difference with their siblings anymore (Figure 10b).”

      Materials and Methods section :

      Overall, this section needs significant clarifications considering the amount of work and data that have been collected. Additionally, each reagent, material, solution, objective, need to be rigorously referenced with reference number and supplier name.

      * The catalog numbers of special reagents have been added.

      Each software should also have the version specified and be correctly cited (ex: ImageJ software version 2.14.0/1.54f. and reference: Schneider, C. A., Rasband, W. S., & Eliceiri, K. W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods, 9(7), 671-675) .

      * We have indicated the version number and included a reference to the ImageJ software in the revised version (line 136, 137).

      • Constructs, mRNA synthesis : Were the sequences validated ? If yes, how? Please explain.

      * The constructs were validated by sequencing. The mRNA synthesis was verified by running aliquots of the mRNA on agarose gels. Based on the signal on gel, the concentration was adjusted to ensure that equal amounts of mRNA of each Shp2 variant were injected at the one-cell stage.

      Microscopy : Precise references of the objectives that were used to capture images.

      * We included references to the objectives that were used in microscopy in the Materials and Methods section.

      • Quantification: Please specify how all quantifications were made. How figure 5e and 7e were collected?

      * In the legend to Fig. 5, we indicated how the data were quantified (line 772-774): “Quantification of the number of PL cells in the trunk at 54 hpf. The number of PL cells was counted in the trunk of 54 hpf embryos over the length of 10 somites and the average number of PL cells is depicted. The error bars indicate the standard deviation.” In the legend to Fig. 7 we have included a statement how the percentage of venous ISVs was determined (line 787-789): “The percentage of veins in siblings and double homozygous mutants was determined in the indicated number of embryos (n) and is depicted. The error bars indicate the standard error.”

      Statistical analysis: Specify how data are expressed (e.g. mean ± s.e.m.). The authors have made a serious confusion in choosing the statistical tests. Differences between the experimental groups should be evaluated with the use of the Mann-Whitney test only when two groups are compared. Differences between three or more experimental groups (your case in this paper) should be evaluated with the use of an analysis of variance test (ANOVA), followed by a Tukey-Kramer post hoc test when the results were significant (P…

      * We use the Mann-Whitney test to compare the groups in pairs, i.e. the ptpn11a+/+ptpn11b-/- control group compared to ptpn11a+/-ptpn11b-/-, or compared to ptpn11a-/-ptpn11b-/- double knock-out. This is reflected in the brackets we use to indicate significance or the lack thereof between samples, e.g. Figure 4.
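
      For illustration only, the pairwise comparison against a control group described in this exchange can be sketched in Python. This is not the analysis performed for the manuscript: the counts are hypothetical, the function names are invented for this sketch, and the exact p-value is obtained by enumerating all relabelings, which is only practical for small groups. The Bonferroni-adjusted threshold is likewise an assumption, shown because running several pairwise tests on the same control group inflates the false-positive rate, which is the concern behind the reviewer's request for ANOVA with a post hoc test.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic of sample x versus sample y (a tie contributes 0.5)."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def exact_mann_whitney(x, y):
    """Return (U, exact two-sided p-value) by enumerating all relabelings."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    mean_u = n_x * n_y / 2.0
    u_obs = mann_whitney_u(x, y)
    observed = abs(u_obs - mean_u)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the null mean as observed.
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical cell counts per embryo (NOT data from the manuscript).
control = [9, 10, 8, 10, 9]        # stands in for the control genotype
heterozygous = [10, 9, 9, 8, 10]   # stands in for the heterozygous siblings
double_mutant = [0, 1, 0, 2, 0]    # stands in for the double knockout

u_het, p_het = exact_mann_whitney(control, heterozygous)
u_dko, p_dko = exact_mann_whitney(control, double_mutant)

# Assumed Bonferroni correction for the two comparisons against the control.
alpha = 0.05 / 2
for name, p in [("heterozygous", p_het), ("double mutant", p_dko)]:
    verdict = "significant" if p < alpha else "not significant"
    print(f"control vs {name}: p = {p:.4f} ({verdict})")
```

      With three or more groups, an omnibus test (ANOVA or the rank-based Kruskal-Wallis test) followed by a post hoc procedure is the usual alternative to repeated pairwise tests.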

      Suggestions on additional supplemental figures :

      • Beginning of introduction gives an impression of a review article about vascular development in larvae, authors should shorten it and/or add a supplementary schematic to support this long description.

      * We try to be complete to help the reader understand the rest of the paper better.

      • Alignment of the different proteins of the study both in human and zebrafish to show homology

      * For an alignment of the Shp2a and Shp2b proteins with human SHP2, we refer to our previously published paper (Bonetti et al., 2014, PLoS One 9, e94884, doi:10.1371/journal.pone.0094884).

      Schematic of protein domains, binding domains and location of variants

      * This is an interesting suggestion, but for space reasons, we decided not to include such schematics.

      **Referee cross-commenting** No further comments

      Reviewer #3 (Significance (Required)):

      SHP2 is an adaptor protein that plays a critical role in regulating the RAS/MAPK signaling pathway. Dysregulation of this pathway has been implicated in various cancers and developmental disorders, including Noonan Syndrome. In this study, the authors demonstrate the essential function of Shp2 in physiological lymphatic development in zebrafish by examining multiple Shp2 mutant models. This promising manuscript, however, needs some adjustments and further clarifications.

      I believe the appropriate audience for this research is specialized - primarily scientists and researchers working in basic biomedical research, particularly in molecular biology, developmental biology, and signaling pathways. The study's focus on zebrafish models and the mechanistic role of Shp2 in lymphatic development positions it within the scope of fundamental biology rather than translational or clinical application, though it has relevance to both.

      As a member of a vascular malformations laboratory, my research focuses on advancing biomedical research through an integrative approach combining in vivo research, molecular biology, translational medicine, and public health. More specifically, my current work focuses on specific genes causing complex lymphatic anomalies and drug discovery using zebrafish models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance.

      (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables).

      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.

      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.

      (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.

      (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response, we have undertaken a substantial revision that addresses all of them and improves clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation and implications and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is unclear exactly which sets are evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
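
      The arithmetic behind this thought experiment can be checked with a few lines of Python. This is only a sketch of the reviewer's high/low simplification; the helper name `sampled_fraction` is hypothetical, and it assumes every one of the 100,000 draws lands on a distinct corner.

```python
from math import log10

N_SAMPLES = 100_000

def sampled_fraction(n_dims, n_samples=N_SAMPLES):
    """Fraction of the 2**n_dims high/low corners reached by n_samples distinct draws."""
    return n_samples / 2 ** n_dims

param_fraction = sampled_fraction(94)  # 94 kinetic parameters
ic_fraction = sampled_fraction(50)     # 50 species (initial conditions)

print(f"parameters: fraction ~ 10^{log10(param_fraction):.1f}")
print(f"initial conditions: fraction ~ 10^{log10(ic_fraction):.1f}")
```

      The same helper covers the 50-species case, so both orders of magnitude can be compared side by side.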

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
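
      The N versus 2N comparison can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' analysis: the category labels beyond 'rebound' and 'decreasing', the category weights, and the random draws standing in for simulated model instances are all made up.

```python
import random

CATEGORIES = ["decreasing", "rebound", "increasing", "no_response"]  # partly hypothetical labels
TRUE_WEIGHTS = [0.60, 0.25, 0.10, 0.05]                              # made-up frequencies

def frequencies(labels):
    """Relative frequency of each qualitative dynamic in a list of labels."""
    n = len(labels)
    return [labels.count(c) / n for c in CATEGORIES]

def mse(p, q):
    """Mean squared error between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def convergence_curve(sizes, seed=0):
    """For each N, compare the distribution from N instances with that from 2N."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        labels = rng.choices(CATEGORIES, weights=TRUE_WEIGHTS, k=2 * n)
        curve.append((n, mse(frequencies(labels[:n]), frequencies(labels))))
    return curve

curve = convergence_curve([100, 1_000, 10_000])
for n, err in curve:
    print(n, err)
```

      In this toy setting the error shrinks as N grows simply because frequency estimates sharpen; whether the real meta-dynamic distributions behave the same way is what the supplementary analysis addresses.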

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10^(-5) to 10^(4)) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
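
      The distinction between the initial split and the conserved total can be illustrated with a two-state toy system, X ⇌ pX, in which X_tot = X + pX is conserved. The Python sketch below is a minimal example under assumed rate constants (k_f, k_r) and is not the manuscript's model: its steady state is pX* = X_tot · k_f / (k_f + k_r), which depends on the total but not on how that total is initially divided.

```python
def simulate_pX(x_tot, pX0, k_f=1.0, k_r=3.0, dt=0.01, steps=2_000):
    """Forward-Euler integration of dpX/dt = k_f*(x_tot - pX) - k_r*pX."""
    pX = pX0
    for _ in range(steps):
        pX += dt * (k_f * (x_tot - pX) - k_r * pX)
    return pX

# Same conserved total, opposite initial splits: both converge to the same steady state.
same_total_a = simulate_pX(x_tot=10.0, pX0=0.0)
same_total_b = simulate_pX(x_tot=10.0, pX0=10.0)

# Doubling the conserved total shifts the steady state itself.
larger_total = simulate_pX(x_tot=20.0, pX0=0.0)

print(same_total_a, same_total_b, larger_total)
```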

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416)

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artefact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This large discrepancy produced excessively stiff systems, i.e. systems that were unsolvable with the ODE solver settings we had implemented. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to get past the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
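
      To show why an implicit method matters for stiff problems, the Python sketch below compares explicit Euler with backward Euler (the first-order BDF formula) on a single fast-relaxing equation, y' = -λ(y - 1). It is a fixed-step, first-order toy and not CVODE, which uses variable order and variable step size; the equation and the value of λ are assumptions chosen for illustration.

```python
LAM = 1000.0  # fast rate constant; makes the toy problem stiff

def explicit_euler(y0, h, steps):
    """Explicit update: unstable here because h * LAM is far larger than 2."""
    y = y0
    for _ in range(steps):
        y = y + h * (-LAM * (y - 1.0))
    return y

def backward_euler(y0, h, steps):
    """Implicit update y_new = y + h * (-LAM * (y_new - 1)), solved for y_new in closed form."""
    y = y0
    for _ in range(steps):
        y = (y + h * LAM) / (1.0 + h * LAM)
    return y

explicit = explicit_euler(0.0, h=0.01, steps=50)
implicit = backward_euler(0.0, h=0.01, steps=50)
print(explicit, implicit)
```

      With the same step size, the explicit scheme diverges while the implicit scheme settles on the steady state y = 1.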

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
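
      The rejection-sampling loop described here can be sketched in Python as follows. Several details are assumptions made only for illustration: the log-uniform draw over the stated 10^(-5) to 10^(4) range, the function names, and the `passes_filters` rule, which is an arbitrary placeholder for the real steady-state and drug-response checks.

```python
import random

def sample_parameters(rng, n_params=94, log10_low=-5.0, log10_high=4.0):
    """Draw one parameter set, log-uniform over [10^-5, 10^4] (assumed distribution)."""
    return [10 ** rng.uniform(log10_low, log10_high) for _ in range(n_params)]

def passes_filters(params):
    """Hypothetical placeholder for the real checks (steady state reached, minimum drug response)."""
    return params[0] > 1.0  # arbitrary stand-in rule

def rejection_sample(n_accept, seed=0):
    """Keep drawing candidates until n_accept of them satisfy every filter."""
    rng = random.Random(seed)
    accepted, n_drawn = [], 0
    while len(accepted) < n_accept:
        candidate = sample_parameters(rng)
        n_drawn += 1
        if passes_filters(candidate):
            accepted.append(candidate)
    return accepted, n_drawn

accepted, n_drawn = rejection_sample(n_accept=200)
print(f"accepted {len(accepted)} of {n_drawn} draws")
```

      Recording `n_drawn` alongside the accepted set makes the acceptance rate easy to report, which is the quantity the reviewers asked about.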

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
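
      One possible reading of this criterion is sketched below in Python. It assumes that "changed by < 1%" means the relative difference between each species' value 100 steps before the end and its final value; the function name, the trajectory layout, and the small guard against division by zero are hypothetical choices, not details from the manuscript.

```python
def at_equilibrium(trajectory, window=100, tol=0.01):
    """trajectory: list of time points, each a list of species concentrations."""
    if len(trajectory) <= window:
        return False  # not enough time points to judge
    start, end = trajectory[-window - 1], trajectory[-1]
    for c_start, c_end in zip(start, end):
        scale = max(abs(c_start), 1e-12)  # guard against division by zero
        if abs(c_end - c_start) / scale >= tol:
            return False
    return True

settled = [[1.0, 2.0] for _ in range(200)]
drifting = [[1.0 + 0.01 * i, 2.0] for i in range(200)]
print(at_equilibrium(settled), at_equilibrium(drifting))
```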

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
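
      The rate expression quoted in comment (9) can be written out directly, as in the Python sketch below. The argument values in the example are made up for illustration. Note that, in the expression as quoted, the pS6K denominator divides the pIR-driven term only, while the pFGFR-driven term is added without attenuation.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """Rate law as quoted: (kc2f1*pIR*PI3K)/(1 + pS6K/Ki2) + kc2f2*pFGFR*PI3K."""
    ir_term = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)  # attenuated by pS6K feedback
    fgfr_term = kc2f2 * pFGFR * PI3K                     # no pS6K factor in the quoted form
    return ir_term + fgfr_term

# Made-up values: all species and constants set to 1, varying only pS6K.
common = dict(pIR=1.0, pFGFR=1.0, PI3K=1.0, kc2f1=1.0, kc2f2=1.0, Ki2=1.0)
rate_no_feedback = pi3k_phosphorylation_rate(pS6K=0.0, **common)
rate_with_feedback = pi3k_phosphorylation_rate(pS6K=1.0, **common)
print(rate_no_feedback, rate_with_feedback)
```

      Raising pS6K lowers the total rate, but never below the pFGFR-driven contribution.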

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was to Supplementary Figure 2A, which shows a heatmap of the frequency of each dynamic for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the legend of Supplementary Figure 2A.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A shows that multiple pathways were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. Moreover, some additional text in the main text has been slightly modified to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
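
      A one-at-a-time screen of this kind can be sketched in Python as follows. Everything specific here is an assumption for illustration: the toy readout function stands in for a full ODE simulation, the parameter names and nominal values are made up, and the tenfold reduction is an arbitrary choice of inhibition strength.

```python
def toy_response(params):
    """Hypothetical stand-in for a full ODE simulation; returns a scalar readout."""
    return params["k_act"] * (1.0 + params["k_bypass"]) / (params["k_deg"] + 1.0)

def one_at_a_time(params, model, inhibition=0.1):
    """Scale each parameter down on its own and rank parameters by the change in readout."""
    baseline = model(params)
    effects = {}
    for name in params:
        perturbed = dict(params)                      # copy, so other parameters stay nominal
        perturbed[name] = params[name] * inhibition   # inhibit this parameter only
        effects[name] = model(perturbed) - baseline
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)

nominal = {"k_act": 2.0, "k_bypass": 1.0, "k_deg": 1.0}  # made-up values
ranking = one_at_a_time(nominal, toy_response)
for name, effect in ranking:
    print(f"{name}: {effect:+.3f}")
```

      Because only one entry differs from the nominal set in each run, any change in the readout is attributable to that parameter alone, which is the first-order attribution the reply describes.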

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) is a direct parallel to the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).

      It is worth noting again that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is however important to reiterate that our goal in Figure 7 is qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
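
      A fixed fractional inhibition of a mass-action rate law, as described above, could look like the following sketch. It is illustrative only: the function names, rate constants, totals and the inhibition value of 0.9 are assumptions, and the actual rate laws for reactions R27 and R15 are those given in the authors' Supplementary Information.

```python
def inhibited_formation_rate(k_on, free_cdk, free_cyc, inhibition=0.0):
    """Mass-action complex formation scaled by a fixed fraction.

    inhibition = 0 leaves the reaction untouched; inhibition = 1 blocks it.
    """
    if not 0.0 <= inhibition <= 1.0:
        raise ValueError("inhibition must lie between 0 and 1")
    return (1.0 - inhibition) * k_on * free_cdk * free_cyc


def simulate_complex(k_on, k_off, cdk_total, cyc_total, inhibition,
                     t_end=50.0, dt=0.001):
    """Integrate d[complex]/dt with explicit Euler steps (arbitrary time units)."""
    c = 0.0
    for _ in range(int(t_end / dt)):
        formation = inhibited_formation_rate(
            k_on, cdk_total - c, cyc_total - c, inhibition
        )
        c += dt * (formation - k_off * c)
    return c


# Hypothetical values chosen only to make the effect visible.
untreated = simulate_complex(1.0, 1.0, 1.0, 1.0, inhibition=0.0)
treated = simulate_complex(1.0, 1.0, 1.0, 1.0, inhibition=0.9)
print(f"complex without inhibitor: {untreated:.4f}")
print(f"complex with 90% inhibition: {treated:.4f}")
```

      A scaling factor of this kind carries no dose information, which is consistent with the stated aim of comparing response shapes rather than calibrating to a 10 nM dose through an IC50/Hill term.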

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map where every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now clarify the use of the following three terms in their specific contexts:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use “initial conditions” to refer to initial values of the state variables in a model simulation. This term is related to protein abundance, as setting the initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.
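
      The relationship between the three terms can be illustrated with a toy phosphorylation cycle. This is a sketch under assumed names and values (X, pX, k_phos, k_dephos), not a reaction from the published model: the initial conditions fix the conserved total abundance, while the state variables X and pX change over time.

```python
def simulate_cycle(x0, px0, k_phos, k_dephos, t_end=10.0, dt=0.001):
    """Integrate X <-> pX with explicit Euler steps.

    x0 and px0 are the initial conditions; their sum is the conserved
    protein abundance Xtot. x and px are the time-dependent state variables.
    """
    x, px = x0, px0
    for _ in range(int(t_end / dt)):
        net_flux = k_phos * x - k_dephos * px  # net conversion X -> pX
        x -= dt * net_flux
        px += dt * net_flux
    return x, px


# Hypothetical values: all protein starts unphosphorylated.
x_end, px_end = simulate_cycle(x0=3.0, px0=0.0, k_phos=2.0, k_dephos=1.0)
print(f"X = {x_end:.4f}, pX = {px_end:.4f}, Xtot = {x_end + px_end:.4f}")
```

      Whatever is removed from X is added to pX at every step, so the total set by the initial conditions stays constant throughout the simulation.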

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate-constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate-constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
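
      The trade-off discussed here can be made concrete by comparing the two rate laws directly. The sketch below uses hypothetical Vmax and Km values and the standard textbook forms; it shows that a first-order rate with k = Vmax/Km matches Michaelis-Menten kinetics when substrate is far below Km, and that the two diverge once the enzyme saturates.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Saturable rate: approaches vmax when substrate >> km."""
    return vmax * substrate / (km + substrate)


def first_order_rate(substrate, k):
    """Non-saturable rate, linear in substrate."""
    return k * substrate


# Hypothetical enzyme parameters for illustration only.
vmax, km = 2.0, 5.0
k_effective = vmax / km  # first-order constant that matches the low-substrate limit

for substrate in (km / 1000.0, km, 100.0 * km):
    mm = michaelis_menten_rate(substrate, vmax, km)
    fo = first_order_rate(substrate, k_effective)
    print(f"S = {substrate:8.3f}  Michaelis-Menten = {mm:8.4f}  first-order = {fo:8.4f}")
```

      The divergence at high substrate is the saturation regime that the response acknowledges a non-saturable rate law cannot reproduce.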

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension in future work, but it is expected to be very computationally expensive. We have added a brief discussion on this as a future direction in the revised Discussion.

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
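
      A filter of the kind described above might be sketched as follows. The solver, step size and tolerance here are assumptions for illustration: explicit Euler with a deliberately large step is used only to provoke a negative value in a structurally non-negative first-order decay, and the authors' own solver and rejection criterion are not specified in this response.

```python
def euler_decay(x0, k, dt, n_steps):
    """Integrate dx/dt = -k * x with explicit Euler; returns a list of states."""
    trajectory = [[x0]]
    x = x0
    for _ in range(n_steps):
        x += dt * (-k * x)
        trajectory.append([x])
    return trajectory


def has_negative_state(trajectory, tolerance=0.0):
    """True if any state variable drops below -tolerance at any time point."""
    return any(value < -tolerance for state in trajectory for value in state)


def discard_unstable(trajectories, tolerance=0.0):
    """Keep only trajectories that never go negative."""
    return [t for t in trajectories if not has_negative_state(t, tolerance)]


# Hypothetical rate constants: the second is too fast for the chosen step size.
well_resolved = euler_decay(x0=1.0, k=1.0, dt=0.01, n_steps=100)
under_resolved = euler_decay(x0=1.0, k=150.0, dt=0.01, n_steps=100)
kept = discard_unstable([well_resolved, under_resolved])
print(f"kept {len(kept)} of 2 trajectories")
```

      The exact solution of the decay equation is positive for all time, so any negative value in the second trajectory comes from the numerical scheme rather than from the model equations.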

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states, either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
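
      Sorting time courses into a small set of qualitative categories, as discussed above, could be done with a rule-based classifier such as the sketch below. The category names follow those mentioned in the response, but the rules and the 10% threshold are assumptions made for illustration and are not the authors' classification criteria.

```python
def classify_response(series, threshold=0.1):
    """Label a time course by its shape relative to the first time point.

    Changes are expressed as fractions of the initial value; `threshold`
    is the minimum fractional change treated as a real response.
    """
    baseline = series[0]
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute relative change")
    relative = [(value - baseline) / abs(baseline) for value in series]
    lowest, highest, final = min(relative), max(relative), relative[-1]

    if highest - lowest < threshold:
        return "no-response"
    if lowest <= -threshold and final - lowest >= threshold:
        return "rebound"  # dipped, then recovered at least partly
    if final <= -threshold:
        return "decreasing"
    if final >= threshold:
        return "increasing"
    return "other"  # e.g. a transient peak that returns to baseline


# Hypothetical time courses, one per category.
examples = {
    "flat": [1.0, 1.0, 1.0],
    "drop": [1.0, 0.5, 0.2],
    "dip and recover": [1.0, 0.2, 0.9],
    "rise": [1.0, 1.5, 2.0],
}
for name, series in examples.items():
    print(f"{name}: {classify_response(series)}")
```

      Under rules like these, a node pushed into a saturated "on" or "off" state would fall into the "no-response" category, which is the outcome the response describes for extreme parameter values.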

    1. Author response:

      General Statements

      First, we would like to thank the editor at Review Commons for the efficient handling of our manuscript. We also apologize for our delayed response.

      We would like to thank all three reviewers for their careful evaluation of our work and their constructive feedback, which will provide a valuable basis for improving the figures and the text, as described below. We expect to be able to complete the revision following the plan described below quickly.

      We would like to note that the reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on the following point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this does not restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). We will revise the manuscript text accordingly to clarify this point.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/ phage display generated against bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror smc deletion phenotype including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between bacillus and S. pneumoniae Smc to narrow down to specific regions within the bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may likely be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/ unbinding kinetics, stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in ATP/ DNA-bound state? And, do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies is challenging, and not essential for this study. Nonetheless, it is informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not easy even for a smaller selection of sybodies. We have data that show direct binding of Smc to sybodies by various methods including ELISA, pull-downs and by biophysical methods (GCI). Initially, we omitted these data from the manuscript as we are convinced that the mapping data obtained with chimeric SMC proteins is more definitive and relevant.  During the revision we will incorporate the ELISA data showing direct binding and also indicating a lack of preference for a specific state of Smc.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/ closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it also not possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out has not been provided). Thus, other potential mechanisms would be worth considering/ discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the main binding site is located on the SMC coiled coils, the latter scenario would likely be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This showed that they are all roughly equally expressed and that they localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We will include this data in the revised version of the manuscript.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes are similar to those of the null mutant rather than the ATP-hydrolysis-defective EQ mutant, which displays even more severe growth phenotypes. We will add the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state (add Vazquez Nunez et al., 2021).”

      “ELISA data confirm that nearly all clones bind Smc-ScpAB; however, their binding shows little or no dependence on the presence of ATP or DNA.”

      Minor comments:

      (1) It was surprising that no sybodies were found that could target both bacillus and spneu Smc. For example, sybodies targeting the head regions of Smc that might work in a more universal manner. Could the authors comment on the coverage of the sybodies across the protein structure?

      It is rather common that sybodies (like antibodies and nanobodies) exhibit strong affinity differences between highly conserved proteins (> 90 % identity). The underlying reasons for such strong discrimination are i) the location of less conserved residues primarily at the target protein surface and ii) the large interaction interface between sybody and target, which offers multiple vulnerabilities for disturbance, in particular through bulky side chains resulting in steric clashes. Another frequently observed phenomenon is sybody binding to a dominant epitope, which also often applies to nanobodies and antibodies. Good examples of this are the dominant epitopes on SARS-CoV-2 RBDs.

      (2) Growth curves (Fig. S3) show a large jump in recovery in growth under sybody induction conditions. Could the authors address this observation here and in the text?

      We suppose that this recovery represents suppressor mutants and/or (more likely) improved growth in the absence of functional Smc during nutrient limitation (see Gruber et al., 2013 and Wang et al., 2013). We will add this statement to the text.

      (3) L41- Sentence correction: Loop can be removed.

      Ah, yes, sorry for this confusing error. Thank you.

      (4) L525 - bsuSmc 'E' :extra E can be removed.

      To do. Thank you.

      (5) References need to be properly formatted.

      To do. Thank you.

      (6) The authors should add in figure legend for Fig 1i) details on representation of the purple region, and explain the grey strokes for orientation of the loop.

      To do.

      (7) How many cells were analysed in the cell biological assays? Legends should include this information.

      To Be Included.

      Reviewer #1 (Significance):

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function, and in parallel demonstrated the potential of synthetic sybodies/designed binders for inhibition of protein activity.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Review: "Single Domain Antibody Inhibitors Target the Coiled Coil Arms of the Bacillus subtilis SMC complex" by Ophélie Gosselin et al, Review Commons RC-2025-03280

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      In summary, the authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Some specific comments:

      Line 75: "likely stabilizing otherwise rare intermediates of the conformational cycle." - sorry, why is that being concluded? Why not stabilizing longer-lived conformations?

      We will clarify this statement!

      Line 89: Sorry, possibly our lack of understanding: why first ribosome and then phage display?

      Ribosome display allows screening of around 10^12 sybodies per selection round (technically unrestricted library size), while for phage display, the library size is restricted to around 10^9 sybodies because production of a phage library requires transformation of the phagemid plasmid into E. coli, which introduces a diversity bottleneck. This is why the sybody platform starts off with ribosome display. It switches to phage display from round 2 onwards because the output of the initial round of ribosome display is around 10^6 sybodies, which can be easily transferred into the phage display format. Phage display is used to minimize selection biases. For more information, please consult the original sybody paper (PMID: 29792401).

      Line 100: Why was only lethality selected? Less severe phenotypes not clear enough?

      Yes, colony size is more difficult to score robustly, as the sizes of individual transformant colonies can vary quite widely. The number of isolated sybodies was at the limit of further analysis.

      Line 106: Could it be tested somehow if convex and concave library sybodies fold in Bs?

      We did not focus on the non-functional sybody candidates, and only sybodies of the loop library turned out to cause functional consequences at the cellular level. Notably, we will include gfp-imaging showing that non-lethal sybodies are expressed to similar levels as toxic sybodies. Given the identical scaffold of concave and loop sybodies (they only differ in their CDR3 length), we expect that the concave sybodies fold in the cytoplasm of B. subtilis. For the convex sybodies, which have a different scaffold, this will be tested.

      Line 125: Could Pxyl be repressed by glucose?

      To our knowledge and experience, repression by glucose (catabolite repression) does not work well in this context in B. subtilis.

      Line 131: The SMC replacement strain is a cool experiment and removes a lot of doubts!

      Thank you! (we agree).

      Line 141: The mapping is good and looks reliable, but looks and feels like a tour de force? Of course, some cryo-EM would have been lovely (lines 228-229 understood, it has been tried!).

      Yes, we have made several attempts at structural biology. Unfortunately, Smc-ScpAB is not well suited for cryo-EM in our hands and crystallography with Smc fragments and sybodies did not yield well-diffracting crystals.

      Line 179: Mmmh. Do we not assume DNA binding on top of the dimerised heads to open the CC (clamp)?

      We will clarify the text here.

      Line 187: Having sybodies that presumably keep the CC together (closing) and some that do not allow them to come together correctly (opening) is really cool and probably important going forward.

      Thank you!

      Figure 1 Ai is not very colour-blind friendly.

      We are sorry for this oversight. We will try to make the color scheme more inclusive. Thank you for the notification.

      Optional: did the authors see any spontaneous mutations emerge that bypass the lethal phenotype of sybody expression?

      No, we did not observe spontaneous mutations suppressing the phenotype, possibly due to the limited number of cell generations observed. We tried to avoid suppressors by limiting growth, but this may indeed be a good future approach to further fine-map the binding sites and to obtain insights into the mechanism of inhibition.

      Optional: we think it would be nice to try some biochemical experiment with BMOE/cysteine-crosslinked B. subtilis Smc in the mid-region (4N or next to it) of the Smc coiled coils to try to further strengthen the story. Some of the authors are experts in this technique and strains might already exist?

      We have indeed tried to study the impact of sybody binding on Smc conformation by cysteine cross-linking. However, we were not convinced by the results and thus prefer not to draw any conclusions from them. We will add a corresponding note to the text.

      Reviewer #2 (Significance):

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Thank you!

      Reviewer #3 (Evidence, reproducibility and clarity):

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biased towards sybodies that bind to Smc in a quite special way, it seems. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      As explained above, we are quite confident that the Smc ATPase mutation did not bias the selection in an obvious way. The surprising bias towards coiled coil binding sites likely has other explanations, as the coiled coils likely form a preferred epitope recognized by sybodies.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results much, but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then likely few other sybodies are effectively lethal in B. subtilis, with the exception of the ones isolated and characterized. We have added this notion to the manuscript. We have also tested the expression of non-lethal sybodies by gfp-tagging and imaging. These results will be included in the revision.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but failure to segregate origins.

      We agree that this is an important point and will add a corresponding comment to the text.

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts at testing direct binding with mixed outcomes and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we will add ELISA results and briefly discuss grating coupled interferometry (GCI) data and pull-downs.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive for a certain mode of action of Smc, which is warranted.

      In line 245, they state "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation, at least this is not clearly shown.

      We agree and will carefully rephrase this statement. Thank you.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are otherwise largely neglected in the SMC literature, likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Reviewer #3 (Significance):

      The work describes the generation and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

      Description of analyses that authors prefer not to carry out

      As pointed out above, there are a few minor points that we prefer not to address experimentally. In particular, we do not consider it necessary to determine the expression levels of sybodies that were non-inhibitory. We also wish to note that we attempted to obtain additional structural and biochemical data and to that end performed cryo-EM, crystallography and cysteine cross-linking experiments. Unfortunately, we did not obtain sybody complex structures, and the cross-linking data were not conclusive. We also wish to note that the first author has finished her PhD and left the lab, which limits our capacity to add further experiments. However, as the reviewers also pointed out, the main conclusions are already well supported by the data.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tkacik et al describe their efforts to reconstitute and biochemically characterize ARAF, BRAF, and CRAF proteins and measure their ability to be paradoxically activated by current clinical and preclinical RAF inhibitors. Paradoxical activation of MAPK signaling is a major clinical problem plaguing current RAF inhibitors, and the mechanisms are complex and relatively poorly understood. The authors utilize their preparations of purified ARAF, BRAF, and CRAF kinase domains to measure paradoxical activation by type I and type II inhibitors, utilizing MEK protein as the substrate, and show that CRAF is activated in a similar fashion to BRAF, whereas ARAF appears resistant to activation. These data are analyzed using a simple cooperativity model with the goal of testing whether paradoxical activation involves negative cooperativity between RAF dimer binding sites, as has been previously reported. The authors conclude that it does not. They also test activation of B- and CRAF isoforms prepared in their full-length autoinhibited states and show that under the conditions of their assays, activation by inhibitors is not observed. In a particularly noteworthy part of the paper, the authors show that mutation of the N-terminal acidic (NtA) motif of ARAF and CRAF to match that of BRAF enhances paradoxical activation of CRAF and dramatically restores paradoxical activation of ARAF, which is not activated at all in its WT form, indicating a clear role for the NtA motif in the paradoxical activation mechanism. Additional experiments use mass photometry to measure BRAF dimer induction by inhibitors. The mass photometry measurements are a relatively novel way of achieving this, and the results are qualitatively consistent with previous studies that tracked BRAF dimerization in response to inhibitors using other methods. 
Overall, the paper establishes that WT CRAF is paradoxically activated by the same inhibitors that activate BRAF, and that ARAF contains the latent potential for activation that appears to be controlled by its NtA motif. The biochemical activation data for BRAF are qualitatively consistent with previous work.

      Strengths:

      While previous studies have put forward detailed molecular mechanisms for paradoxical activation of BRAF, comparatively little is known about the degree to which ARAF and CRAF are prone to this problem, and relatively little biochemical data of any sort are available for ARAF. Seen in this light, the current work should be considered of substantial potential significance for the RAF signaling field and for efforts to understand paradoxical activation and design new inhibitors that avoid it.

      Weaknesses:

      There are, unfortunately, some significant flaws in the data analysis and fitting of the RAF activation data that render the primary conclusion of the paper about the detailed activation mechanism, namely that it does not involve negative cooperativity between active sites, unjustified. This claim is made repeatedly throughout the manuscript, including in the title. Unfortunately, their data analysis approach is overly simplistic and does not probe this question thoroughly. This is the primary weakness of the study and should be addressed. A full biochemical modeling approach that accurately captures what is happening in the experiment needs to be applied in order for detailed inferences to be drawn about the mechanism beyond just the observation of activation.

      The authors' analysis of their RAF:MEK "monomer" paradoxical activation data (Figures 1, 3, and Tables 1, 2) suffers from two fundamental flaws that render the resulting AC50/IC50 and cooperativity (Hill) parameters essentially uninterpretable. Without explaining or justifying their choice, the authors use a two-phase cooperative binding model from GraphPad Prism to fit their activation/inhibition data. This model is intended to describe cooperative ligand binding to multiple coupled sites within a preformed receptor assembly, and does not provide an adequate description of what is happening in this complicated experiment. Specifically, it has two fundamental flaws when applied to the analysis in question:

      (a) It does not account for ligand depletion effects that occur with high-affinity drugs, and that profoundly affect the shapes of the dose-response curves, which are what are being fit 

      The chosen model is one of a class of ligand-binding models that are derived by assuming that the free ligand concentration is effectively equal to the total ligand concentration. Under these conditions, binding curves have a characteristic steepness, and the presence of cooperativity can be inferred from changes in this steepness as described by a Hill coefficient. However, many RAF inhibitors, including most of the type II inhibitors in this study, bind to the dimerized forms of at least one of the RAF isoforms with ultra-high affinity in the picomolar range (particularly apparent in Figure 1 with LY inhibiting BRAF). Under these conditions, the model assumption is not valid. Instead, binding occurs in the high-affinity regime in which the drug titrates the receptor and effectively all the added drug molecules bind, so there is hardly any free ligand (see e.g. Jarmoskaite and Herschlag eLife 2020 for a full description of this "titration" regime). The shapes of the curves under these conditions reflect the total amount of RAF protein (and to some extent drug affinity), rather than the presence of cooperativity. Fitting dose response curves with the chosen model under these conditions will result in conflating binding affinity and protein concentration with cooperativity.

      (b) It does not model the RAF monomer-dimer equilibrium, which is dramatically modulated by drug binding, rendering the results RAF-concentration dependent in a manner not accounted for by the analysis.

      The chosen analysis model also fails to consider the monomer-dimer equilibrium of RAF. This has two ramifications. Since drug binding is coupled to dimerization to a very strong degree, the observed apparent affinities of drug binding (reflected in AC50 and IC50 values) are functions of the concentration of RAF molecules used in the experiment. Since dimerization affinities are likely different for ARAF, BRAF, and CRAF, the measured AC50 values also cannot be compared between isoforms. This concentration dependence is not addressed by the authors. A related issue is that the model assumes drug binding occurs to two coupled sites on preformed dimers, not to a mixture of monomers and dimers. "Cooperativity" parameters determined in this manner will reflect the shifting monomer-dimer equilibrium rather than the cooperativity within dimers. Additionally, the inhibition side of the activation/inhibition curves is driven by binding of the drug to the single remaining site on the dimer, not to two coupled sites, and so one cannot determine cooperativity values for this process in this manner.

      As a result of both of these issues, the parameters reported in the tables do not correctly reflect cooperativity and cannot be used to infer the presence or absence of negative cooperativity between RAF dimer subunits. To address these major issues, the authors would need to apply a data analysis/fitting procedure that correctly models the biochemical interactions occurring in the sample, including both the monomer-dimer equilibrium and how this equilibrium is coupled to drug binding, such as that developed in e.g., Kholodenko Cell Reports 2015. Alternatively, the authors should remove the statements claiming a lack of negative cooperativity from the manuscript and alter the title to reflect this.
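
      The concentration dependence raised in point (b) can be illustrated with a minimal sketch, assuming a bare 2M ⇌ D equilibrium with no drug, ATP or MEK present. The function name and the 100 nM dimerization constant are hypothetical values chosen for illustration only; they are not taken from the manuscript.

```python
import math

def dimer_fraction(total_nM: float, kd_dimer_nM: float) -> float:
    """Fraction of protomers found in dimers for 2M <-> D, with Kd = [M]^2 / [D]."""
    # Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd, a quadratic in [M].
    monomer = (-kd_dimer_nM
               + math.sqrt(kd_dimer_nM ** 2 + 8 * kd_dimer_nM * total_nM)) / 4
    return 1 - monomer / total_nM

# Hypothetical Kd of 100 nM: the dimer fraction rises with total protein.
for total in (1, 10, 100, 1000):
    print(total, round(dimer_fraction(total, kd_dimer_nM=100.0), 3))
```

      Because the dimer fraction changes with total protein concentration even before any inhibitor is added, any apparent potency that depends on dimerization would be expected to shift with the RAF concentration used in the assay.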

      The bell-shaped dose-response model that we employed models the sum of two dose-response curves: one that activates and one that inhibits. That is a simple way of capturing the essence of paradoxical activation, namely the superposition of drug-induced activation at low inhibitor concentrations with inhibition at higher concentrations. That said, we agree completely with the reviewer that the model does not capture the complexity of what is happening in the experiment. We worked extensively with the Kholodenko model (which we implemented in KinTek Explorer), which accounts for the effect of drug on the monomer/dimer equilibrium and for the affinity of drug for each protomer of a dimer (and can therefore model positive or negative cooperativity as well as non-cooperative binding). We could obtain excellent fits with this model with positive cooperativity (perhaps not surprising considering that this is a 12-parameter model), with reasonable Kd values for drug binding and monomer/dimer equilibrium. However, we ultimately chose not to include this analysis when we realized that the fits were not at steady-state. The underlying Kon and Koff rates for the reasonable Kd’s for monomer/dimer formation were unreasonably slow. We could also obtain superficially reasonable fits with negative or non-cooperative binding, but close inspection revealed that they did not accurately fit the steepness of the inhibition phase of the dose-response curves for type II inhibitors. Even the Kholodenko model does not capture all the key aspects of our experiment, most notably competition with ATP, the effect of ATP on the monomer/dimer equilibrium, and the divergent conformations of the kinase required for binding ATP vs a type II inhibitor. We put some effort into explicitly including ATP in the model, but quickly decided that it was beyond our modeling expertise (and it also was not feasible to implement in KinTek Explorer).
In the end, we settled on the bell-shaped dose-response model because it was the simplest model that fit the data. We expect to include a supplemental figure/note in the revised manuscript to discuss our work with the Kholodenko model. We will also acknowledge the limitations of the bell-shaped dose response model.
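
      The general idea of a bell-shaped curve built from an activating and an inhibiting Hill term can be sketched as follows. This is a minimal illustration under stated assumptions: the parameter names and numerical values are hypothetical, and the parameterization is not necessarily the one used by GraphPad Prism or by the authors.

```python
def bell_shaped(log_conc, low, peak, high, log_ac50, hill_act, log_ic50, hill_inh):
    """Signal at log10(drug concentration) as a sum of two Hill-type sections.

    low  : plateau without drug
    peak : plateau reached between the activation and inhibition phases
    high : plateau at saturating drug
    """
    # Activation section: contributes (low - peak) at low drug, 0 once activated.
    activation = (low - peak) / (1 + 10 ** ((log_conc - log_ac50) * hill_act))
    # Inhibition section: contributes 0 at low drug, (high - peak) at saturation.
    inhibition = (high - peak) / (1 + 10 ** ((log_ic50 - log_conc) * hill_inh))
    return peak + activation + inhibition

# Hypothetical parameters: AC50 = 1 nM (Hill 1), IC50 = 1 uM (Hill 2).
params = dict(low=1.0, peak=5.0, high=0.0,
              log_ac50=-9.0, hill_act=1.0, log_ic50=-6.0, hill_inh=2.0)
for log_c in (-12, -9, -7.5, -6, -3):
    print(log_c, round(bell_shaped(log_c, **params), 3))
```

      In this form the two Hill slopes are purely descriptive shape parameters; as the reviewer notes, they do not by themselves distinguish cooperativity from other effects that steepen a curve.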

      This reviewer is also concerned that the steepness of the inhibition phase of the curves may be the result of enzyme titration with these tight-binding inhibitors, rather than a result of positive cooperativity. We are reasonably sure that this is not the case. The shape of these curves and the IC50/AC50 values obtained are relatively insensitive to enzyme concentration, and we will include additional data in our revision to demonstrate this. Also, the steep Hill slopes are unique to the type II inhibitors, which require a distinct inactive conformation of the kinase. Type I inhibitor SB590885 is similarly potent to the type II inhibitors, but does not exhibit this effect. If we were simply titrating enzyme, we would expect to see this with SB590885 as well.

      Also, we will clarify in the revised manuscript that our interpretation of positive cooperativity of inhibition by type II inhibitors is also supported by our prior work with 14-3-3-bound RAF dimers (Tkacik et al, JBC 2025). This is a much simpler experiment, as dimers are pre-formed. We have now done a thorough study of the effect of enzyme concentration on the IC50 and apparent cooperativity in dimer inhibition, which we will include in our revised manuscript. These experiments confirm that we are not in a regime where we are titrating enzyme.

      As an aside, with respect to models that incorporate free inhibitor concentration, we did try to fit our 14-3-3-bound dimer inhibition data (in Tkacik et al, JBC 2025) with the Morrison equation for tight-binding inhibitors, which does take into account free ligand concentration. The fits were not reasonable with type II inhibitors, at least in part due to the non-ATP-competitive behavior of the type II drugs. Also the Morrison equation does not model cooperativity.
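
      For orientation, the standard Morrison equation for a single-site, tight-binding inhibitor can be sketched as below. This is a minimal illustration, not the authors' analysis: the function name, the 0.01 nM apparent Ki and the enzyme concentrations are hypothetical, and the equation assumes one independent binding site with no cooperativity.

```python
import math

def morrison_relative_activity(inhibitor_nM, enzyme_nM, ki_app_nM):
    """Fractional activity v_i / v_0 for a tight-binding inhibitor (Morrison equation)."""
    total = enzyme_nM + inhibitor_nM + ki_app_nM
    # Concentration of enzyme-inhibitor complex from the quadratic binding solution.
    bound = (total - math.sqrt(total ** 2 - 4 * enzyme_nM * inhibitor_nM)) / 2
    return 1 - bound / enzyme_nM

# With a hypothetical apparent Ki of 0.01 nM, half-inhibition occurs at
# Ki_app + [E]/2, so the apparent IC50 tracks the enzyme concentration.
ki_app = 0.01
for enzyme in (1.0, 10.0, 100.0):
    ic50 = ki_app + enzyme / 2
    print(enzyme, ic50, round(morrison_relative_activity(ic50, enzyme, ki_app), 3))
```

      Under this model an IC50 that scales with enzyme concentration indicates the titration regime, which is why measuring IC50 at several enzyme concentrations, as the authors propose, is a direct way to test for it.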

      Some other points to consider

      (1) The observation that ARAF is not activated by type II inhibitors is interesting. A detailed comparison of the activation magnitudes between inhibitors and between A-, B-, and CRAF is hampered by the arbitrary baseline signal in the assay, which arises from a non-zero FRET ratio in the absence of any RAF activity. The authors might consider background correcting their data using a calibration curve constructed using MEK samples of known degrees of phosphorylation, so that they can calculate turnover numbers and fold activation values rather than an increase over baseline. This will likely reveal that the activation effects are more substantial than they appear against the high background signal.

      We will explore this for our revision.

      (2) The authors note that full-length autoinhibited 14-3-3-bound RAF monomers are not activated by type I and II inhibitors. However, since this process involves the formation of a RAF dimer from two monomers, the process would also be expected to be concentration dependent, and the authors have only investigated this at a single protein concentration. Since disassembly of the autoinhibited state must also occur before dimerization, it might be expected to be kinetically disfavored as well. Have the authors tested this?

      Good points. We have carried out this experiment at more than one enzyme concentration and differing reaction times, and also failed to see activation. However, we have not systematically explored either variable.

      (3) ATP concentration modulates activation. While this is an interesting observation, some of this analysis suffers from the same issue discussed above, of not considering high-affinity binding effects. For instance, LY is not affected by ATP concentration in their data (Figure 4D), but this is easily explained as being due to its very tight binding affinity, resulting in titration of the receptor and the shape of the inhibition curve reflecting the amount of RAF kinase in the experiment and not the effective Kd or IC50 value.

      As discussed above, we’ve convinced ourselves that we are not simply titrating enzyme. It occurred to us that such an effect could explain both the steepness of the inhibition curves with LY and other type II inhibitors and the apparent ATP-insensitivity. Our studies of concentration-dependence and the correlation of this effect with the type II binding mode argue against this possibility.

      Finally, as an overarching comment to this Reviewer and the others, we understand well that our enzyme inhibition studies (here and in Tkacik 2025) do not rise to the level of a formal demonstration of cooperative ligand binding. We envision a future study in which we could address this directly, perhaps by using single molecule fluorescence to observe on/off rates for binding of fluorescently tagged inhibitors to immobilized RAF dimers. (This is clearly beyond the scope of the present work).

      Reviewer #2 (Public review):

      This manuscript by Tkacik et al. uses in vitro reconstituted systems to examine paradoxical activation across RAF isoforms and inhibitor classes. The authors conclude that paradoxical activation can be explained without invoking negative allostery and propose a general model in which ATP displacement from an "open monomer" promotes dimerization and activation. The biochemical work is technically sound, and the systematic comparison across RAF paralogs (along with mutational/functional analysis) across inhibitor classes is a strength.

      However, the central mechanistic conclusions are overgeneralized relative to the experimental systems, and several key claims, particularly the dismissal of negative allostery and the proposed unifying model in Figure 6, are not directly supported by the data presented. Most importantly, the absence of RAS, membranes, and relevant regulatory context fundamentally limits the physiological relevance of several conclusions, especially regarding the current clinical type I.5 RAF inhibitors and paradoxical activation.

      Overall, this is a potentially valuable biochemical study, but the manuscript would benefit from more restrained interpretation, clearer framing of scope, and revisions to the model and title to better reflect what is actually tested.

      (1) A central issue is that the biochemical system lacks RAS, membranes, 14-3-3 and endogenous regulatory factors that are known to be required for paradoxical RAF and MAPK activation in cells. As previous work has repeatedly shown and the authors also acknowledge, paradoxical activation by RAF inhibitors is RAS-dependent in cells, and this dependence presumably explains why full-length autoinhibited RAF complexes are refractory to activation in the authors' assays.

      Importantly, the absence of paradoxical activation by type I.5 inhibitors in this system is therefore not mechanistically informative. Type I.5 inhibitors (e.g., vemurafenib, dabrafenib, encorafenib), but not Paradox Breakers (e.g., plixorafenib), robustly induce paradoxical activation in cells because binding of the inhibitor to inactive cytosolic RAF monomer promotes a conformational change that drives RAF recruitment to RAS in the membrane, promoting dimerization. The inability of the type I.5 inhibitor to suppress the newly formed dimers is the basis of the pronounced paradoxical activation in cells. In the absence of RAS and membrane recruitment, failure to observe paradoxical activation in vitro does not distinguish between competing mechanistic models.

      As a result, conclusions regarding inhibitor class differences, and especially the generality of the proposed model, should be substantially tempered.

      We will emphasize the limitations of our highly simplified experimental system in the revised manuscript, and temper some of our interpretations. And while the lack of membranes/RAS/14-3-3 in our system and the lack of observed PA with type I.5 inhibitors is a limitation of our study, we disagree that it renders our study of type I.5 inhibitors mechanistically uninformative. As seen here and consistent with prior studies, the binding mode of these compounds disfavors formation of the kinase dimer. While this may be overcome by 14-3-3 binding and other effects in the cellular context, it reflects a fundamental mechanistic difference as compared with type I and type II inhibitors, which also exhibit paradoxical activation.

      (2) The authors argue that their data argue against negative allostery as a central feature of paradoxical activation. However, the presented data do not directly test negative allostery, nor do they exclude it. The biochemical assays do not recreate the cellular context in which negative allostery has been inferred. Further, structural data showing asymmetric inhibitor occupancy in RAF dimers cannot be dismissed on the basis of alternative symmetric structures alone, particularly given the dynamic nature of RAF dimers in cells.

      Most importantly, negative allostery was proposed to explain paradoxical activation by Type I.5 RAF inhibitors, yet these inhibitors do not paradoxically activate in the assays presented here. The absence of paradoxical activation in this system, therefore, cannot be used to argue against a mechanism that is specifically invoked to explain cellular behavior not recapitulated by the assay.

      To be clear, we are not dismissing the possibility of negative cooperativity. And we do not think of our model as an alternative to the negative cooperativity model – rather it is a generalization that can account for paradoxical activation by diverse inhibitor classes, irrespective of positive, negative or non-cooperative modes of inhibition. We will emphasize these points in the revised manuscript.

      If negative allostery were a requisite feature of PA, we would not expect to see PA with type II inhibitors. As discussed in our response to Reviewer 1, we see clear evidence of positively cooperative inhibition of 14-3-3-bound RAF dimers by type II inhibitors (Tkacik JBC 2025) and in the present study, we find clear paradoxical activation by type II inhibitors (and there are many reports in the literature of PA by type II inhibitors in cellular contexts).

      (3) The model presented in Figure 6 is conceptually possible but remains speculative. Key elements of the model, including RAS engagement, membrane recruitment, 14-3-3 rearrangements, and the involvement of cellular kinases and phosphatases, are explicitly absent from the experimental system. Accordingly, the model is not tested by the data presented and should not be framed as a validated or general mechanism. The figure and accompanying text should be clearly labeled as a working or conceptual model rather than a mechanistically supported conclusion.

      We will revise the text to more clearly reflect that this is a working model, and importantly, that it is based on a large literature in this area in addition to the relevant experimental work in this manuscript.

      (4) The manuscript states that type I.5 inhibitors do not induce paradoxical activation in the biochemical assay because their C-helix-out binding mode disfavors dimerization. While this is true in isolation, it overlooks the well-established fact that type I.5 inhibitors (with the exception of paradox breakers) clearly promote RAS-dependent RAF dimerization in cells. This distinction is critical and should be explicitly acknowledged when interpreting the in vitro findings.

      We will explicitly make this point in the revised manuscript.

      (5) The title suggests a general mechanism for paradoxical activation across RAF isoforms and inhibitor classes, whereas the data primarily address type I and type II inhibitors acting on isolated kinase-domain monomers. A more accurate framing would avoid the term "general" and confine the conclusions to C-helix-in (type I/II) RAF inhibitors in a reduced biochemical context.

      As noted above, and in our response to Reviewer 3 below, we will clarify the contribution of the data in the present manuscript to the model and that it is based more broadly on the literature on PA and our insights into RAF structure and regulation. We will also revise the title to avoid the implication that the model arises mainly from the experimental data in the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Tkacik et al. systematically characterized all three RAF kinase isoforms in vitro with all three types of RAF inhibitors (Type I, I1/2, and II) to investigate the mechanism underlying paradoxical activation.

      In this study, the authors reconstituted heterodimers of A-, B-, and C-RAF kinase domains bound to non-phosphorylable MEK1 (SASA), mimicking the monomeric auto-inhibited state of RAF. These "RAF monomers" were tested for MEK phosphorylation with an increasing concentration of all three types of RAF inhibitors (Type I, I1/2, and II). This study is reminiscent of a previous study of the same team measuring RAF kinase activity in the presence of all three types of inhibitors in the context of dimeric RAF isoforms stabilized by 14-3-3 proteins (Tkacik et al 2025 JBC). RAF monomers had little to no activity at low concentrations of inhibitors (consistent with their "monomeric state"). Addition of type I1/2 inhibitor did not induce paradoxical activation as, in this context, they do not induce RAF dimerization required for activation, as observed by MP. Addition of type I and type II inhibitors led to paradoxical activation consistent with the RAF dimerization induced by these inhibitors, as observed by MP. Interestingly, type II inhibitors induced activation only for B- and C-RAF and not A-RAF.

      At high concentrations of type II inhibitors, kinase activity is inhibited with a strong or weak positive cooperativity for BRAF and CRAF, respectively. This observation is very similar to what the authors previously observed with their dimeric RAF system. Interestingly, when the NtA motif is modified by phosphomimetic mutations in A- and C-Raf, basal kinase activity is stronger, but most importantly, inhibitor-induced paradoxical activation is much stronger with both type I and II inhibitors. This demonstrates that mutation of the NtA motif of ARAF and CRAF sensitized them to paradoxical activation by type II inhibitors.

      The authors also tested the effect of ATP on the paradoxical activation observed in their RAF "monomer" system. As previously published in their assay with 14-3-3-stabilized dimeric RAF, the authors observed an expected shift of the IC50 with Type I inhibitors, while Type II inhibitors seem to behave as non-competitive inhibitors. The authors next reconstituted the MAP kinase pathway (with RAF monomers at the top of the phosphorylation cascade) to test paradoxical activation amplification. Again, Type I1/2 inhibitors did not induce paradoxical activation, while Type I and II inhibitors did. The authors tested the inhibitors with FL auto-inhibited RAF/MEK/14-3-3 complexes, where, contrary to the "RAF monomers" experiments, FL B- and C-RAF were not paradoxically activated but were inhibited by all three types of inhibitors.

      Overall, Tkacik et al. tackle an important question in the field for which definitive experiments and thorough biochemical investigation to understand the molecular mechanisms for the inhibitor-induced paradoxical activation are still missing, and of high importance for future drug development.

      Strengths:

      The biochemical experiments here are rigorously executed, and the results obtained are highly informative in the field to decipher the intricate mechanisms of RAF activation and inhibitor-induced paradoxical activation.

      Weaknesses:

      The interpretation of the results in the context of the current state of the art is ambiguous and raises questions about the relevance of introducing a new model for inhibitor-induced paradoxical activation, particularly since the findings presented here do not clearly contradict established paradigms. I believe some clarification and precision are required.

      While our model does not conflict with established paradigms (because it can allow for negative cooperativity), our experimental findings (here and in Tkacik et al JBC 2025) are in conflict with the negative allostery model. We will work to clarify this in the revised manuscript.

      Main comments:

      (1) Figure 2:

      The authors comment on the expected greater increase (for a cascade assay) in the magnitude of ERK phosphorylation compared to what was observed for MEK phosphorylation. However, this observation might be reflective of the stoichiometries used in the assay, with about 40 times more MEK than RAF (250 nM vs 6 nM), which might favour pERK vs pMEK.

      The authors should clarify their rationale for the protein concentration used in this assay and explain how protein stoichiometry was taken into account for the interpretation of their results.

      The Reviewer makes a good point: the concentrations and ratios chosen are expected to make a substantial difference in observed amplification. We intended this experiment more as a qualitative demonstration of cascade amplification and will clarify this in the revised manuscript.

      In addition, the authors should justify comparing pMEK and pERK TR-FRET values when different anti-phospho antibodies were used. Antibodies may have distinct binding affinities for their epitopes. Could this not lead to differences in FRET signal amplitudes that complicate direct comparison?

      Also a good point; we will note this limitation in the revised manuscript.

      (2) Supplementary Figure 2:

      The authors mentioned that the inhibitors did not activate the FL auto-inhibited RAF complexes; however, they did inhibit the TR-FRET signal.

      Can the authors comment on the origin of the observed basal activity? Would the authors expect self-release of the RAF kinase protein from the auto-inhibited state in the absence of RAS, leading to dimerization and activation? Alternatively, do the inhibitors at low-concentration relieve the auto-inhibited state, thereby driving dimerization and activation?

      We think that the baseline activity that is being inhibited is due to low concentrations of active dimer in our autoinhibited state preparations.

      Did the authors test the addition of RAS protein in their in vitro system to determine whether "soluble" RAS is sufficient to release the protective interactions with RBD/CRD/14-3-3 and lead to inhibitor-induced paradoxical activation of FL RAF?

      We did not, but we’ve thought about it. We expect that soluble RAS would not be activating. We have previously carried out extensive studies of BRAF activation by soluble vs. farnesylated RAS in a membrane environment (liposomes) and observed partial activation in the latter (Park et al, Nature Communications 2023).

      (3) Figure 5B:

      The authors said that the Kd values obtained from their MP assay are consistent with prior studies of RAF homodimerization and RAF:MEK heterodimerization. While this is true from the previous studies of RAF:MEK interaction by BLI (performed from the same team), the Kd of isolated RAF kinase homodimerization has been measured around ~30µM by AUC in the cited ref (24,27 & 37).

      The authors should discuss the discrepancy between their Kd of homodimerization and the reported Kd values in the literature. At the concentration used for MP, it is surprising to observe RAF dimerization while the Kd of homodimerization has been measured at ~30µM (in the absence of MEK).

      We will cite/discuss these differences in our revised manuscript.

      Would the authors expect the presence of MEK to influence the homodimerization affinity for the isolated KD?

      Perhaps, but likely only modestly. We do not think this explains the discrepancy noted above.

      (4) Conclusions:

      Several times in the introduction and the conclusion, the authors suggest that the negative allostery model (where "inhibitor binding to one protomer of the dimer promotes an active but inhibitor-resistant conformation in the other") is a model that applies to all types of RAF inhibitors (I, I1/2, and II).

      However, from my understanding and all the references cited by the authors, this model only applies to type I1/2 inhibitors, where indeed the aC IN conformation in the second (inhibitor-free) protomer of the RAF dimer might be incompatible with the type I1/2 inhibitors inducing aC OUT conformation. The type I and type II inhibitors are aC IN inhibitors and are expected to bind both protomers from RAF dimers with similar affinities. Therefore, the negative allostery model does not apply to the type I and type II inhibitors. The difference in the mechanism of action of inhibitors is even used to explain the difference in the concentration range in which inhibitor-induced activation is observed in cells. The description of the state of the art in this study is confusing and does not help to properly understand their argumentation to revise the established model for paradoxical RAF activation.

      We will work to clarify these complicated issues in the revised manuscript. While the reviewer is correct that the negative allostery model was developed in the context of type I.5 inhibitors, there are many examples in the literature of it being used to explain PA by type I and type II inhibitors as well.

      Can the authors clarify their analysis of the state of the art on the different mechanisms of action for the paradoxical activation of RAF by the different types of RAF inhibitors?

      We’ll try!

      (5) Conclusions:

      "Our results suggest that negative allostery (or negative cooperativity) is not a requisite feature of paradoxical activation. The type I and type II inhibitors studied here induce RAF dimers and exhibit paradoxical activation but do so without evidence of negative cooperativity, nor do they appear to inhibit intentionally engineered RAF dimers with negative cooperativity (25). Indeed, type II inhibitors exhibit apparent positive cooperativity while type I inhibitors are non-cooperative inhibitors of RAF dimers (25)."

      Can the authors explain how results on the paradoxical activation induced by type I and type II inhibitors inform or challenge a model that specifically applies to type I1/2 inhibitors?

      As noted above, the negative allostery model has also been widely applied irrespective of inhibitor type (rightly or wrongly). Essentially any review or discussion of the topic will explain in one way or another how inhibitor binding to one side of a dimer leaves the opposite side active but resistant to inhibitor. Our model is agnostic with respect to cooperativity of inhibition – essentially we are pointing out a simple circumstance that seems to have been lost in the focus on negative allostery. Paradoxical activation is a result of drug action on RAF monomers, while inhibition is a result of drug action on RAF dimers. Because these are distinct molecular species/complexes, they can be expected to differ in their affinity for RAF inhibitors, irrespective of type. Because binding of ATP in the active site of RAF monomers stabilizes the inactive monomeric state, displacing ATP can promote activation/dimerization. For any inhibitor that is more potent at displacing ATP from a monomer than from an active dimer, we could expect to observe a window of paradoxical activation.
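
      The "window" argument above can be illustrated with a deliberately oversimplified toy calculation. This is a sketch under stated assumptions, not the model in Figure 6: it assumes simple non-cooperative binding, treats activation as proportional to inhibitor occupancy of the monomer, treats output as proportional to the inhibitor-free fraction of dimer sites, and folds ATP competition into two hypothetical apparent constants, `k_monomer` and `k_dimer`.

```python
def toy_paradoxical_activity(inhibitor, k_monomer, k_dimer):
    """Toy relative activity: monomer occupancy times inhibitor-free dimer fraction.

    k_monomer and k_dimer are hypothetical apparent dissociation constants
    (same unit as inhibitor) for the monomer and the active dimer.
    """
    occupied_monomer = inhibitor / (inhibitor + k_monomer)  # drives dimerization
    free_dimer_site = k_dimer / (inhibitor + k_dimer)       # escapes inhibition
    return occupied_monomer * free_dimer_site

def peak_concentration(k_monomer, k_dimer):
    """Inhibitor concentration at which the toy curve is maximal."""
    return (k_monomer * k_dimer) ** 0.5

# Illustrative values only: monomer bound 100-fold more tightly than dimer.
for conc in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(conc, round(toy_paradoxical_activity(conc, 1.0, 100.0), 3))
```

      In this toy picture the curve is bell-shaped whenever both constants are finite, and the peak grows as the gap between the two constants widens; when the constants are equal the peak is small, so activation would be difficult to see.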

      The authors often refer to their previous study (reference 25), where they tested the inhibition of all three types of inhibitors with engineered RAF dimers. While I agree with the authors that in reference 25 the Type I and type II inhibitors inhibit RAF dimers without exhibiting negative cooperativity (as expected from the literature and the current model), the authors did observe some negative cooperativity for Type I1/2 inhibitors in their study, most particularly for the type I1/2 PB (with Hill slope ranging from -0.4 to -0.9, indicative of negative cooperativity).

      Correct! Although we do note the caveat that weak inhibition can also give rise to apparent negative cooperativity.
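
      To make the Hill-slope discussion concrete, here is a minimal sketch of the standard Hill form of an inhibition curve. It is illustrative only: it uses the magnitude of the Hill coefficient (the negative values quoted above reflect the sign convention for a descending curve), and the function names are assumptions rather than anything taken from the manuscript.

```python
def hill_fractional_activity(inhibitor, ic50, hill):
    """Fractional activity remaining at a given inhibitor concentration."""
    return 1.0 / (1.0 + (inhibitor / ic50) ** hill)

def ten_to_ninety_span(hill):
    """Fold-change in inhibitor needed to go from 10% to 90% inhibition."""
    return 81.0 ** (1.0 / hill)

# Shallow (h < 1), simple (h = 1) and steep (h > 1) curves.
for h in (0.5, 1.0, 2.0):
    print(h, ten_to_ninety_span(h))
```

      A coefficient of 1 spreads the transition over an 81-fold concentration range, 2 compresses it to 9-fold, and 0.5 stretches it to several thousand-fold, which is why a curve that does not reach full inhibition within the tested range can be hard to tell apart from a genuinely shallow one.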

      While the observation that type II inhibitors display positive cooperativity is both novel and very interesting, from what I understand the results from Tkacik et al. 2025 and the current study appear more in line with the current paradigm in the field (which describes paradoxical activation with negative cooperativity for type I1/2 inhibitors and no negative cooperativity for the Type I and II inhibitors) rather than disproving the current model and supporting a new one.

      In this context, can the authors clarify how their results challenge the current model for paradoxical activation?

      While the difference in binding modes and structural effects of type I.5 vs type I and type II inhibitors are well known in the field, we do not know of any work that suggests paradoxical activation arises from anything other than negative allostery. As one example to the contrary, Rasmussen et al. observe allosteric coupling asymmetry in binding of type II inhibitors to BRAF and attribute the observed paradoxical activation to “induction of dimers with one inhibited and one catalytically active subunit” (Rasmussen et al., Elife 2024). They also studied type I inhibitors in this work, but did not observe paradoxical activation.

      (6) Conclusions:

      The authors describe the JAB34 experiment from Poulikakos et al. 2010 to conclude that "While this experiment cleanly demonstrates inhibitor-induced transactivation of RAF dimers, it is important to recognize that the differential inhibitor sensitivity of the two subunits in this experiment is artificial - it is engineered rather than induced by inhibitor binding as the negative allostery model proposes."

      Indeed, the JAB34 experiment demonstrated inhibitor-induced transactivation, but the Poulikakos et al. 2010 study does not discuss differential inhibitor sensitivity. The negative allostery model was proposed later by the Poulikakos team in other papers (Yao et al. 2015 and Karoulia et al. 2016), in which JAB34 was not used.

      Can the authors clarify how the JAB34 experiments question differential inhibitor sensitivity?

      Good point, we neglected to discuss the Yao and Karoulia papers and will do so in our revised manuscript.

      (7) Conclusions:

      "Considering that the conformation required for binding of type I.5 inhibitors destabilizes RAF dimers, it is unclear how an inhibitor binding to one protomer would be able to transmit an allosteric change to the opposite protomer, if that inhibitor's binding causes the existing dimer to dissociate."

      The authors should comment on whether 14-3-3 proteins might overcome negative regulation by type I1/2 inhibitors, similar to what has been shown for ATP, which acts as a dimer breaker like type I1/2 inhibitors.

      Certainly we expect that they will, and we will discuss this in our revised manuscript.

      (8) Conclusions:

      "Furthermore, the complex effects of type I.5 inhibitors on dimer stability and the clear resistance of active RAF dimers to these inhibitors complicates interpretation of inhibition data - weak or incomplete inhibition of an enzyme can be difficult to discern from true negative cooperativity (43). As we discuss below, the clear resistance of RAF dimers to type I.5 inhibitors is alone sufficient to explain their ineffective inhibition during paradoxical activation, without invoking negative allostery." 

      The authors should explain how they reconcile this statement and their proposal of a new model that does not rely on negative allostery with their previous findings showing negative cooperativity for RAF dimer inhibition with type I1/2 inhibitors.

      As discussed above and in responses to other Reviewers, we do not exclude negative cooperativity for type I.5 inhibitors. That said, we are skeptical, even in light of our own findings of apparent negative cooperativity by type I.5 compounds, due in part to the caveats the reviewer highlights above.

      (9) Conclusions:

      Here, the authors propose a new universal model to explain paradoxical activation of RAF by all types of RAF inhibitors:

      " Our findings here, in light of structural studies of RAF complexes and prior cellular investigations of paradoxical activation, lead us to a model for paradoxical activation that does not rely on negative allostery and is consistent with activation by diverse inhibitor classes. In this model, the open monomer complex is the target of inhibitor-induced paradoxical activation (Figure 6). Binding of ATP to the RAF active site stabilizes the inactive conformation of the open monomer, which disfavors dimerization. Displacement of ATP by an ATP-competitive inhibitor, irrespective of class, alters the relative N- and C-lobe orientations of the kinase to promote dimerization (30, 35). Once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK."

      From my understanding, the novelty of this new model is twofold: a) the open monomer is the target of the inhibitor-induced paradoxical activation and b) once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK.

      Novelty a) implies, as the authors stated, that "Inhibitor-induced activation and inhibition act on distinct species - activation on the open monomer and inhibition on the 14-3-3-stabilized dimer". The authors should explain what they mean by "activation of the open monomer", while only RAF dimers are catalytically active (except for BRAF V600E mutant)?

      We will clarify – by activation we mean promoting conversion of the open monomer to a dimer.

      For novelty b), the authors should explain more clearly what experimental results support this new model.

      We will more explicitly detail how our results here as well as prior work in the field support this model.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      1) Summary

      This study investigates the mechanochemistry of Arp2/3-mediated branched actin networks at the level of individual branch junctions under load. Using microfluidic single-filament/branch force assays (including constant-force flow and open-chamber imaging), the authors quantify debranching, re-nucleation, and mother- vs daughter-interface stability across nucleotide states of Arp2/3 (ADP-Pi, ADP, and an ADP-BeFx proxy for ADP-Pi). They further test the effects of two branch regulators (GMF and cortactin). Key findings include: (i) ADP-Pi and ADP complexes share similar force dependence but differ markedly (~20×) in intrinsic dissociation rate; (ii) phosphate turnover on the Arp2/3 complex is rapid, and affinity for Pi drops when Arp2/3 loses its daughter filament; (iii) quantification from model fits uncovers large stability differences between daughter and mother interfaces of the Arp2/3 complex; (iv) extraordinarily high stability of ADP-Pi-like Arp2/3 on the mother filament; and (v) distinct effects of GMF and cortactin on force-dependent stability. Overall, the work combines technically demanding measurements with mechanistic modeling to probe how nucleotide state and regulatory factors tune branch mechanics.

      2) Major comments:

      1. Low force kinetics and completeness of survival curves (Figure 1). "For all forces, the surviving curves exhibited a clear single exponential behavior...." While the data can be fitted to monoexponential decay curves, the data at low forces are clearly incomplete. >90% of branches have not dissociated by the end of the experiment. For the particular data shown in 1C (F = 0 pN, n=60 total branches) it means that the time information is coming from

      Essential; experiment might already be performed. Otherwise straightforward to do (weeks time).

      In figure 1B, we indeed show a survival curve for ADP-Arp2/3 complex branch dissociation at 0 pN up to 900 seconds. As now shown in updated supp figure S2, the data were in fact acquired for at least 5000 seconds for the ADP-Arp2/3 and ADP-Pi states (N=2 repeats for each condition, with n = 60 and 90 branches for ADP-Arp2/3 branches, and 90 and 132 branches for ADP-Pi-Arp2/3 branches). The debranching rates reported in the initial submission were already obtained by fitting the survival curves over the whole duration of the experiments.
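
      The reviewer's concern about incomplete survival curves can be made quantitative with a minimal sketch. It uses the textbook maximum-likelihood estimator for a single-exponential rate with right-censored observations, which is an illustrative alternative and not the curve-fitting procedure described in this response; the function name and the example event times are hypothetical.

```python
import math

def censored_exponential_rate(event_times, n_censored, t_end):
    """Maximum-likelihood dissociation rate with right-censoring at t_end.

    event_times: times (s) at which individual branches were seen to dissociate.
    n_censored: number of branches still attached when observation stopped.
    Returns (rate in 1/s, approximate relative standard error of the rate).
    """
    n_events = len(event_times)
    # Total observed time summed over all branches, dissociated or not.
    exposure = sum(event_times) + n_censored * t_end
    rate = n_events / exposure
    rel_se = 1.0 / math.sqrt(n_events) if n_events else float("inf")
    return rate, rel_se

# Hypothetical example: 60 branches, only 5 dissociations within 900 s.
rate, rel_se = censored_exponential_rate([120.0, 300.0, 450.0, 610.0, 800.0], 55, 900.0)
print(rate, rel_se)
```

      The relative uncertainty scales as one over the square root of the number of observed dissociation events, not the number of branches followed, so extending acquisition to 5000 seconds matters mainly because it raises the event count.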

      1. Stability Analysis (Figure 4). I can follow many of the arguments presented in the stability analysis of the daughter vs mother interfaces, which is in principle extremely interesting! However, there are some concerns here:

      i) The authors emphasize the zero force ratio derived from fits (which is linked to the stability difference of the two interfaces in the absence of force) despite this being only weakly constrained by data. Intuitively in the model, the stability difference should grow to very large values as the re-nucleation ratio approaches 1 at low force. This combined with the noise in the data poses an issue in my opinion. Looking at the data and the error margin, I think that the authors cannot state with high confidence that there is a real difference between the relative stability of the daughter and mother interfaces between the two nucleotide states of the complex.

      Essential; analysis and textual revision only

      We thank the reviewer for this comment. The difference in stability between the two interfaces is strongly constrained by the shape of the branch renucleation ratio versus force curve, and its value at 0 pN. This is illustrated in the figure shown below (new Supp Fig. S8), showing the dissociation rates of the two interfaces (in ‘dashed’ and ‘point-dashed’ style) that contribute to the overall debranching rate in each nucleotide condition. Despite the limited force range at which we probed the debranching rate, the branch renucleation ratio curve informs us on which interface is the weakest, and how this evolves with force.

      We have assessed the confidence intervals of the parameters obtained from the fits, taking into account the error bars on our experimental datapoints. It seems to indicate that the simultaneous fits of the debranching rate and the branch renucleation ratio curves indeed constrain the parameters quite strongly. These confidence intervals are now reported in the main text and in the summarizing table.

      We have repeated branch renucleation experiments for ADP-BeFx- and ADP-Pi-Arp2/3 complex branches (see new figure 4C&D, and our response to the next point). We believe these new measurements allow a better assessment of the relative stability between the two interfaces for Arp2/3 complex branch junctions in the ADP-BeFx state.

      Still, we agree with the reviewer that the dispersion of the experimental data does not allow us to have strong confidence in the crossover force and relative stability difference of the interfaces. Therefore, we have slightly toned down the way we present and discuss the differences in stability when comparing the two nucleotide states.

      ii) For ADP-Pi, the renucleation ratio essentially remains flat over the measured force range. Hence, in my opinion, the data can provide only little leverage to estimate both the zero force ratio and, more importantly, the differential distance to the transition state in the slip-bond model, which will show in the crossover force. Consequently, the quoted ">100×" stability difference at F=0 and the crossover force >20pN are driven largely by extrapolation rather than direct constraint by data. Given the high number of free parameters in the model, I would anticipate that several crossover forces and differential distances might explain the data nearly equally well. Instead of loosely reporting exact numbers from fits, I would have hoped for some sort of sensitivity analysis, for instance relying on profile likelihoods. Also, parameter values could be reported as bounds (e.g., crossover force ≫ measured range) rather than precise point estimates. This issue re-occurs (albeit not as drastically) for the cortactin experiments (Figure 6).

      Essential; analysis and textual revision only

      As mentioned in our response to the previous point, we have repeated renucleation experiments for ADP-BeFx-Arp2/3 complex branches (and also for Arp2/3 complex branches in the presence of 50 mM Pi) (see new figure 4C&D) to better characterize the differential distance to the transition state. The crossover force for the ADP-BeFx state is now 13.5 pN and the stability ratio between the two interfaces is roughly 100-fold.
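      To make the link between the crossover force and the zero-force stability ratio concrete, the following minimal sketch treats each interface as an independent Bell-Evans slip bond. This is an assumption made for illustration, not necessarily the exact fitting function used in the manuscript. The function names, the thermal energy of 4.1 pN·nm and the absolute rates and distances are hypothetical; only the 13.5 pN crossover force and the roughly 100-fold ratio come from the text above.

```python
import math

KBT = 4.1  # thermal energy in pN*nm near room temperature (assumed value)

def slip_rate(force, k0, x):
    """Bell-Evans rate (s^-1) for one interface; force in pN, x in nm."""
    return k0 * math.exp(force * x / KBT)

def daughter_fraction(force, kd0, xd, km0, xm):
    """Fraction of debranching events that rupture the Arp2/3-daughter interface."""
    kd = slip_rate(force, kd0, xd)
    km = slip_rate(force, km0, xm)
    return kd / (kd + km)

def crossover_force(ratio0, delta_x):
    """Force (pN) at which both interfaces dissociate equally fast.

    ratio0 is kd0 / km0 at zero force, delta_x = xm - xd in nm.
    """
    return KBT * math.log(ratio0) / delta_x

def delta_x_from_crossover(ratio0, f_cross):
    """Differential distance to the transition state implied by a crossover force."""
    return KBT * math.log(ratio0) / f_cross

# Values quoted in the response: ~100-fold ratio, 13.5 pN crossover (ADP-BeFx).
dx = delta_x_from_crossover(100.0, 13.5)
print(f"implied delta_x = {dx:.2f} nm")

# Hypothetical absolute parameters consistent with that ratio and delta_x.
kd0, km0, xd = 1e-3, 1e-5, 1.0
xm = xd + dx
for f in (0.0, 5.0, 13.5, 20.0):
    print(f, round(daughter_fraction(f, kd0, xd, km0, xm), 3))
```

      In this simplified picture the crossover force depends only on the logarithm of the zero-force ratio divided by the differential distance, which is why a flat renucleation curve leaves both quantities poorly constrained individually.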

      We agree with the reviewer that the dispersion of the experimental data does not allow us to have strong confidence in the crossover force and the relative stability difference of the interfaces. We have thus toned down the way we report these values. We do believe, though, that the difference we report between the ADP and ADP-BeFx states appears to be significant and needs to be acknowledged.

      As a side note, it has proven challenging to pull on branches at forces higher than 7 pN. Applying a large force on the branch junction requires a high flow rate. In this case, the height of the filaments (both mother and daughter filaments) above the surface seems to deviate from what we have established in our previous studies (Jegou et al, Nat. Comm. 2013 & Wioland et al, PNAS 2019). This may originate from the fact that branched filaments have a more complex shape than an individual filament. Accurately characterizing the evolution of the branch height as a function of the flow rate and applied force would require quite extensive additional work, which, we believe, is beyond the current focus of this study on the stability of Arp2/3 complexes.

      iii) One important expectation from the "two slip bond" model is that branch dissociation rates should not necessarily scale mono-exponentially as they mostly do over the accessible force range of the paper. However, once the "minor" pathway of dissociation from the mother starts to dominate at high forces, rates become more force sensitive. This is nicely recaptured by the model fits in Figure S6 but deserves some explanation in the text. Otherwise, people will simply remember the "ADP-Pi is 20-fold more stable than ADP at all forces" message.

      Essential; textual revision only

      We have now rephrased the key sentences (in the Abstract and Results sections) to state more clearly that the debranching rate does not increase mono-exponentially with force.

      In the Abstract: “Remarkably, we find that branch junctions are over 30-fold more stable when the Arp2/3 complex is in the ADP-Pi rather than ADP state, and that force accelerates debranching with similar exponential factors in both states.”

      In the Results section: “The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour is predicted by the Bell-Evans model for a slip bond.”
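      For readers less familiar with the slip-bond picture, the Bell-Evans expression referred to in the quoted sentence can be sketched as follows. This is a generic illustration: the zero-force rate and the distance to the transition state used below are hypothetical placeholders, not fitted values from the manuscript, and the thermal energy of 4.1 pN·nm is an assumed room-temperature value.

```python
import math

KBT_PN_NM = 4.1  # thermal energy in pN*nm near room temperature (assumed value)

def bell_evans_rate(force_pn, k0, x_nm, kbt=KBT_PN_NM):
    """Dissociation rate (s^-1) of a single slip bond under a pulling force.

    k0   : dissociation rate at zero force (s^-1)
    x_nm : distance to the transition state (nm)
    """
    return k0 * math.exp(force_pn * x_nm / kbt)

# Hypothetical parameters, chosen only to illustrate the exponential trend.
k0_example = 1e-3
x_example = 2.0
for force in (0.0, 2.0, 4.0, 6.0):
    rate = bell_evans_rate(force, k0_example, x_example)
    print(f"F = {force:.0f} pN -> k = {rate:.4f} s^-1")
```

      On a semi-logarithmic plot a single slip bond gives a straight line; a sum of two such terms with different distances to the transition state bends upward once the more force-sensitive pathway takes over.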

      iv) One important prerequisite for the model is that isolated Arp2/3 complexes (without a daughter filament) should dissociate with equal rates from mother filaments at all flow rates. Since the Arp2/3 complex prefers mother filament curvature, forces experienced by the mother might change its off-rate. It would be good to refer to this assumption in the text and experimentally verify it. I could not find it in the paper nor in Ghasemi et al 2024.

      Essential; simple experiment (a week's time).

      We thank the reviewer for this important comment.

      First, we investigated whether the viscous drag force applied to the ADP-Arp2/3 complexes that remain bound to mother filaments could affect their stability. We performed branch renucleation experiments at different flow rates but with the same pulling force on branch junctions (average force 3.9 pN) by adapting the length of the daughter filament. As shown in new supp. figure S11 (shown below), we did not observe any significant difference between ‘low’ and ‘high’ flow rates. If the off-rate of the surviving Arp2/3 complexes were significantly affected by the flow, the renucleation ratio would vary with the flow rate.

      Second, we have investigated the impact of the tension experienced by the mother filament at the location of the branch junction for ADP-Arp2/3 complex branches, with the same pulling force on the branches (average 4.1 pN pulling force on branches). We have quantified the debranching rate from three groups of branches depending on their position along mother filaments. As shown in new supp. figure S12 (shown below), we can observe a small trend, where the debranching rate decreases with the tension on the mother filament at the branching point.

      Doubling the tension on the mother filament from 15 to 30 pN decreases the debranching rate by a third. However, pairwise log-rank tests performed between the survival fractions of the three binned groups do not report any statistically significant difference (all p values > 0.05). One possible explanation for this trend is that the height of the mother filament in the microfluidic flow increases linearly from the anchoring point to the free barbed end. As a consequence, branches located further from the anchoring point experience faster flows and hence a higher pulling force.

      For these same groups, upon branch dissociation, all Arp2/3 complexes that remain bound are exposed to the same flow rate, and the branch renucleation ratios were similar. Thus, the branch renucleation ratio does not seem to depend significantly on the tension experienced by the mother filament at the branching point.
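      For reference, a pairwise log-rank comparison of two survival curves can be sketched in a few lines. This is a generic textbook implementation written for illustration, not the analysis code used for the manuscript, and the survival times below are invented placeholder data.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    times_*  : time of debranching or of censoring for each branch
    events_* : 1 if debranching was observed, 0 if the branch was censored
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented example data: two groups with clearly different lifetimes.
fast = [1, 2, 3, 4, 5]
slow = [10, 11, 12, 13, 14]
chi2, p = logrank_test(fast, [1] * 5, slow, [1] * 5)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

      With three binned groups, the same function would be applied to each of the three pairs; whether a multiple-comparison correction is needed depends on how the resulting p values are interpreted.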

      Similarly, Pandit et al PNAS 2020, Extended figure S1, also reported no detectable impact of the mother filament tension on the debranching rate in their assay.

      v) The force dependence of the branch re-nucleation rate (Fig 3D) has been measured previously by the same group (Ghasemi et al). While the data in the older paper has not been fitted by a model, the trend of the data in the previous paper looks conspicuously different. Are there any explanations for this? I speculate that it might be related to actin and ATP not being saturated (low-force re-nucleation rate rarely exceeds 80%) in Ghasemi et al., but it would be good to know what the authors think about this.

      Essential; textual revision only

      This is a good point. We have plotted the data of the renucleation ratio from ADP-Arp2/3 complex from figure 1F of Ghasemi et al, Sc. Adv. 2024 (performed at 0.3 and 1 µM actin), together with the data of the current study from figure 4D (performed at 1.5 µM actin). We feel this comparison could be of interest to the readers, and have thus integrated it in the manuscript as new supp. figure S13 (shown below).

      As expected, the branch renucleation ratio is lower with lower concentrations of actin. The experimental data points from Ghasemi et al are similarly well fitted by the branch renucleation function obtained for 1.5 µM multiplied by a scaling parameter, which reflects the fact that the branch renucleation ratio is actin concentration dependent (Fig. 6A in Ghasemi et al). This scaling parameter was the only free parameter of those fits.

      Since the branch renucleation ratio depends on the actin concentration as $0.97 \cdot \dfrac{k_{on}\,([\mathrm{actin}] - C_c)}{k_{on}\,([\mathrm{actin}] - C_c) + k_{off}^{\text{ATP-Arp2/3}}}$, with $k_{on}$ = 3.4 µM⁻¹·s⁻¹ and $k_{off}^{\text{ATP-Arp2/3}}$ = 0.66 s⁻¹ from (Ghasemi et al. 2024), the scaling parameter obtained from the fits gives estimates of the actin concentration in these experiments of 0.6 (±0.05) and 0.9 (±0.2) µM for the experiments performed at 0.3 and 1 µM, respectively, in (Ghasemi et al. 2024).
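      The conversion from a fitted scaling parameter to an effective actin concentration can be sketched as below. The rate constants and the 0.97 prefactor are those quoted above; the critical concentration is not given in this response, so the value used here is a hypothetical placeholder, and the function names are illustrative.

```python
K_ON = 3.4      # uM^-1 s^-1, quoted above from Ghasemi et al. 2024
K_OFF = 0.66    # s^-1, off-rate of ATP-Arp2/3, quoted above
PLATEAU = 0.97  # prefactor of the expression quoted above
CC = 0.1        # uM, critical concentration: hypothetical placeholder value

def renucleation_ratio(actin, cc=CC):
    """Branch renucleation ratio at a given actin concentration (uM)."""
    growth = K_ON * (actin - cc)
    return PLATEAU * growth / (growth + K_OFF)

def actin_from_scaling(scale, ref_actin=1.5, cc=CC):
    """Actin concentration (uM) whose ratio equals scale * ratio(ref_actin)."""
    target = scale * renucleation_ratio(ref_actin, cc)
    if not 0.0 < target < PLATEAU:
        raise ValueError("scaled ratio lies outside the range of the model")
    # Invert target = PLATEAU * g / (g + K_OFF) for g = K_ON * (actin - cc).
    growth = target * K_OFF / (PLATEAU - target)
    return cc + growth / K_ON

# Round trip: the scaling factor expected for 0.6 uM actin maps back to 0.6 uM.
scale_example = renucleation_ratio(0.6) / renucleation_ratio(1.5)
print(round(actin_from_scaling(scale_example), 3))
```

      Because the ratio saturates at high actin concentration, small uncertainties in the scaling parameter translate into larger uncertainties in the inferred concentration as the scaling factor approaches 1.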

      1. Stability of the authentic ADP-Pi-Arp2/3 complex on the mother filament. The extraordinary stability of the isolated ADP-BeFx-Arp2/3 complex on mother filaments is surprising, especially considering that both ATP and ADP states are much more labile (Ghasemi et al 2024). I would recommend repeating this experiment in the authentic ADP-Pi state with labelled Arp2/3 complexes as a more direct readout, even if this would require working with very high phosphate concentrations.

      Essential; simple experiment (a week's time).

      We have followed the recommendation of the reviewer and have performed new experiments using fluorescent Arp2/3 complexes for ADP, ADP-BeFx and ADP-Pi states, now displayed in new figure 5C (also shown below).

      For fluorescent Arp2/3 complexes remaining bound to the mother filament, the Arp2/3 complex - mother filament interface is ~ 100 times more stable in the ADP-BeFx state (0.0046 s-1) compared to the ADP state (0.56 s-1). We also assessed the dissociation of surviving ADP-BeFx-Arp2/3 complexes using unlabelled Arp2/3 complexes (previously in figure 4B, repeated experiment shown in new supp. figure S10), which also indicates a remarkable stability.

      The dissociation curve of surviving Arp2/3 complexes in the presence of 50 mM Pi and 200 µM ATP in solution reflects a mixture of Arp2/3 complexes dissociating in the ADP/ATP state and ADP-Pi-Arp2/3 complexes that can either dissociate in the ADP-Pi state or lose their Pi and dissociate in the ATP state. Despite the presence of 50 mM Pi, the rate at which ADP dissociates and ATP reloads is much faster than Pi binding. Fitting this survival curve with a function that accounts for the two initial populations and the evolution of the ADP-Pi population (see Methods) gives a good estimate of the Pi release rate.

      OPTIONAL: Further, but beyond the scope of the present paper, would be titrating phosphate in these experiments, which would even allow the authors to independently verify the reduced Pi affinity for Arp2/3 in the mother filament. Of note, this affinity difference is needed to satisfy detailed balance in the reaction scheme (Fig 4 D)!

      We thank the reviewer for this suggestion. High concentrations of phosphate in the buffer render glass surfaces quite sticky in our assays. We’ve tried several different passivation strategies (BSA, PLL-PEG, K-casein, …) but none gave satisfactory results. So titrating phosphate, by going beyond 50 mM phosphate, proved to be quite challenging.

      Detailed balance, considering the two possible routes connecting the ADP-Pi-Arp2/3 complex branch junction state and the surviving ADP-Arp2/3 complex state, can be written as $K_{\text{Pi rel.}}^{\text{branch junction}} \cdot K_{\text{debranching}}^{\text{ADP-Arp2/3}} = K_{\text{debranching}}^{\text{ADP-Pi-Arp2/3}} \cdot K_{\text{Pi rel.}}^{\text{surviving Arp2/3}}$. Some of these affinity constants are not known, because of the inability to determine reverse reaction rates such as the rebinding of a daughter filament to a surviving Arp2/3 complex. It is thus hard to determine how the affinity of Pi for the Arp2/3 complex changes between Arp2/3 complexes at branch junctions and surviving Arp2/3 complexes on mother filaments.

      While we cannot determine the affinity constant of Pi for a surviving Arp2/3 complex, our data indicate that the dissociation rate of Pi is higher from Arp2/3 complexes at branch junctions (koff = 0.21 s-1) than from surviving Arp2/3 complexes (koff = 0.05 s-1). This unexpected finding indicates that surviving Arp2/3 complexes adopt a conformation where the nucleotides are readily exchanged, but where the ‘back door’ for Pi release is less open. We now discuss this point in our revised manuscript.

      1. Importance of "surviving" ADP-Pi-Arp2/3 complexes. The authors show a) rapid turnover of Pi on the ADP-Arp2/3 complex in both branch- or mother filament-bound state and b) the lowered Pi affinity of the latter. Nonetheless, they emphasize the importance of long-lived "surviving" ADP-Pi bound complexes on the mother (even stated in the abstract). I understand that this fraction shows under some experimental conditions (BeFx), but unless I am missing something, most complexes should rapidly lose their phosphate and either exchange nucleotide or dissociate from the mother under physiological conditions. Please clarify or tone down.

      Essential; textual revision only

      We thank the reviewer for their remark. We have tried to clarify this aspect in the manuscript.

      The departure rate of fluorescent surviving Arp2/3 complexes, together with the branch renucleation data, now shows that surviving ADP-Pi-Arp2/3 complexes are quite stable on mother filaments: they detach and release their Pi slowly, so that branch regrowth will occur provided there is actin in solution. In the absence of actin monomers, as the reviewer correctly points out, the surviving ADP-Pi-Arp2/3 complex will predominantly release its Pi and thus become a surviving ADP-Arp2/3 complex. We have modified the text to avoid any confusion.

      1. GMF mechanism. The authors claim that GMF "...accelerates the departure of the surviving Arp2/3 complex from the mother...". I assume that they infer this from decrease in the re-nucleation ratio. However, alternatively GMF could simply dwell on the complex, inhibiting re-nucleation without promoting dissociation from the mother. The authors should either monitor Arp2/3 dwell times directly to discriminate between these possibilities or be more cautious in their conclusions.

      Essential; simple experiment (a week's time) or textual revision.

      In Ghasemi et al. Sci. Adv. 2024, we examined the departure of Arp2/3 from the mother filament after GMF-induced debranching using fluorescent Arp2/3. Most of the fluorescent Arp2/3 dissociated from mother filaments within the same frame as the branch, i.e. within 0.5 seconds after the debranching event, and none were visible after another second. This could be due to Arp2/3 departing with the branch or to an accelerated departure after branch dissociation. In any case, this rules out the possibility that GMF would dwell on the surviving complex for a substantial amount of time without promoting dissociation from the mother.

      In the present manuscript, we now show that increasing the ATP concentration 10-fold (from 0.2 to 2 mM) is sufficient to restore the branch renucleation ratio to its level without GMF. This shows that GMF does not cause Arp2/3 to leave with the branch, but rather that it (also) acts on the surviving Arp2/3 complex, in a way that is countered by high concentrations of ATP. More specifically, it suggests that GMF accelerates the departure of the surviving ADP-Arp2/3 complex, either directly or by hindering the reloading of ATP, and that GMF does not affect the surviving Arp2/3 complex once it has reloaded ATP.

      We now discuss these two non-mutually exclusive possibilities for the accelerated dissociation of the surviving ADP-Arp2/3 complex in the manuscript.

      6. Cortactin mechanism and the "leash model". I must say that the cortactin data are the most puzzling part of the paper and hard to reconcile with what we know from structure. I was hoping to find some of this resolved in the discussion. However, I do not understand the "leash model" in the discussion section for cortactin-mediated branch stabilization: "This would explain the observed increase in branch survival compared to the absence of cortactin. As the pulling force is increased, this rebinding mechanism becomes less efficient." According to my understanding of the data, this is opposite to what happens. Cortactin only stabilizes the labile interface at elevated forces! Some re-writing might help here.

      Essential; textual revision.

      We thank the reviewer for having us think more thoroughly about the model we initially proposed. We now believe that our ‘leash’ mechanism is not able to fully recapitulate our observations in a simple and satisfactory manner.

      We now propose a much simpler model, where the binding of cortactin to the Arp2/3 complex at the branch junction simply changes the energy landscape of the Arp2/3-daughter interface without the need to invoke a rebinding of the daughter filament upon branch departure. We have updated our interpretation of the data in the Discussion section accordingly.

      Overall, our results on the impact of cortactin on branch renucleation highlight a surprising behaviour that would require further investigation to fully decipher the underlying molecular mechanism.

      3) Minor comments

      Organization: - I do not want to impose on how to best tell the story, but I felt that Fig1 A-D and Fig 2 A-B belong to one logical unit (nucleotide dependence), whereas Fig 1 E-F and Fig 2 C belong to the other (Pi binding and exchange). Perhaps consider re-organizing to streamline presentation?

      We thank the reviewer for their suggestion. We agree that it flows more naturally as suggested, and have made the changes! Thank you.

      Semantics/Typos: - Abstract: "... ADP-Pi and ADP-Arp2/3 detach with the same exponential increase as a function of force...". Increase should refer to the dissociation rate, which should be added to the sentence.

      We have corrected this.

      Results page 8: "...and the majority of Arp2/3 complexes detach from the mother filament while remaining bound to the branch at the debranching time." "Branch" should likely be daughter here, as there is no branch after dissociation of either interface.

      We have corrected this, thank you.

      Results page 13: "Exposing ADP-BeFx-Arp2/3 complex branch junctions to a saturating amount of GMF...". It is strange to imply saturation, because GMF likely simply does not bind to the complex in this nucleotide state with appreciable affinity. Suggest to change to "high".

      We have made the changes accordingly.

      Discussion page 18: "Moreover, in mammalian Arp2/3, His80 in Arp3 (corresponding to His73 in mammalian actin) is not methylated, and corresponds to residue N77 in Arp3, which is also not modified." N77 likely belongs to Arp2?

      We have made the changes accordingly.

      Discussion page 19: "We showed that Pi affinity for Arp2/3 complexes at branch junctions is around 3.7 mM (Fig. 1), a value which lies within the reported 1-10 mM Pi concentration measured in the cytosol in different mammalian cell types". Notably, this is not too different from F-actin, which should be mentioned. By this measure alone, free inorganic phosphate could also directly regulate actin filament stability!

      We now mention this and discuss that intracellular Pi can also impact actin filament nucleotide state.

      Future interest (non essential): - It would be utterly exciting (but beyond current scope) to quantify how instantaneous debranching rates evolve for naturally aging branches starting from ATP-Arp2/3 complexes!

      We thank the reviewer for this remark. It is indeed quite beyond the scope of the current study, as this would require a way to probe ATP-Arp2/3 complex branches while daughter filaments are still quite short (so pulling on them is difficult). An interesting alternative could be to use ATP analogs, such as App-NHp (aka AMP-PNP), to stabilize this state. However, some studies have mentioned that App-NHp is not very stable.

      Significance

      General assessment:

      This is a compelling and carefully executed study that delivers a clear mechanistic framework for how Arp2/3 branch junctions fail and re‑form under load. The central strength is the tight integration of state‑of‑the‑art reconstitutions with careful and original kinetic analysis. The experimental design is elegant and experiments have been carried out to a masterful standard. The figures are clear, the statistics are appropriate with some exceptions as detailed above. There are very few labs in the world that could have achieved this feat!

      A few aspects could be further strengthened, most notably the explanation and application of the "two slip bond" model as well as slightly more restraint in speculating around specific regulatory mechanisms. However, these are minor refinements that do not detract from the important contributions of the paper.

      Overall, the work clearly merits publication with high priority after revision; most requested changes are textual/analytical with very few targeted experiments, which would substantially strengthen core claims.

      We thank the reviewer for their positive evaluation of our manuscript. We hope that our responses to the detailed points above, along with the corresponding revisions of the manuscript, will alleviate their concerns.

      Advance relative to prior literature: The major novel findings of the paper are already summarized above. There is some recent work done on the subject of branch mechanics by the authors (Ghasemi et al 2024, PMID: 38277459) and others (Pandit et al 2020 PMID: 32461373), but the focus of the present work is clearly unique and there is plenty of novel insight.

      Audience and impact: Primary audience: specialists in cytoskeleton dynamics, in vitro reconstitution single molecule biophysics, and mechanobiochemistry. Secondary: researchers in cell motility, morphogenesis and mechanobiology, physicists working on active matter and modelers studying force producing and load-bearing biopolymer networks. The results and analysis framework should inform quantitative models of branched network turnover under load and the interpretation of regulatory factor action in vivo and in cells.

      Reviewer expertise: Actin dynamics; biochemical reconstitution; single molecule approaches; biophysics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Xiao et al examine the molecular events occurring when Arp2/3 complex-mediated actin filament branches are removed from mother actin filaments. They do this using a microfluidics assay with purified proteins combined with single filament TIRF imaging of branched actin filaments with distinct fluorescent labels. The contribution of different nucleotide states of Arp2/3 complex is tested together with the force exerted on the branches and the involvement of the regulatory proteins GMF and cortactin. The data seem comprehensive and highly quantified in response to concentration, force, fraction of branches and survival times and branching rates. They find that ADP-BeFx and high phosphate concentrations (leading to the ADP-Pi state) lead to a slower debranching rate at a given level of force applied. The ability to rapidly switch the buffer gives powerful information about response times of debranching compared with other actin remodelling events. They use renucleation experiments to determine that the previous debranching event most often occurs at the Arp2/3 complex/daughter interface, showing that filaments will be ready to re-branch in the stable ADP-Pi bound state. GMF addition allows debranching of the ADP state to occur at a lower force. Cortactin acts similarly to the ADP-Pi state to increase branch stability.

      Specific comments

      The pulling force on the branches seems to arise from different flow rates in the microfluidics. Viscous drag is mentioned and I can see there is methylcellulose in the buffer. It would be helpful to have the explanation of the conversion between flow and force, even if it has been standard in previous work.

      We apologize if this was unclear: in microfluidics experiments, the buffer does not contain methylcellulose. Methylcellulose is only used for ‘open chamber’ experiments, where no force is applied to Arp2/3 branches, to maintain them in the TIRF field of excitation (Figure S2).

      To better clarify the conversion between flow and force, we have rephrased and extended the Methods section to explain how the force on the branch junction is computed based on the local flow velocity and the length of the daughter filament.

      Pg 5 - what was the motivation to titrate phosphate? It seems a stretch that intracellular Pi levels are tuning branching inside cells more than protein-mediated control (GMF or cortactin) - can the authors evidence this at all?

      We are not claiming that the level of Pi plays a stronger regulatory role than proteins. We show that inorganic phosphate tunes the state of the Arp2/3 complex, which in turn modulates the action of regulatory proteins, such as GMF and cortactin.

      Nonetheless, we do show that the contribution of inorganic phosphate is quite central as it can (1) strongly stabilize branch junctions (~30-fold decrease in the dissociation rate), and (2) tune the activity of GMF and cortactin on Arp2/3 complexes at branch junctions as well as on the ‘surviving’ Arp2/3 complexes that remain bound to mother filaments.

      We thus titrated phosphate and found that its impact on Arp2/3 complex stability is significant in the range of Pi concentration that is explored in cells. For the sake of completeness, and following a comment from reviewer #1, we now also mention the affinity of Pi for actin subunits in filaments in the Discussion, and discuss the impact of intracellular Pi on actin itself.

      Minor comments

      • In the introduction, while the structural and mutagenesis evidence is clearly stated, in other cases a bit more detail would be helpful e.g. 'biochemical studies', which referred measurement of hydrolysis rates using radiolabelling

      We have made changes to more precisely define which biochemical assays were used in previous studies.

      • Page 3 Figures shouldn't be referenced in the introduction

      We have removed the references to the figures from the introduction.

      • Page 3 slip bond behaviour needs explanation

      We now explain the concept where it is first used in the manuscript, as follows: “The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour of accelerated debranching with the increase of the applied force is similar to the ‘slip bond’ concept, as predicted by the Bell-Evans model of the force-dependent lifetime of the interaction between two proteins”.

      • Figure 1B seems to be a theoretical schematic which is superfluous

      We suppose that the reviewer is actually referring to figure 3B of the initial manuscript, describing the energy potential of a molecular interaction as a function of the reaction coordinate. We agree with the reviewer that it is not absolutely required and we have removed it.

      • Figure 4D is helpful, different weight lines might help even more to explain the dominant pathways

      We have made modifications to the biochemical reaction scheme in this figure (now figure 5F in the revised version). We hope we succeeded in improving its readability. Since the different paths depend on mechano-chemical parameters, there is no real dominant pathway per se.

      **Referee cross-commenting**

      Rev1 sounds like the specialist here. I can't comment on their requests. Some similar points arise between the reviewers which need addressing.

      Reviewer #2 (Significance (Required)):

      Significance

      Taking a look at references 16 and 19, I do not find it clear what is achieved differently in the current work compared to these papers and what agrees and what disagrees. If it's a species difference I might expect the two species would be analysed side-by-side in this paper.

      We thank the reviewer for this important comment. The goal of our study was not to compare the behaviour of mammalian and yeast Arp2/3 complexes.

      We now try to better explain that the motivation of the present work is to address how the nucleotide state of the Arp2/3 complex tunes actin branch mechanosensitive stability, and regulates interactions with well known Arp2/3 complex binding proteins. Most of the reactions are quantified here for the first time. Moreover, the experiments with branch junctions in different nucleotide states are done under controlled mechanical conditions, providing the first direct measurements of the force-dependence of the debranching reactions. Our detailed kinetic analysis of the full reaction scheme allows us to model the different binding interfaces of the Arp2/3 complex.

      In addition, it is worth noting that:

      1. Species matter, and this is why refs 16 and 19 can give the impression of disagreeing on the ability to renucleate branches thanks to the stability of surviving Arp2/3 complexes on mother filaments.
      2. In ref 16 (Pandit et al, PNAS 2020) species are mixed (yeast Arp2/3 and mammalian alpha actin from skeletal muscle), likely leading to a different behaviour compared to the all-mammalian protein system we examine in our current work. In particular, with mixed species one misses the ability to renucleate, as shown in our previous study Ghasemi et al (ref 19). However, since mixing species does not correspond to anything physiological, we do not think it is worth repeating these conditions alongside our experiments.
      3. Further, the analysis carried out in ref 16 suffers from important limitations: the force was unknown (not calibrated) and the data was fitted by a model that compounded several reactions, providing only an indirect estimation of the rates, in particular at zero force. In contrast, we have worked with calibrated forces (including dedicated experiments at zero force) and we have carried out specific experiments to directly measure several rates.
      4. In ref 19 (our earlier work) we did not investigate the impact of the nucleotide state of the branch junction at all, and we did not systematically measure the dissociation rates as a function of force.

      Contrary to Pandit et al, we directly measure the difference in branch stability at zero force between ADP and ADP-Pi states and show that the ~ 30 fold difference holds true at all probed forces. Last, the force dependence of the branch renucleation success rate gives us crucial information on which of the two Arp2/3 complex interfaces ruptures first.

      I'm not understanding how the authors can distinguish effects of adding phosphate and BeFx on Arp 2 and 3 compared to effects on actin. Importantly, are possible accompanying changes in the actin filament a confounding factor?

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments have no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer condition to which branches are exposed is quickly changed from phosphate buffer to buffer without phosphate, we observe a rapid change of branch stability. Actin subunits at the branch junction are in the F-actin conformation according to recent cryoEM observations (ref. Chavali et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~ 0.007 s-1 (i.e. a half-time of 100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a much lower rate than the observed change of the debranching rate (0.21 s-1). This means that the debranching rate is independent of the nucleotide state of daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution then monitored debranching and branch renucleation in our standard buffer (i.e. without BeFx or Pi). We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and the following with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.
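
      The timescale argument in the first point above can be checked with a short calculation. The sketch below assumes simple first-order kinetics for both processes, so that the half-time is ln(2)/k; the function and variable names are illustrative and do not come from the manuscript. It takes the two rates quoted above and compares them.

```python
import math


def half_time(rate_per_s: float) -> float:
    """Half-time (in s) of a first-order process with rate constant rate_per_s (in s^-1)."""
    return math.log(2) / rate_per_s


# Rates quoted in the response above (assumed first-order for this sketch).
k_actin_aging = 0.007  # s^-1, ADP-Pi -> ADP aging of F-actin subunits
k_observed = 0.21      # s^-1, observed change of the debranching rate

t_half_actin = half_time(k_actin_aging)   # about 99 s, i.e. the ~100 s quoted
t_half_observed = half_time(k_observed)   # about 3.3 s
ratio = k_observed / k_actin_aging        # about 30

print(f"actin aging half-time: {t_half_actin:.1f} s")
print(f"observed change half-time: {t_half_observed:.1f} s")
print(f"ratio of rates: {ratio:.0f}")
```

      Under these assumptions the observed change is roughly 30 times faster than actin aging, which is the separation of timescales the argument relies on.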

      The paper is quite specialist to read and the advance appears to be incremental. My expertise is in molecular pathways to actin regulation outside the main area of the paper.

      The results we present in this study are often unexpected, and some run counter to long-standing assumptions. The regulation of Arp2/3-nucleated branches is of importance for the stability and the force-generating capabilities of many actin networks in cells. Last, most of the measurements that we present had never been done, mainly because the experiments are difficult to perform and require specific tools to monitor several events while controlling the applied force.

      We believe our results are of broad interest as they run counter to long-standing assumptions. We have rewritten the text in several instances to convey our message more clearly.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed the review of the manuscript "Inorganic phosphate in Arp2/3 complex acts as a rapid switch for the stability of actin filament branches" by Xiao et al.

      The authors provide a detailed investigation of how the nucleotide bound to the Arp2/3 complex affects branch stability under flow force. From a kinetic perspective, this is an elegant study with generally high-quality data, although some conclusions rest on assumptions rather than direct experimental evidence.

      We thank the reviewer for their positive feedback. We have improved our manuscript and performed important additional experiments to provide more direct experimental evidence of our conclusions.

      A key question concerns the physiological relevance of these findings. For instance, the concept of branch regrowth may not be applicable in cellular contexts, since forces by actin polymerization would displace existing branches away from sites where they generate these active forces. The authors should clarify the relevance of regrowth during active force generation by branched networks.

      We thank the reviewer for this comment. Our in vitro results indeed point to a previously unreported property of branched actin networks, i.e. the ability of Arp2/3 complexes to readily renucleate branches in the ADP-Pi state and that it does require reloading ATP within Arp2/3.

      Branched actin networks, especially the lamellipodia or endocytotic patches, do exert active force thanks to actin polymerization of the individual branches at the forefront. However, the whole actin network is exposed to stress, and the architecture of the network (inter-branch distance, crosslinks between branches, …) presumably strongly impacts its mechanical properties.

      In the case of other types of branched actin networks, such as the actin cortex, myosin motors put the whole network under tension. Such pulling forces on actin branches, depending on their amplitude, can lead to branch regrowth and network self-repair.

      We have modified the text to make the physiological relevance clearer.

      Additionally, all experiments employ flow conditions that branches would probably not experience in cells. Notably, the flow direction in the cellular context would be reversed. Altering the flow direction relative to the branches could affect not only the relationship between flow rate and branch stability, but potentially other system properties as well.

      We agree with the reviewer that in cells branches will not experience flow conditions similar to the ones we use in our in vitro assay. Nonetheless, in cells we expect mechanical stress on the branch junction to be applied in all directions. In lamellipodia, the compressive force applied at the leading edge is expected to result in diverse local orientations of the force on individual branch junctions within the network (as explained in Lappalainen et al. Nat Rev MBC 2022). Also, branch junctions are found in the cell cortex, where they are exposed to pulling forces resulting from the action of myosin motors and crosslinkers on mother and daughter filaments.

      The impact of the direction of the flow was addressed in our previous publication (Ghasemi et al, Sc. Adv. 2024, figure 2) and, to a lesser extent, by the lab of Enrique de la Cruz in Pandit et al, PNAS 2020 (ref. 16). We reported that flow direction has a minimal effect, if any, on branch dissociation rate and renucleation ratio.

      Reviewer #3 (Significance (Required)):

      Furthermore, the study appears not to account for the mother filament (particularly its nucleotide state) or the actin subunit bound to the Arp2/3 complex. The authors should discuss why their interpretation focuses exclusively on the Arp2/3 complex rather than on the actin filaments or Arp2/3-bound actin subunit.

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments has no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer condition to which branches are exposed is quickly changed from phosphate buffer to buffer without phosphate, we observe a rapid change of branch stability. Actin subunits at the branch junction are in F-actin conformation according to recent cryoEM observations (ref. Chavani et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~ 0.007 s-1 (i.e. a half-time of ~100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a rate much lower than the observed change of the debranching rate (0.21 s-1). This means that the debranching rate is independent of the nucleotide state of daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution then monitored debranching and branch renucleation in a regular buffer. We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and the following with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.

      An important concern involves the use of KPi (inorganic phosphate). Based on our experience, KPi appears to have effects beyond simply impacting nucleotide state: actin filaments seem to assemble differently in the presence of KPi. The authors should exercise caution in their interpretation of KPi-based experiments.

      KPi, at concentrations up to 50 mM Pi, did not slow down the barbed end elongation rate in our experiments.

      Overall, while the technical quality and kinetic analyses are state-of-the-art, relating this work to physiological contexts remains challenging, and some conclusions appear overstated.

      We have made changes in the discussion to try to more clearly relate our in vitro observations and conclusions with the cellular context where branch renucleation could have a strong impact on the architecture and mechanics of actin networks.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While the authors have proved their hypothesis by temporally increasing the activity of cholinergic neurons at different life stages through the auxin-inducible degron system, their work raises two major concerns. First, they might want to discuss the conflicting data from Zullo et al (Nature 2019, vol 574, pp 359-364). For example, the authors show that increasing the activity of acr-2-expressing neurons after the 7th day of adulthood increases lifespan. However, Zullo et al (2019) show that the reciprocal experiment, inhibiting cholinergic neuron activity on the 1st day or the 8th day of adulthood, also increases lifespan. Is this because the two studies are using different promoters, that of the acr-2 ACh receptor (this work) versus that of the unc-17 vesicular ACh transporter (Zullo et al., 2019)? The two genes are expressed in different subsets of cells that do not completely overlap. CeNGEN shows that acr-2 is expressed in motor and non-motor neurons, but some of these neurons are also different from those that express unc-17. Is it possible that different cholinergic neurons also have opposite lifespan effects during adulthood? Or is it because both lack of signaling and hypersignaling can lead to a long-life phenotype? Leinwand et al (eLife 2015, vol 4, e10181) previously suggested that disturbing the balance in neurotransmission alone can extend lifespan. A simple discussion of these possibilities in the Discussion section is likely sufficient. Or can the auxin treatment and removal be confounding factors? Loose and Ghazi (Biol Open 2021, vol 10, bio058703) show that auxin IAA alone can affect lifespan and that this effect can depend on the time the animal is exposed to the auxin.

      We thank the reviewer for the thoughtful comments and valuable suggestions. In response, we have expanded the Discussion section to address the points raised, as detailed below.

      We fully agree with the reviewer that the different results between our study (activating acr-2-expressing neurons) and Zullo et al. (inhibiting unc-17-expressing neurons) are most likely due to the distinct cholinergic neurons targeted. Our new preliminary data further support this neuron-specific model, as inhibition of acetylcholine synthesis at mid-late life stages produces opposing lifespan effects in different cholinergic neurons. At the same time, we cannot rule out the alternative possibility raised by the reviewer (eLife, 2015) that both activation and inhibition of neuronal activity may extend lifespan by similarly disrupting the balance of neurotransmission. This hypothesis requires further experimental validation in the context of cholinergic motor neurons. Regarding the potential technical concern related to auxin exposure (Biol Open, 2021), our control experiments using 0.5 mM auxin did not show non-specific lifespan effects.

      Accordingly, in the revised manuscript, we have discussed the first two possibilities in the Discussion by stating (page 17-18): “Nevertheless, it is still unclear whether other neuronal populations share similar temporal regulatory mechanisms. A previous study reported that inhibiting cholinergic neurons activity (using unc-17 promoter) extends lifespan regardless of timing[2], which is different from the temporal lifespan regulation we observed in cholinergic motor neurons (using acr-2 promoter). This discrepancy is likely due to differences in subsets of neurons, as the unc-17 promoter labels a broad repertoire of cholinergic neurons, while the acr-2 promoter mainly marks cholinergic motor neurons[53]. Thus, the distinct lifespan-modulating effects of cholinergic motor neurons may be overshadowed by opposing contributions from other cholinergic subtypes when a mixed population is manipulated. Alternatively, both activation and inhibition of cholinergic activity may perturb neurotransmission balance, leading to similar effects on lifespan[54]. It will be interesting to test these hypotheses in future studies.”

      Second, the daf-16-dependence of the early longevity-inhibiting effect of ACh signaling needs clarification and further experimentation. The authors present a model in Figure 6D, where DAF-16 inhibits longevity. This contradicts published literature. Libina et al (Cell 2003, vol 115, pp 489-502) have shown that intestinal DAF-16 increases lifespan. From the authors' data, it is possible that ACh signaling inhibits DAF-16, not promotes it as they have drawn in Figure 6D.

      We thank the reviewer for this important point. We agree that intestinal DAF-16 promotes longevity. Our original model in Figure 6D aimed to show that the larval pathway shortens lifespan by inhibiting DAF-16, not that DAF-16 itself shortens lifespan. The arrowhead style used in the original Figure 6D might have given the impression that DAF-16 shortens lifespan. Our apologies. We have now fixed this error in Figure 6D. In addition, as suggested, we have performed additional daf-16 experiments (see below).

      In Figure 3F, the authors used Pacr-2::TeTx, which inhibits cholinergic neuron activity, to show an increase in the expression of DAF-16 targets. Why did the authors not use the worms that express the transgene Pacr-2::syntaxin(T254I), which increases cholinergic neuron activity? What happens to the expression of DAF-16 targets in these animals? Does their expression go down? What happens if intestinal daf-16 is knocked down in animals with increased cholinergic neuron activity, instead of reduced cholinergic neuron activity?

      Thanks for these insightful questions. In Figure 3F-H, we used TeTx instead of syntaxin(T254I) to investigate the function of DAF-16 in the early stage pathway for two main reasons. First, the Pacr-2::TeTx transgene extends lifespan in early life by inhibiting cholinergic activity, which provides a genetic background complementary to that of syntaxin(T254I) for characterizing the role of DAF-16. Second, TeTx is expected to activate DAF-16 and upregulate its target genes. This approach is more sensitive than measuring gene downregulation in Pacr-2::syntaxin(T254I) transgenic worms.

      We fully agree with the reviewer that performing the corresponding experiments in the syntaxin(T254I) background would strengthen the overall evidence. As suggested, we have now examined the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, and performed intestine-specific RNAi of daf-16 in the same background. We found that these worms exhibit downregulation of DAF-16 target genes. Furthermore, intestinal daf-16 knockdown did not further shorten the already reduced lifespan of these transgenic worms. Together, these results from both the TeTx and syntaxin(T254I) lines confirm that cholinergic motor neurons require DAF-16 in the intestine to regulate lifespan. These new data have now been described in Figure S5A-5D (page 11-12): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons.”, and “RNAi of daf-16 in the intestine abolished the ability of cholinergic motor neurons to regulate lifespan at early life stage (Figure 3G, 3H and Figure S5C-S5E).”

      Recommendations for The Authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) “The Methods section needs to be clarified/expanded.”

      (a) “For example, are the authors using indole-3-acetic acid or a synthetic auxin? How long does it take for syntaxin to be made after the removal of the auxin?”

      We have now included auxin information and recovery time in the Method for auxin treatment by stating (page 24): “natural auxin indole-3-acetic acid (G&K Scientific)”, and “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (b) “How much FUDR was used in some of the lifespan assays?”

      2 μg/mL FUDR was used in some of the lifespan assays. We have now included the concentration in the Method for lifespan assay by stating (page 23 line 526): “2 μg/mL 5-Fluoro-2’-deoxyuridine (FUDR) was included in assays involving TeTx transgene worms, unc-31 and unc-17 mutant worms, which show a defect in egg laying.”

      (c) “In line 494 of the Methods section, worms were anesthetized with 50 mM sodium azide. That concentration seems a bit high.”

      It is an error indeed. We used 5 mM NaN3. This has now been fixed in the text and in line 548.

      (d) “What are the concentrations of the transgenes used in the extrachromosomal arrays?”

      We have now included the concentrations in the Method for strains and genetics by stating (line 507-509 on page 22): “Microinjections were performed using standard protocols. Each plasmid DNA listed above in the transgenic line was injected at a concentration of 50 ng/μL. Each marker for RNAi was co-injected at a concentration of 25 ng/μL.”

      (2) “Gene expression can vary in different parts of the worm intestine. Do the measurements in Figure 6C represent the entire intestine or only certain parts of the intestine?”

      We have now included the intestine area used for quantification in the Method for microscopy by stating (page 24): “and the entire intestine area was selected by ImageJ”, and in the legends of Figure 6C by stating (page 36): “The entire intestinal area was selected for measurement.”

      (3) “In Figure S1C, does tph-1 have a slight effect? Might serotonin partly counteract the effects of ACh?”

      We thank the reviewer for raising this interesting point regarding the potential role of serotonin. We have re-examined our data in Figure S2C (the original Figure S1C) and agree that loss of tph-1 partly counteracted the lifespan-shortening effect of the Pacr-2::syntaxin(T254I) transgene in early life stage, though the whole-life suppression effect is slight. To assess whether the acr-2 promoter-driven manipulation might directly affect serotonergic neurons, we checked CeNGEN. We found that the transcript expression of acr-2 can be detected in serotonergic neurons (ADF, HSN, and NSM), but the levels are extremely low. In this regard, it is unlikely that the Pacr-2::syntaxin(T254I) transgene exerts its primary effect by substantially altering serotonin release. While a potential indirect interaction between cholinergic and serotonergic signaling in lifespan regulation remains, it falls beyond the primary focus of the current study. We would like to follow up on this in future studies. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (4) “Where else is GAR-2 expressed? Might there be redundancies between neuronal and intestinal GAR-2?”

      We appreciate this insightful question. Based on available single-cell gene expression atlases of C. elegans at both embryonic and adult stages[1,2], gar-2 expression has been detected not only in neurons and the intestine, but also in additional tissues such as the muscle. Given the observed lack of effect of neuronal or intestinal gar-2 RNAi on the ability of cholinergic motor neurons to extend lifespan in mid-late life, and as also suggested by another reviewer, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppressed this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 primarily in the intestine and GAR-2 primarily in muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 in the muscle will be further investigated in future studies. The new data have now been described in Figure S8 by stating (page 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H) had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (1) Packer, J. S. et al. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 365, doi:10.1126/science.aax1971 (2019).

      (2) Roux, A. E. et al. Individual cell types in C. elegans age differently and activate distinct cell-protective responses. Cell Rep 42, 112902, doi:10.1016/j.celrep.2023.112902 (2023).

      (3) Chun, L. et al. Metabotropic GABA signalling modulates longevity in C. elegans. Nat Commun 6, 8828, doi:10.1038/ncomms9828 (2015).

      (4) Izquierdo, P. G. et al. Cholinergic signaling at the body wall neuromuscular junction distally inhibits feeding behavior in Caenorhabditis elegans. J Biol Chem 298, 101466, doi:10.1016/j.jbc.2021.101466 (2022).

      (5) “In line 344, please correct "fwork" to "work".”

      This has now been fixed.

      (6) “In line 360, please correct "acts" to "act".”

      This has now been fixed.

      (7) “Please check citations within the main text. Some of the citations do not fit the cited material. For example, in line 112, reference 28 is not about GABAergic neurons.”

      We thank the reviewer for pointing out these important details. We have now carefully checked and corrected the citations throughout the manuscript as suggested.

      Reviewer #2 (Recommendations for The Authors):

      (1) “How are the authors assessing the efficacy of the TeTx manipulations in their strains? Likely TeTx has a concentration-dependent effect. Are there any phenotypes associated with the loss of cholinergic signaling? Also, does TeTx expression in cholinergic neurons alter the neuronal activity of other associated neurons, or alter muscle integrity?”

      Thanks for the question. Our observations show that overexpression of TeTx results in defects including small size, slow growth, egg-laying deficiencies, and severe locomotion impairment, which are all associated with the loss of cholinergic signaling. While we did not directly examine the activity of interconnected neurons in our strains, we tested the muscle integrity by recording muscle reaction to 1 mM levamisole and found that overexpression of TeTx does not affect muscle integrity. To circumvent these pleiotropic complications, we instead employed Syntaxin(T254I) transgenic worms, which exhibit only slight locomotion defects, to further characterize the temporal effect of cholinergic motor neurons on lifespan. These data have now been described in Figure S1A by stating (page 6): “Overexpression of TeTx induces characteristic phenotypes of cholinergic deficiency, such as developmental delay and severe locomotion impairment[32], yet does not compromise muscle function (Figure S1A).”

      (2) “The authors are expressing TeTx throughout the lifespan of the animal, including during development. How does this contribute to the organismal phenotype?”

      As described above, chronic TeTx expression from egg stage results in developmental delay, which is similar to the development phenotype of unc-17 mutant worms defective in acetylcholine transmission. However, unc-17 mutation has no effect on lifespan[3], which is different from TeTx overexpression, indicating that the developmental delay caused by TeTx overexpression may not affect the lifespan phenotype.

      (3) Chun, L. et al. Metabotropic GABA signalling modulates longevity in C. elegans. Nat Commun 6, 8828, doi:10.1038/ncomms9828 (2015).

      (3) “A previous study has shown that increasing cholinergic activity by altering ACR-2 expression can cause neurodegeneration (DOI: https://doi.org/10.1523/JNEUROSCI.1515-10.2010). Does overexpressing syntaxin, or AID-mediated degradation of syntaxin cause motor neuron degeneration, which could also contribute to the lifespan phenotype?”

      We thank the reviewer for raising this important point regarding potential motor neuron degeneration. In response, we performed confocal microscopy to assess the motor neurons. We found that worms expressing the transgene Pacr-2::syntaxin::mCherry do not exhibit a defect in the number or morphology of labeled neuronal cell bodies compared to control worms expressing Pacr-2::mCherry. This observation indicates that chronic, increased cholinergic activity through syntaxin overexpression, under our experimental conditions, does not induce motor neuron degeneration. These data have now been described in Figure S1B by stating (page 7): “This transgene simply shortened lifespan without causing a pleiotropic effect (Figure 1B), and critically, without inducing motor neuron degeneration (Figure S1B).”

      (4) “Figures 1I-1L: The authors do not show how long it takes for the expression of syntaxin to be restored following the removal of auxin from plates. This would be important to assess the age-dependent effects of neuronal signaling.”

      We thank the reviewer for pointing this out. In general, complete restoration of syntaxin expression occurred within 24 hours after auxin withdrawal. We have now pointed this out in the text by stating (the last sentence on page 24):“Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (5) “In Figures S1A-E: Although the mutant backgrounds decrease the lifespan of animals expressing the Pacr-2::syntaxin(T254I) transgene, the lifespan of these transgenic animals appears to be extended compared to what was shown in Figure 1B. Is this the case? (can these experiments be repeated alongside wild-type N2s to assess if their lifespan is indeed extended compared to the N2?). Also, if so, could it be that the lifespan effects are modified to different extents by other small neurotransmitters?”

      We thank the reviewer for pointing this out. All the experiments presented in current Figure S2 (original Figure S1) were performed with wild-type N2 controls, which are now included in the updated Figure S2. These data show that, in the Pacr-2::syntaxin(T254I) transgenic background, loss of unc-25 (GABA) or tph-1 (serotonin) leads to a further extension of lifespan, while loss of other genes had no effect. Importantly, while unc-25 mutation also extends lifespan in wild-type worms, tph-1 mutation does not. This observation indicates that the lifespan effects of cholinergic signaling can be modulated by serotonin. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (6) “RNAi of several of the receptors appear to modulate wild-type lifespan. Although I understand that this is not the main focus of the manuscript, the fact that this occurs should be mentioned in the results and discussed later on.”

      We thank the reviewer for pointing this out. As suggested by the reviewer, we have now pointed this out in the text by stating (page 9):“Notably, RNAi of several ACh receptors such as acr-11 appears to shorten wild-type lifespan, whereas RNAi of several other ACh receptors such as acr-9 extends wild-type lifespan, suggesting lifespan-modulating potential of ACh receptors (Figure S3).”

      (7) “Cholinergic signaling and ACR-6 have been previously shown to regulate pharyngeal pumping/feeding behavior (https://doi.org/10.1016/j.jbc.2021.10146). Could the requirements for ACR-6/cholinergic signaling in longevity be related to caloric restriction/nutritional intake which in turn could be expected to alter DAF-16 and HSF-1 activity? These previous studies should be referenced and discussed.”

      Thanks for the suggestion. As suggested by the reviewer, we have examined the pumping rate of acr-6 mutant worms. Our results showed that acr-6 mutation slightly reduced the pumping rate. As the decrease is relatively minor, we do not expect a major DR effect, though we cannot completely rule out such a possibility. Furthermore, as acr-6 acts in the pharynx to regulate pumping but in the intestine to regulate the role of cholinergic signaling in lifespan, we do not expect this would make a major contribution to our pathway. These new data have now been described in Figure S4I. As suggested by the reviewer, we have now pointed this out in the text by stating (page 10): “Previous data has shown that cholinergic signaling and ACR-6 may control pharyngeal pumping[42]. As expected, we found that acr-6 mutation slightly reduced pumping rates (Figure S4G).”

      (8) “The expectation for the studies in Figure 3/DAF-16, is that animals expressing Ex[Pacr-2::syntaxin(T254I)], should have downregulated DAF-16 in the intestine. This needs to be shown through some method (increased daf-16 activation upon loss of cholinergic signaling does not necessarily imply that the converse is also true).”

      We thank the reviewer for the insightful suggestion. The reviewer suggested that we perform additional measurements to confirm that DAF-16 is the downstream transcription factor in the intestine. Specifically, the reviewer suggested testing whether syntaxin(T254I) transgene signaling could inhibit DAF-16 activity. We have now followed the reviewer’s suggestion by performing two different assays. First, as also suggested by the first reviewer, we measured the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, which exhibited downregulation of these genes, consistent with the notion that increasing cholinergic motor neuron activity inhibits DAF-16. These data have now been described in Figure S5A. Second, we performed an assay to detect the DAF-16 subcellular localization pattern in the intestine. We found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, suggesting that ACR-6 inhibits DAF-16, which is consistent with our model. These new data have now been described in Figure S5E. As suggested by the reviewers, we have now pointed this out in the text by stating (page 11): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons. To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promoted nuclear translocation of DAF-16, confirming that ACh signaling inhibits DAF-16 activity (Figure S5B).”

      (9) “Similarly, it would be good to have additional lines of evidence that signaling through GAR-3 impinges on HSF1, and that the lifespan effects are not due to non-specific effects of hsf-1 knockdown, which could lead to several un-related deficiencies and compromise lifespan (Figure 5b).”

      We thank the reviewer for the valuable suggestions. The reviewer correctly noted that the observed lifespan effect from hsf-1 RNAi could involve non-specific deficiencies. In response, we performed an assay to detect HSF-1 subcellular localization in the intestine upon gar-3 overexpression by using the strain EQ87 (iqIs28[pAH71(hsf-1p::hsf-1::gfp) + pRF4(rol-6)]). We found that the induced nuclear translocation of HSF-1 was weak. This result suggests that GAR-3 may modulate HSF-1 activity through a mechanism distinct from, or more subtle than, robust nuclear accumulation, or that its effect is highly dependent on the expression level and timing.

      (10) “Figure 6: An N2 control should be provided to assess the specificity of the mCherry signal from the intestine (given autofluorescence in the animals' gut).”

      Thanks for the suggestion. As suggested by the reviewer, we have now included the control in Figure S10.

      Reviewer #3 (Recommendations for The Authors):

      (1) “While the model is consistent with the data, there are alternatives that were not addressed. Additionally, there are some deficiencies in the interpretation of results that should be addressed, in my opinion. Possibly most importantly given the claims, the authors should address an alternative model: that it is the level of acetylcholine signaling that matters. Is it possible that the level auxin-inducible degradation of syntaxin(T254I) in acr-2 expressing cells is age dependent, such that one level increases lifespan and the other shortens it, and that the timing doesn't matter at all? A chronic dose response to auxin concentration would address if the level of syntaxin is a non-monotonic determinant of lifespan.”

      We sincerely thank the reviewer for raising this important alternative model. The reviewer suggested that the apparent temporal effect we observed might instead be explained by an age-dependent change in the efficiency of the AID system in degrading syntaxin(T254I) in acr-2 expressing cells. That is, different levels of acetylcholine signaling, rather than timing, would produce opposite lifespan outcomes. We agree that this is a formal possibility that our current data cannot fully rule out. On the other hand, other data in the manuscript suggest otherwise. For example, the expression of ACR-6 and GAR-3 in the intestine exhibited a temporal switch between early and mid-late life, providing support for a time-dependent mechanism. In addition, the differential requirement of the downstream transcription factors DAF-16 and HSF-1 in early and mid-late life, respectively, provides further evidence supporting a temporal mechanism. Thus, while we agree that the possibility raised by the reviewer cannot be formally ruled out, the temporal mechanism we proposed may play an important role.

      The reviewer suggested performing a chronic dose-response experiment with varying auxin concentrations. In fact, when we first employed the AID system to temporally manipulate motor neuron output at different life stages, we tested potential effects of auxin concentration. Using the soma-expressed TIR1 system, we found that restoring syntaxin(T254I) activity from day 10 of adulthood extends lifespan, regardless of whether the prior suppression was maintained with 0.1 mM or 0.5 mM auxin. This suggests that the pro-longevity effect is likely not triggered by differences in the efficacy of prior suppression within this concentration range. We acknowledge that the tested dose range may not cover potential threshold concentrations. Furthermore, we cannot exclude the possibility of a non-linear relationship between auxin concentration and degradation efficiency. We agree that a comprehensive chronic dose-response analysis remains a valuable future direction, and we plan to employ more precise tools in the future to investigate the interplay between signal level and temporal context in lifespan regulation. The auxin concentration data have now been described in Figure S1C-1D by stating (page 7): “Comparable outcomes were obtained with both 0.1 mM and 0.5 mM auxin treatments (Figure S1C-1D).” As suggested by the reviewer, we have discussed the alternative model in the Discussion by stating (page 19): “An alternative mechanism based on differential levels of cholinergic signaling could also contribute to the observed lifespan effects.”

      (2) “Several times, including in several section headings, it is claimed that daf-16 (eg line 205-206) and acr-6 (eg line 185-186) function "early in life". This was not tested, so the claim is not warranted. For instance, these genes could act later in life to respond to signals made or sent early in life, or they could act both early and late, or only early (as they claim).”

      We thank the reviewer for this precise and important clarification. The reviewer is correct that our genetic interventions do not by themselves define the temporal window.

      Our experimental rationale was based on the observation that the lifespan-shortening effect of Pacr-2::syntaxin(T254I) expression is similar whether it is induced throughout life or specifically during larval stages (early life), indicating the detrimental effect results from enhanced motor neuron output in early life. Therefore, we used the lifelong expression paradigm as a tool to genetically dissect the downstream pathway triggered by early-life neuronal activation. We acknowledge the reviewer's point that this design does not formally prove that daf-16 or acr-6 acts only in early life; they could be required continuously or again later. However, we would like to note that our expression data show that the gut expression of ACR-6 is restricted to early life, which is consistent with a primary early-life function in this context.

      To reflect this more accurate interpretation, we have revised all relevant statements, including section headings. We now consistently state that daf-16 is required for the lifespan-shortening effect of cholinergic motor neurons, rather than claiming it functions "in early life". We have also toned down the discussion regarding their temporal function by stating (page 12): “Because this lifespan-shortening effect results from enhanced motor neuron output in early life and overwrites its beneficial effect at later stages, we propose this signaling circuit mediates the lifespan-shortening effect in early life.”

      (3) “In line 118, they note that such intervention led to a complex effect on the lifespan curve "by initially promoting worm's survival followed by inhibiting it at later stages." I think that while findings from later experiments support a time-dependent lifespan effect stemming from syntaxin function in the cholinergic motor neurons, this experiment's TeTx expression in those neurons is not time-dependent. Lifespan is an endpoint measure, so there is no sense in which a non-timed perturbation has an early or late effect on an individual. Rather, the effect on survival they observed is at the population level, their intervention increases the average lifespan while decreasing the worm-to-worm variation in lifespan.”

      We thank the reviewer for the critical and precise comment regarding our interpretation of the survival curves of TeTx transgenic worms. As suggested by the reviewers, we have revised the text by stating (page 6): “Surprisingly, such intervention led to a complex effect on the population survival curve by reducing both early mortality and the proportion of long-lived individuals (Figure 1A). Specifically, the 25% lifespan of these worms was prolonged, while their 75% and maximal lifespan were slightly shortened, leading to a mean lifespan slightly increased or unchanged compared to that of wild-type worms. This suggests that inhibiting cholinergic motor neurons may exert temporally distinct effects on survival, leading to decreased individual variation in lifespan.”
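
      The population-level reading of the survival curve described above can be illustrated with a small numerical sketch. This is not the authors' analysis: the death ages below are synthetic, the helper name `lifespan_summary` is hypothetical, and "25% lifespan" is assumed to mean the age by which 25% of the animals have died. The sketch only shows that a narrower lifespan distribution can raise the 25% lifespan while lowering the 75% and maximal lifespan, with the mean nearly unchanged.

```python
import numpy as np

def lifespan_summary(death_days):
    """Summarize a cohort from individual ages at death (in days)."""
    d = np.asarray(death_days, dtype=float)
    return {
        "q25": float(np.percentile(d, 25)),   # age by which 25% have died
        "q75": float(np.percentile(d, 75)),   # age by which 75% have died
        "max": float(d.max()),                # maximal lifespan
        "mean": float(d.mean()),              # mean lifespan
        "sd": float(d.std(ddof=1)),           # worm-to-worm variation
    }

# Synthetic, hypothetical cohorts (NOT data from the manuscript).
control = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
treated = [12, 13, 14, 15, 17, 18, 19, 20, 21, 22]

for name, cohort in (("control", control), ("treated", treated)):
    print(name, lifespan_summary(cohort))
```

      In this toy example the treated cohort has fewer early deaths and fewer long-lived individuals, so the quartiles move toward each other and the standard deviation falls while the mean barely shifts.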

      (4) “The layout of the plots separating the responses of wild type and mutants to different panels makes it often difficult to interpret the results. For instance, do acr-6, gar-3, and other receptor mutants or knockdowns affect lifespan on their own? If they do, it matters to the interpretation whether they live longer or shorter than the wild type: which of the mutants phenocopy the lack of a lifespan-extending signal that activates them? Which phenocopy lacks a lifespan-shortening signal that activates them? Could they phenocopy the effect of an inhibitory signal? And critically, are the effects of these mutants on lifespan consistent with their model?”

      “The paper would be stronger if they determined when ACR-6 and GAR-3 functions are necessary and sufficient. Is it possible that the receptor doesn't matter, just that there be one of the two expressed in the intestine, and that other mechanisms determine the lifespan response to modulation of syntaxin(T254I)? What does time-dependent knockdown of these receptors do to daf-16 and hsf-1 localization and to the transcription of the targets of these transcription factors?”

      We thank the reviewer for these insightful comments. We have addressed the points as follows:

      As suggested, we have reorganized the lifespan data in Figure S4 to directly compare wild type and mutant/RNAi conditions within the same panels. This new presentation clarifies the autonomous effects of these genes. The data show that loss of acr-6 or gar-2 (via RNAi or mutation) has minimal effect on lifespan. Notably, acr-8 RNAi shortens lifespan, whereas the acr-8 mutation does not, supporting our hypothesis of tissue-specific or compensatory roles for this receptor, as detailed in our following response to point (5). The reviewer's key question regarding when these receptors are necessary and sufficient is central to our model. We agree with the reviewer that complementary loss-of-function experiments with temporal precision, such as time-specific knockdown of the two receptors, would provide even stronger evidence. To this end, we attempted to generate endogenous degron-tagged alleles of acr-6 and gar-3 to apply the AID system for precise, stage-specific degradation. Unfortunately, despite multiple design attempts and screening efforts, we were unable to obtain homozygous strains with the desired genomic edits using the same gRNA we used to knock in mCherry or other gRNAs. This is rather frustrating. Consequently, we are currently unable to perform the ideal temporally controlled loss-of-function experiments suggested by the reviewer.

      (5) “Why does RNAi but not mutation of acr-8 and gar-2 suppress the lifespan shortening effect of Pacr-2::syntaxin(T254I)?”

      Thanks for this important question regarding the differential effects of feeding RNAi versus mutation of acr-8 and gar-2. The discrepancy likely arises from the potential off-target effects of RNAi. RNAi is not strictly specific as it may target other related genes, generating a non-specific effect, whereas precise mutations in acr-8 and gar-2 alone may not produce the same effect.

      (6) “sid-1(-); Ex[Pacr-2::tetx lives longer than sid-1(-); in daf-16(+) worms in Figure 3G; so it is very hard to interpret the lack of effect of Pacr-2::tetx in daf-16(-) worms, since this transgene behaves differently in sid-1 mutants than in wild type worms. This would be clear if the two plots were combined (appropriately, since it is the same experiment). It looks like daf-16 RNAi has a shortening effect in the sid-1 mutant, but not in in sid-1 mutants expressing Pacr-2::text.”

      Thanks for this helpful suggestion. As suggested by the reviewer, we have now merged Figure 3G and 3H into one figure to present as Figure S5F. This combined presentation clarifies the comparison and shows that intestinal daf-16 RNAi shortens lifespan in both sid-1 mutants and sid-1 mutants expressing Pacr-2::TeTx.

      Reviewer #4 (Recommendations for The Authors):

      (1) “Lines 50-52: I would replace "leading to increased incidents in age-related diseases and probability of death" with "leading to the onset of age-related diseases and increased probability of death". Instead of "such an aging process" I would use "the aging process".”

      This has now been fixed.

      (2) “Figure 2E-F: By rescuing the expression of ACR-6 in neurons or intestinal cells alone, the authors show that the release of ACh from cholinergic neurons has effects on the intestine to shorten lifespan. Is ACR-6 expressed in other tissues (e.g. muscle?) It might be interesting to assess whether ACh also regulates lifespan through activating the ACR-6 receptor in other tissues or specifically targets the intestine. This question is partially answered with the tissue-specific RNAi experiments for DAF-16, but it is possible that ACR-6 also modulates other pathways beyond the tested transcription factors.”

      Analyzing the role of other tissues could also be applied to understand how GAR-3 influences lifespan. Along these lines, it would be interesting to expand the tissue-specific knockdown experiments for GAR-3 to other tissues. More importantly, these experiments can address whether activation of ACR-6 and GAR-3 can also have different effects on lifespan by regulating distinct tissues in addition to the intestine, and not only due to temporal expression patterns. For instance, whereas DAF-16 regulates lifespan primarily through its effects in the intestine, HSF1 could have effects on additional tissues. Although it would be interesting to perform these experiments, I understand that the authors' main focus is the nervous system-gut axis.

      We thank the reviewer for the insightful suggestions regarding the potential tissue-specific functions of ACR-6 and GAR-3. As noted in our response to point #6, endogenous expression imaging indicates that ACR-6 and GAR-3 are primarily expressed in neurons and the intestine, with weak expression of GAR-3 in the muscle, so we tested the muscle. We found that muscle-specific RNAi of gar-2 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific RNAi of gar-3 did not. This result further supports that GAR-3 primarily exerts this effect in the intestine.

      (3) “Can the authors specify in the corresponding figure legend at what age they tested sod-3 and mtl-1 expression in Pacr-2::TeTx worms (Figure 3F)? This is important to support the conclusions of the paper. Along these lines, can the authors also specify at what age they quantified the expression of HSF-1 targets (Figure 5F).”

      Thanks for the suggestion. As recommended, we have now provided the worm age in Figure 3F (day 1 adult) and Figure 5F legends (day 10 adult).

      (4) “To further strengthen the authors' conclusions, it might be interesting to examine the intracellular localization of DAF-16 in the intestine of Pacr-2::TeTx and syntaxin(T254I) worms compared to controls.”

      We thank the reviewer for this valuable suggestion, which was also raised by another reviewer. In response, we examined the subcellular localization of DAF-16 in the intestine. Direct imaging in the Pacr-2::TeTx or Pacr-2::syntaxin(T254I) backgrounds was technically challenging because their fluorescent protein tags (YFP or mCherry) would interfere with the detection of DAF-16::GFP. Therefore, we adopted an alternative approach by modulating the activity of acr-6, the intestinal acetylcholine receptor that transmits cholinergic signals from motor neurons to DAF-16. We found that acr-6 RNAi promotes the nuclear translocation of DAF-16. These new data are presented in Figure S5E by stating (page 11): “To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, confirming that ACh signaling modulate DAF-16 activity (Figure S5B).”

      (5) “The results with gar-2 RNAi are fascinating. I am very curious (and I assume potential readers too) about what tissues mediate the mid-late life effects of GAR-2 in longevity. Perhaps the authors could add experiments in a couple of other tissues known to regulate organismal lifespan (e.g. muscle). However, I totally understand why the authors focused on GAR-3, especially because both GAR-3 and ACR-6 have effects on the intestine and this is sufficient for the main conclusions of the paper.”

      We sincerely thank the reviewer for the insightful suggestion and for highlighting the potential role of GAR-2. In response, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppresses this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 primarily in the intestine and GAR-2 primarily in the muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 will be investigated in future studies. The new data have now been described in Figure S8 by stating (page 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H) had effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (6) “Figure 6: It seems that the genes are also expressed in the muscle. Can the authors include images of other tissues in supplementary figures?”

      Thanks for the suggestion. As suggested by the reviewer, we have now included images of whole worms expressing mCherry, which was knocked into the endogenous locus of gar-3 or acr-6 by CRISPR, in Figure S10. However, we did not detect strong expression of gar-3 or acr-6 in the muscle under the conditions examined, which may reflect the low endogenous protein expression level of the two genes in the muscle, though the CeNGEN website shows they are expressed in the muscle. Determining the precise spatiotemporal expression profiles of these receptors will likely require more sensitive methods. We plan to address this important question in future studies by using such refined approaches.

    1. Author response:

      General Statements

      We thank all three reviewers for their time taken to provide valuable feedback on our manuscript, and for appreciating the quality and usefulness of our data and results presented in our study. We have improved the manuscript based on their suggestions and provide a detailed, point-by-point response below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" so that to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you for your positive feedback.

      There are several single-cell methodologies all claim to co-profile chromatin modifications and gene expression from the same individual cell, such as CoTECH, Paired-tag and others. Although T-ChIC employs pA-Mnase and IVT to obtain these modalities from single cells which are different, could the author provide some direct comparisons among all these technologies to see whether T-ChIC outperforms?

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al., 2024), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C,D and Fig. S1D,E of the preprint). We show that compared to other methods, T-ChIC is able to better preserve the expected biological relationship between the histone modifications and gene expression in single cells.

      In current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, these data look great. How about other histone modifications (eg H3K9me3 and H3K36me3) and transcription factors?

      While we haven’t profiled these other modifications using T-ChIC in Zebrafish, we have previously published high quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al 2023)(Zeller et al., 2022). In our comparison, we find that histone modification profiles between T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al 2024). Therefore the method is expected to work as well for the other histone marks.

      T-ChIC can detect full length transcription from the same single cells, but in FigS3, the authors still used other published single cell transcriptomics to annotate the cell types, this seems unnecessary?

      We used the published scRNA-seq dataset with a larger number of cells to homogenize our cell type labels with these datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and homogenized the cell type labels with ZFIN ontology. This way our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our data, we didn’t expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs such as miRNA previously undetected in scRNA assays, which we have now highlighted in new Figure S1c.

      Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH?

      We appreciate that the ISH staining could be useful to validate the expression pattern of genes identified in this study. But to validate the relationships between the histone marks and gene expression, we need to combine these stainings with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also reply to reviewer #3, comment #4 for details).

      In Fig2 and FigS4, the authors showed H3K27me3 cis spreading during development, this looks really interesting. Is this zebrafish specific? H3K27me3 ChIP-seq or CutTag data from mouse and/or human embryos should be reanalyzed and used to compare. The authors could speculate some possible mechanisms to explain this spreading pattern?

      Thanks for the suggestion. In this revision, we have reanalysed an H3K27me3 ChIP-seq dataset from mouse embryonic development by Xiang et al. (Nature Genetics 2019) and find similar evidence of spreading of H3K27me3 signal from their pre-marked promoter regions at E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of pre-marking of promoters by PRC1-PRC2 interaction seems to be conserved between the two species (see Hickey et al., 2022; Mei et al., 2021; Chen et al., 2021), suggests that the dynamics of H3K27me3 pattern establishment is conserved across vertebrates. But we think a high-resolution profiling via a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We have discussed this further in our revised manuscript.

      Reviewer #1 (Significance):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" so that to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and full-length transcriptome and applied the method on early embryos of zebrafish. The authors observed a decoupled relationship between the chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive marker H3k27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomic and embryonic development fields. There are some concerns before the manuscript is considered for publication.

      We thank the reviewer for appreciating the quality of our study.

      Major concerns:

      (1) A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay.

      We focussed on the lineage-specific transcription factor activity during gastrulation in Figure 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. During our analysis of the temporal dimension for the rest of the manuscript, we also classified the cells by their germ layer and “latent” developmental time by taking the full advantage of the single-cell nature of our data. Additionally, we have now added the cell-type-specific H3K27me3 demethylation results for 24hpf in response to your comment below. We hope that these results, together with our openly available dataset would demonstrate the advantage of the single-cell aspect of our dataset.

      (2) The cis-spreading of H3K27me3 with developmental time is interesting. Considering H3k27me3 could mark bivalent regions, especially in pluripotent cells, there must be some regions that have lost H3k27me3 signals during development. Therefore, it's confusing that the authors didn't find these regions (30% spreading, 70% stable). The authors should explain and discuss this issue.

      Indeed, we see that ~30% of the bins enriched in the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and is still being accumulated until 24hpf and beyond. Therefore the majority of the sites in the genome still seem to be in the process of gaining H3K27me3 until 24hpf, explaining why we see mostly “spreading” and “stable” states. Considering most of these sites are at promoters and show signs of bivalency, we think that these sites are marked for activation or silencing at later stages. We have discussed this in the manuscript (“discussion”). However, in response to this and the earlier comment, we went back and searched for genes that show H3K27me3 demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that show K27 demethylation after acquiring the mark earlier. Interestingly, most of the top genes in this list are well-known as developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e; Supplementary table 3).

      Minors:

      (1) The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them.

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      (2) bT-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a is unnecessary to show.

      Figure 1a shows a summary of our zebrafish T-ChIC workflow, which contains the unique sample multiplexing and sorting strategy to reduce batch effects, which was not applied in the original T-ChIC workflow. We have now clarified this in “Results”.

      (3) It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C.

      We have added the numbers to the corresponding legends.

      (4) Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages.

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      (5) Figure 4C has not been cited or mentioned in the main text. Please check.

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance):

      Strengths:

      This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish.

      Limitations:

      The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited.

      Advance:

      The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting.

      The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc.

      I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc.

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors.

      Major concerns

      (1) Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset.

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24 hpf) are present in all four, and the cell-type composition of the replicates at these two timepoints is very similar. We have added new plots in Figure S2f and a new supplementary table (#1) to highlight the presence of biological replicates.

      (2) The TF‑activity regression model uses an arbitrary R<sup>2</sup> ≥ 0.6 threshold; cross‑validated R<sup>2</sup> distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off.

      Thank you for this suggestion. We did use 10-fold cross-validation during training and obtained the R<sup>2</sup> values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R<sup>2</sup> > 0.6 to select the TFs for classification was indeed arbitrary. In the revised version, we now report FDR-adjusted p-values for these R<sup>2</sup> estimates based on permutation tests, and select TFs with a cutoff of padj < 0.01. We have updated our supplementary table #4 to include the p-values for all tested TFs. We see that our arbitrary cutoff of 0.6 was, in fact, too stringent, and we can classify many more TFs based on the FDR cutoff. We also updated the numbers reported in Fig. 4c to reflect this. Moreover, supplementary table #4 contains the complete list of TFs used in the analysis to allow others to choose their own cutoff.
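
      The permutation-based selection described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names, the choice to shuffle the observed values, the number of permutations, and the Benjamini-Hochberg adjustment are all assumptions made for illustration.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """One-sided p-value: how often does a shuffled target reach the observed R^2?"""
    rng = random.Random(seed)
    observed = r_squared(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between target and prediction
        if r_squared(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      In such a scheme, one p-value would be computed per TF motif, the whole vector adjusted once, and TFs retained where the adjusted value falls below the chosen cutoff (0.01 in the response above).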

      (3) Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes.

      We agree that independent validation of the functions of our predicted TFs on target gene activity would be important. During this revision, we analysed the recently published scRNA-seq data of Saunders et al. (2023), which include CRISPR-mediated F0 knockouts of a couple of our predicted TFs, but the scRNA-seq was performed at later stages (24 hpf onward) than our H3K4me1 analysis (4-12 hpf). As a result, we saw off-target genes being affected in lineages where these TFs are clearly not expressed (attached Fig. 1). We therefore didn’t include these results in the manuscript. In future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      (4) The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression.

      We appreciate the suggestion: PRC2 disruption followed by T-ChIC, or another form of validation, would indeed be needed to confirm whether H3K27me3 spreading is causally linked to the silencing of the identified target genes. However, performing this validation is complicated for several reasons. First, due to the EZH2 contribution from maternal RNA and the contradictory effects of various zygotic EZH2 mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant seems to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019 and San et al., 2019 for details). Second, the use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not show a validation of the H3K27me3 loss or a phenotype similar to that of the MZezh2 mutants, and inhibitors can cause unwanted side effects and toxicity at high doses, affecting gene expression results. Moreover, in an attempt to validate, we performed our own trials with the EZH2 inhibitor (GSK123) and saw that this time window might be too short to see the effect within 24 hpf (attached Fig. 2). Therefore, this validation is a more complex endeavor beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 demethylation on developmentally important genes (new Fig. 2e-f, Sup. table 3) adds confidence that Polycomb repression plays an important role, and provides sufficient grounds for future follow-up studies.

      Minor concerns

      (1) Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development.

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation would be the focus of our follow up works.

      (2) Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided.

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline “scChICflow” to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files on our zenodo repository (see below), which can simply be plugged into this pipeline together with the fastq files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have also clarified the peak calling and post-processing steps in the manuscript now.

      (3) Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added.

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on “Data and code availability”.

      (4) Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference.

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance):

      The method is technically innovative and the biological insights are valuable; however, several issues-mainly concerning experimental design, statistical rigor, and functional validation-must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

      Author response image 1.

      (1) (top) expression of tbx16, which was one of the common TFs detected in our study and also targeted by Saunders et al by CRISPR. tbx16 expression is restricted to presomitic mesoderm lineage by 12hpf, and is mostly absent from 24hpf cell types. (bottom) shows DE genes detected in different cellular neighborhoods (circled) in tbx16 crispants from 24hpf subset of cells in Saunders et al. None of these DE genes were detected as “direct targets” in our analysis and therefore seem to be downstream effects. (2) Effect of 3 different concentrations of EZH2 inhibitor (GSK123) on global H3K27me3 quantified by flow cytometry using fluorescent coupled antibody (same as we used in T-ChIC) in two replicates. The cells were incubated between 3 and 10 hpf and collected afterwards for this analysis. We observed a small shift in H3K27me3 signal, but it was inconsistent between replicates.

      References

      Chen, Z., Djekidel, M. N., & Zhang, Y. (2021). Distinct dynamics and functions of H2AK119ub1 and H3K27me3 in mouse preimplantation embryos. Nature Genetics, 53(4), 551–563.

      den Broeder, M. J., Ballangby, J., Kamminga, L. M., Aleström, P., Legler, J., Lindeman, L. C., & Kamstra, J. H. (2020). Inhibition of methyltransferase activity of enhancer of zeste 2 leads to enhanced lipid accumulation and altered chromatin status in zebrafish. Epigenetics & Chromatin, 13(1), 5.

      Hickey, G. J., Wike, C. L., Nie, X., Guo, Y., Tan, M., Murphy, P. J., & Cairns, B. R. (2022). Establishment of developmental gene silencing by ordered polycomb complex recruitment in early zebrafish embryos. eLife, 11, e67738.

      Huang, Y., Yu, S.-H., Zhen, W.-X., Cheng, T., Wang, D., Lin, J.-B., Wu, Y.-H., Wang, Y.-F., Chen, Y., Shu, L.-P., Wang, Y., Sun, X.-J., Zhou, Y., Yang, F., Hsu, C.-H., & Xu, P.-F. (2021). Tanshinone I, a new EZH2 inhibitor restricts normal and malignant hematopoiesis through upregulation of MMP9 and ABCG2. Theranostics, 11(14), 6891–6904.

      Mei, H., Kozuka, C., Hayashi, R., Kumon, M., Koseki, H., & Inoue, A. (2021). H2AK119ub1 guides maternal inheritance and zygotic deposition of H3K27me3 in mouse embryos. Nature Genetics, 53(4), 539–550.

      Rougeot, J., Chrispijn, N. D., Aben, M., Elurbe, D. M., Andralojc, K. M., Murphy, P. J., Jansen, P. W. T. C., Vermeulen, M., Cairns, B. R., & Kamminga, L. M. (2019). Maintenance of spatial gene expression by Polycomb-mediated repression after formation of a vertebrate body plan. Development (Cambridge, England), 146(19), dev178590.

      San, B., Rougeot, J., Voeltzke, K., van Vegchel, G., Aben, M., Andralojc, K. M., Flik, G., & Kamminga, L. M. (2019). The ezh2(sa1199) mutant zebrafish display no distinct phenotype. PloS One, 14(1), e0210217.

      Saunders, L. M., Srivatsan, S. R., Duran, M., Dorrity, M. W., Ewing, B., Linbo, T. H., Shendure, J., Raible, D. W., Moens, C. B., Kimelman, D., & Trapnell, C. (2023). Embryo-scale reverse genetics at single-cell resolution. Nature, 623(7988), 782–791.

      Vastenhouw, N. L., Zhang, Y., Woods, I. G., Imam, F., Regev, A., Liu, X. S., Rinn, J., & Schier, A. F. (2010). Chromatin signature of embryonic pluripotency is established during genome activation. Nature, 464(7290), 922–926.

      Zeller, P., Blotenburg, M., Bhardwaj, V., de Barbanson, B. A., Salmén, F., & van Oudenaarden, A. (2024). T-ChIC: multi-omic detection of histone modifications and full-length transcriptomes in the same single cell. In bioRxiv (p. 2024.05.09.593364). https://doi.org/10.1101/2024.05.09.593364

      Zeller, P., Yeung, J., Viñas Gaza, H., de Barbanson, B. A., Bhardwaj, V., Florescu, M., van der Linden, R., & van Oudenaarden, A. (2022). Single-cell sortChIC identifies hierarchical chromatin dynamics during hematopoiesis. Nature Genetics. https://doi.org/10.1038/s41588-022-01260-3

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We have addressed this comment with new GLMs. The new GLM1 includes both non-gaze-weighted and gaze-weighted regressors and finds that the vmPFC and striatum reflect gaze-weighted sampled value, while the preSMA reflects gaze-weighted accumulated value. We have now dropped the old GLM3 and added two other GLMs, one that explicitly interacts accumulated value with accumulated dwell, and the other that considers only partial gaze discounting. These analyses all support the preSMA as encoding gaze-weighted accumulated value.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the preSMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space and leverages theoretical predictions about where effects should occur. Our novel contribution that gaze modulation of accumulated evidence signals was reflected in preSMA activity builds naturally on established findings. However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

      Recommendations for the authors:

      Editor Comments:

      Reviewer 1 in particular makes a number of suggestions for additional analyses that would help to strengthen the evidence supporting your conclusions.

      We thank the editor and the reviewers for the helpful suggestions for improving our manuscript. We discuss our efforts to address each point below.

      Reviewer #1 (Recommendations for the authors):

      (1) To address my concerns about GLM2, the first thing to do might be to simply show the correlation between the regressors used across the three different models (e.g., as a figure in the methods). Although the authors have done a good job to ensure that AV and SV are decorrelated when including them both in the same model, they haven't shown us whether the regressors used in, for example, GLM2 are correlated/similar to the regressors used in GLM1. This is important information for interpretation.

      Thank you for raising concerns about the overlap between different models. We agree that additional information regarding the correlation among sample-level regressors would aid readers in understanding the differences among the analyses. We now include this information in Figure 7 in the Methods section, as requested. While |SV| was uncorrelated with gaze-weighted |SV| (|SV<sub>Gaze</sub>|; Pearson’s r = 0.002, p = 0.848), lagged |AV| was significantly correlated with lagged, gaze-weighted |AV| (lagged |AV<sub>Gaze</sub>|; r = 0.365, p < 2.2 × 10<sup>-16</sup>).

      (2) The acid test for gaze-modulation of value signals would be to show that the gaze-modulated signals explain the fMRI results over and above the non-gaze-modulated signals. This could simply mean including SVgaze and SV (and equivalent terms for AV) within the same GLM. Following from point (1), the authors may point out that these terms are highly correlated - yes, but the GLM will then test for the effects of SVgaze *over and above* the effects of SV. (In fact, although I'd normally caution against orthogonalisation - it would here be totally legitimate to orthogonalise SVgaze w.r.t. SV).

      We appreciate the reviewer’s suggestions for more robust tests of the presence of gaze-weighted signals. For reasons highlighted in our response above, we were initially hesitant to include both types of regressors in the same model due to their significant correlation. However, we now report the results of this analysis in the main text as the new GLM1. This model incorporates both gaze-weighted and non-gaze-weighted terms. For each contrast we used the same procedures as reported in the main text (family-wise error corrected at p < 0.05 and cluster-forming thresholds at p < 0.005).

      In the vmPFC, we found significant effects of both |∆SV| (peak voxel: x = -14, y = 44, z = -12; t = 3.90, p = 0.0190) and |∆SV<sub>Gaze</sub>| (peak voxel: x = 4, y = 38, z = -4; t = 5.21, p = 0.004), but no effects of |∆AV| or |∆AV<sub>Gaze</sub>|. The striatum also showed a significant correlation with |∆SV<sub>Gaze</sub>| (peak voxel: x = 22, y = 20, z = -10; t = 5.10, p = 0.014), but with no other regressors.

      In the pre-SMA, we found a significantly positive relationship with both |∆AV| (peak voxel: x = 4, y = 14, z = 50; t = 4.75, p < 0.001) and |∆AV<sub>Gaze</sub>| (peak voxel: x = 4, y = 18, z = 50; t = 2.98, p = 0.032). In contrast, the dlPFC (x = 40, y = 34, z = 26; t = 6.83, p < 0.001) and IPS (x = 42, y = -50, z = 42; t = 5.16, p = 0.010) were only correlated with |∆AV|. No other significant contrasts emerged.

      These results provide direct support for the presence of gaze-modulated value signals in the brain, which we now describe in the main text Results section.

      (3) With regards to GLM3, it would help to provide a bit more detail on what the time series looks like for the gaze regressor in this model - is it the entire timeseries of gaze (which presumably shifts back/forth between options multiple times within each trial) which is being convolved with the HRF? This seems different from how gaze is being calculated in GLM2, where it is amalgamated into an 'average gaze difference' within a sample between left/right options, if I understand the text correctly?

      We apologize for the lack of details regarding how we operationalized the gaze regressors in our analyses. You are correct that the gaze regressor was calculated differently in GLM2 and GLM3.

      However, in response to the reviewer’s points above (Major Point 2) and below (Major Point 4, Minor Point 1), we have decided to drop the old GLM3 from the paper while incorporating a revised GLM1 (combining old GLM1 and GLM2) and two new GLMs (see responses to Major Point 4 and Minor Point 1) to provide clearer evidence for gaze modulation of accumulated value in the brain.

      (4) Also, is there not a reason why it isn't more appropriate to interact AV with *previously deployed gaze difference* (accumulated across previous samples) in this model, rather than the current gaze location? The latter seems to rely upon the indirect linkage via the behavioural modelling result, which seems to weaken the claim.

      We thank the reviewer for this suggestion. We agree that our original GLM3 approach was limited because it interacted AV with current binary gaze location, which relies on the indirect behavioral relationship we established (i.e., that current gaze is negatively correlated with accumulated past gaze).

      The original GLM2 (which is now incorporated into the new GLM1) implemented something similar to what the reviewer is suggesting as it used gaze-weighted values accumulated across all previous samples. Specifically, in GLM2, the gaze-weighted accumulated value (AV<sub>gaze</sub>) was calculated as the sum of all previous sampled values, each weighted by the proportion of gaze allocated to each option during that sampling period.

      However, to more directly test whether accumulated evidence signals are modulated by accumulated gaze allocation we have now run an additional analysis (GLM2). In this analysis we have revised the old GLM3 to include additional regressors: ∆SV, lagged ∆AV, current gaze location, accumulated dwell advantage, ∆SV × current gaze location, and lagged ∆AV × accumulated dwell advantage.

      The two new regressors were defined as follows:

      Accumulated dwell advantage: For each sample t, accumulated dwell advantage represents the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      ∆AV × accumulated dwell advantage: The interaction between accumulated values and accumulated dwell advantage, which directly tests whether brain regions encoding accumulated value are modulated by the history of gaze allocation.

      This approach is conceptually similar to old GLM2’s gaze-weighting method, but allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation.
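
      The two regressors defined above can be sketched in a few lines. This is an illustrative sketch rather than the analysis code: the function names are hypothetical, and setting the value to 0 for the first sample of a trial (where no previous gaze exists and the ratio is undefined) is an assumption.

```python
def accumulated_dwell_advantage(dwell_left, dwell_right):
    """Per-sample dwell advantage computed from samples 1..t-1 only.

    dwell_left / dwell_right: dwell time on each side within each sample.
    Returns values in [-1, +1]; +1 means all previous gaze went to the left.
    """
    advantages = []
    total_left = 0.0
    total_right = 0.0
    for left, right in zip(dwell_left, dwell_right):
        total = total_left + total_right
        # Assumption: no gaze history yet (first sample) is coded as 0.
        advantages.append((total_left - total_right) / total if total > 0 else 0.0)
        # Update the running totals only after using them, so sample t
        # never contributes to its own regressor.
        total_left += left
        total_right += right
    return advantages


def interaction_regressor(lagged_delta_av, dwell_advantage):
    """Element-wise product: lagged signed accumulated value x dwell advantage."""
    return [av * adv for av, adv in zip(lagged_delta_av, dwell_advantage)]
```

      The product is positive when previous gaze and accumulated value favor the same side, which is the sign convention described for the interaction term.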

      Here, we found that the pre-SMA showed a positive correlation with the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.10, p = 0.0258). Surprisingly, the striatum also showed a correlation with this term (peak: x = -16, y = 10, z = -6; t = 4.07, p = 0.0176). No other ROIs showed significant relationships.

      This analysis provides additional evidence that pre-SMA encodes accumulated value signals that are modulated by accumulated gaze allocation, without relying on indirect relationships between current and past gaze. We now report these results in the main text as GLM2 as follows:

      “To more directly test whether accumulated evidence signals were modulated by accumulated gaze allocation throughout a trial, we conducted additional, exploratory analyses. Specifically, we ran a GLM that incorporated the following two terms: accumulated dwell advantage and ∆AV × accumulated dwell advantage, in addition to ∆SV, the current gaze location, and ∆SV × current gaze location.

      We calculated accumulated dwell advantage as follows: For each sample t, accumulated dwell advantage is the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      We also included the interaction between accumulated dwell advantage and ∆AV (i.e., signed accumulated evidence). This interaction term is positive when gaze is primarily to the left and left has more value or when gaze is primarily to the right and right has more value. This interaction term directly tests whether brain regions encoding accumulated evidence are modulated by the history of gaze allocation. This approach allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation itself.

      This GLM revealed a positive correlation between pre-SMA activity and the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.01, p = 0.026). Surprisingly, the striatum also showed a correlation with this term (peak voxel: x = -16, y = 10, z = -6; t = 4.07, p = 0.018). Additionally, activity in the dlPFC was positively correlated with ∆SV (peak voxel: x = -36, y = 34, z = 22; t = 3.96, p = 0.016). No other ROIs showed significant relations.

      This analysis provides additional evidence that the pre-SMA encodes accumulated value signals that are modulated by the history of gaze allocation.”

      Minor

      (1) "In Trial A, the subject looks left 30% of the time and right 70% of the time. In Trial B, the subject looks left 70% of the time and right 30% of the time. In Trial A, the net input value ("drift rate") would be |0.3 ∙ 7 − 0.7 ∙ 3| = 0. In Trial B, the drift rate would be |0.7 ∙ 7 − 0.3 ∙ 3| = 4." I may be missing something, but isn't this consistent with an aDDM with theta=0, rather than theta=0.3-0.5 as is typically found?

      The reviewer raises an important point about our assumptions regarding attentional discounting. We agree that our approach could be problematic as it may assume stronger discounting than has been observed in the literature.

      To address this concern, we calculated drift on a sample-by-sample basis before aggregating to the trial level. Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent fixating left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. This approach preserves the fine-grained temporal dynamics of gaze-dependent value accumulation that would be lost by calculating gaze proportions only at the trial level.

      Using this sample-level method in a mixed-effects logistic regression predicting choice (left vs. right), we estimated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25). These estimates are somewhat higher than the typical aDDM findings of attentional bias (θ = 0.3–0.5). This may reflect the drawn-out nature of this task relative to prior aDDM tasks.
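
      As a sketch of the sample-level computation (not the code used for the reported estimates), the β and γ terms can be built per sample and averaged per trial. The function names are hypothetical, and reading θ as the ratio of the fitted regression coefficient on γ to the coefficient on β is our interpretation of "θ = γ/β"; the regression itself is not reproduced here.

```python
def sample_beta_gamma(g_left, g_right, v_left, v_right):
    """Sample-level regressors from gaze proportions (g) and option values (v)."""
    beta = g_left * v_left - g_right * v_right    # value of the fixated option
    gamma = g_right * v_left - g_left * v_right   # value of the non-fixated option
    return beta, gamma


def trial_beta_gamma(samples):
    """Average the sample-level terms over all samples in one trial.

    samples: list of (g_left, g_right, v_left, v_right) tuples.
    """
    pairs = [sample_beta_gamma(*s) for s in samples]
    n = len(pairs)
    mean_beta = sum(b for b, _ in pairs) / n
    mean_gamma = sum(g for _, g in pairs) / n
    return mean_beta, mean_gamma


def theta_from_coefficients(coef_beta, coef_gamma):
    """Attentional discounting as the ratio of fitted coefficients (assumed)."""
    return coef_gamma / coef_beta
```

      The ratio is interpretable because a gaze-weighted value difference with discount θ expands to β + θγ, so a choice model with separate weights on β and γ recovers θ as the ratio of the two weights.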

      Next, we ran a new GLM that incorporated these θ estimates in the sampled value estimates. For this GLM3, we computed θ-weighted sampled-value (|∆TWSV|) as:

      TWSV = (G<sub>Left</sub> × (V<sub>Left</sub> – θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> – θV<sub>Left</sub>)).

      Similar to GLM1, we computed an accumulated value signal based on the lagged sum of previous samples’ |∆TWSV| (i.e., |∆TWAV|).
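
      A minimal sketch of the θ-weighted regressors follows. The function names are hypothetical, and accumulating the signed per-sample values (with the absolute value taken afterwards) is an assumption about the order of operations, which the text does not fully specify.

```python
def theta_weighted_sv(g_left, g_right, v_left, v_right, theta):
    """Theta-weighted sampled value for one sample.

    theta = 0 discounts the non-fixated option completely;
    theta = 1 removes the gaze effect (when g_left + g_right = 1).
    """
    return (g_left * (v_left - theta * v_right)
            - g_right * (v_right - theta * v_left))


def lagged_accumulated(values):
    """Sum of all previous samples' values; the current sample is excluded."""
    accumulated = []
    total = 0.0
    for v in values:
        accumulated.append(total)
        total += v
    return accumulated
```

      Applied to Trial A from the reviewer's example (30% of gaze on the left option worth 7, 70% on the right option worth 3), the sampled value is approximately 0 for θ = 0, 4θ for intermediate θ, and the plain value difference of 4 for θ = 1.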

      We found significant positive effects of |∆TWSV| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.0270) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.0198), but in no other ROI.

      In contrast, we found significant positive relationships between |∆TWAV| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.0014), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.0040), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆TWAV| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.0410). No other significant contrasts emerged.

      We now report this additional analysis as GLM3 in the main text, as follows:

      “In our first set of analyses, we implicitly assumed complete discounting of non-fixated information, in contrast with previous studies that have generally found only partial discounting (Krajbich et al., 2010; Sepulveda et al., 2020; Smith & Krajbich, 2019; Westbrook et al., 2020). To verify that our results are robust to inter-subject variability in attentional discounting, we estimated subject-level attentional discounting parameters and then re-estimated our original GLM with new, recalculated gaze-weighted value regressors.

      Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent gazing left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. We then ran a mixed-effects logistic regression predicting choice (left vs. right) as a function of β and γ and then calculated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25).

      Next, for the GLM, we computed θ-weighted sampled-value (|∆SV<sub>θ</sub>|) as:

      SV<sub>θ</sub> = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>))

      Similar to the original GLM, we computed an accumulated value signal, |∆AV<sub>θ</sub>|, based on the lagged sum of previous samples’ |∆SV<sub>θ</sub>|.

      We found significant positive effects of |∆SV<sub>θ</sub>| in the vmPFC (peak voxel: x = -14, y = 44, z = 12; t = 3.57, p = 0.027) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.020), but in no other ROI.

      In contrast, we found significant positive relationships between |∆AV<sub>θ</sub>| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.001), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.004), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆AV<sub>θ</sub>| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.041). No other significant contrasts emerged.

      In summary, these analyses provide additional evidence that the vmPFC encodes gaze-weighted sampled value signals and the pre-SMA encodes gaze-weighted accumulated value signals, though other correlations also emerged.”

      (2) The reporting of statistical results in the fMRI could be sharpened - e.g. in the figure legends, don't just say "Voxels thresholded at p < .05.", but make clear whether you mean FWE whole-brain corrected (I think you do from the methods) or whether this is uncorrected for display; similarly, for the peak voxels, report the associated Z statistic at that voxel rather than just "negative beta".

      We agree that it is important to include additional details regarding how we reported the statistical results. We now clarify our procedures in the main text:

      “We report results using FWE-corrected statistical significance of p < 0.05 and a cluster significance threshold of p < 0.005.”

      We now also report the T statistics for peak voxels.

      (3) A couple of the citations are slightly wrong - e.g., Kolling et al 2012 shouldn't be cited as arguing for decision conflict, as in fact it argues strongly against this account and in favour of a foraging account of ACC activity. Similarly, Hunt et al 2018 doesn't provide support for decision conflict; instead, it shows signals in ACC show evidence accumulation for left/right actions over time (although not whether these accumulator signals are gaze-weighted, in the same way as the present study).

      We thank the reviewer for pointing out these mistakes in our citations. We have revised the references throughout.

      Reviewer #2 (Recommendations for the authors):

      (1) In some places, the introduction would benefit from fleshing out certain points. For example, it is stated “For instance, decisions that are less predictable also tend to take more time (Konovalov & Krajbich, 2019) and can be influenced by attention manipulations (Parnamets et al., 2015; Tavares et al., 2017; Gwinn et al., 2019; Bhatnagar & Orquin, 2022). The quantitative relations between these measures argue for an evidence-accumulation process.” It is not clear why the relations between them argue for an EA process, and the reader would benefit from some further explanation.

      We thank the reviewer for this helpful suggestion. We agree that the original text did not sufficiently explain why these relationships support evidence-accumulation models. We have revised the introduction to better articulate the mechanistic basis for this claim.

      This revision clarifies these points in the main text:

      “Decisions like this are thought to rely on a bounded, evidence-accumulation process that depends on factors such as the value of the sampled information and shifts in attention. According to this framework, when two options are similar in value, evidence accumulates more slowly towards the decision threshold, resulting in longer response times (RT) and more opportunity for shifts in attention to influence the choice outcome. In contrast, when one option is clearly superior, evidence accumulates more rapidly and the decision is made quickly with less of a relation between gaze and choice. This choice process produces reliable, quantitative patterns in choice, RT, and eye-tracking data (Ashby et al., 2016; Callaway et al., 2021; Gluth et al., 2018; Krajbich et al., 2010; Smith & Krajbich, 2018). For instance, decisions with similar values are more random (i.e., less predictable), tend to take more time (Konovalov & Krajbich, 2019), and can be experimentally manipulated by diverting attention towards one option more than the other (Bhatnagar & Orquin, 2022; Gwinn et al., 2019; Pärnamets et al., 2015; Pleskac et al., 2022; Tavares et al., 2017). Critically, these behavioral measures do not simply correlate; rather, they exhibit precise quantitative relationships consistent with evidence accumulation models (Konovalov & Krajbich, 2019).”

      (2) Some of the study hypotheses also need to be clarified. What are the hypotheses regarding how SV and AV should translate to BOLD in an input vs integrator region? Larger SV/AV = larger BOLD? What predictions would be made for a time-on-task or conflict region? Are the predictions the same or different? Clarifying this will help the reader to understand to what extent the gaze manipulation is pivotal in identifying integrator regions.

      We thank the reviewer for this excellent suggestion. We agree that it is useful to clearly articulate our hypotheses about BOLD signal predictions for different aspects of the model, and why gaze manipulation is critical for distinguishing between them. We have now expanded the introduction to clarify these predictions.

      For input regions, we predicted a straightforward positive relationship: larger sampled value (|ΔSV|) should produce larger BOLD activity. Input regions encode the momentary evidence being sampled (i.e., the relative value of currently presented stimuli). Consistent with prior work (Bartra et al., 2013), we expected such activity in the vmPFC and ventral striatum.

      Critically, we also predicted that these sampled value signals should be modulated by gaze location. The attentional drift-diffusion model (aDDM; Krajbich et al., 2010) posits that attended items receive full value weight while unattended items are discounted. Consistent with prior work (Lim et al., 2011), we expected stronger vmPFC/striatum activity when the higher-value item is fixated compared to when the lower-value item is fixated.

      For integrator regions, we predicted an analogous positive relationship: larger accumulated value (|ΔAV|) should produce more BOLD activity. Accumulator regions encode the summed evidence over the course of the decision. Consistent with prior work (Hare et al. 2011; Gluth et al. 2021; Pisauro et al. 2017), we expected such activity in the pre-SMA, dlPFC, and IPS.

      As with sampled value, we predicted that integrator activity should reflect gaze-weighted accumulated value. Just as inputs are modulated by current gaze, the accumulated evidence should be weighted by the history of gaze allocation over the entire trial.

      Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time.

      The conflict account predicts that BOLD activity should scale with inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long RT (Pisauro et al. 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoid this ambiguity – we analyze the effect of accumulated value at each point in time, not just at the time of decision. In this case, conflict should be inversely correlated with accumulated value. Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of values.

      A more serious concern is the potential link to putative time-on-task BOLD activity. Accumulated value inevitably increases with time, leading to a correlation between the two variables (Grinband et al. 2011; Holroyd et al., 2018; Mumford et al. 2024). This is where the gaze data become particularly important. Time-on-task regions should show no relation with gaze allocation. After accounting for non-gaze-weighted accumulated value, only accumulator, and not time-on-task, regions should show a relation with gaze-weighted accumulated value. The results of the revised GLMs provide exactly such evidence.

      We have edited the manuscript to make clear to readers why our gaze manipulation was not merely exploratory but rather a theoretically-motivated test to distinguish between competing models of decision-related neural activity.

      We have clarified our study hypotheses in the Introduction as follows:

      “We hypothesized that we would find (1) a positive correlation between gaze-weighted |SV| and activity in the reward network (the ventromedial prefrontal cortex (vmPFC) and ventral striatum), and (2) a positive correlation between gaze-weighted |AV| in the pre-supplementary motor area (pre-SMA) (Aquino et al., 2023), dorsolateral prefrontal cortex (dlPFC), and intraparietal sulcus (IPS).”

      We have also added clarifying text about conflict and time-on-task to the Discussion as follows: “Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time. The conflict account predicts that BOLD activity should scale with the inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long response times (Pisauro et al., 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoided this ambiguity by analyzing the effect of accumulated value at each point in time, not just at the moment of decision. Under this approach, conflict should be inversely correlated with accumulated value (as higher accumulated evidence indicates less similarity between options). Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of option values.

      A more serious concern is the potential confound with time-on-task BOLD activity. Accumulated value inevitably increases with time within a trial, leading to a correlation between the two variables (Grinband et al., 2011; Holroyd et al., 2018; Mumford et al., 2024). This is where the gaze data were particularly important. Time-on-task regions should show no relation with gaze allocation patterns. After accounting for non-gaze-weighted accumulated value, only accumulator regions, and not time-on-task regions, should show a relationship with gaze-weighted accumulated value. The results of our analyses provide exactly such evidence: preSMA activity was positively correlated with gaze-weighted accumulated value, even when accounting for previous gaze history and individual differences in attention discounting.”

      (3) The authors allude to there being a correlation between SV and AV on this task, but the correlation is never reported. Please report the correlation with and without the removal of T-1.

      We appreciate the reviewer pointing out this omission. We now report all correlations between SV and both the lagged and non-lagged versions of AV in the Methods section (Fig. 7). SV was significantly correlated with the full calculation of AV (Pearson’s r = 0.27). In contrast, the correlation between SV and lagged AV, while still statistically significant, was much weaker (Pearson’s r = 0.06).
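
      To make the effect of the lag concrete, the following toy simulation (not the study data) shows why excluding the current sample from AV should reduce its correlation with SV. It assumes independent, signed, normally distributed sampled values and a fixed number of samples per trial, so the numbers it prints are not expected to match the correlations reported above; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 2000, 6                 # hypothetical trial and sample counts

# Simulated signed sampled values, independent across samples (an assumption).
sv = rng.normal(size=(n_trials, n_samples))

av_full = np.cumsum(sv, axis=1)               # accumulated value including the current sample
av_lagged = av_full - sv                      # accumulated value excluding the current sample

r_full = np.corrcoef(sv.ravel(), av_full.ravel())[0, 1]
r_lagged = np.corrcoef(sv.ravel(), av_lagged.ravel())[0, 1]
print(f"r(SV, full AV) = {r_full:.2f}, r(SV, lagged AV) = {r_lagged:.2f}")
```

      In this idealized setting the full AV shares the current sample with SV, which builds in a positive correlation, whereas the lagged AV does not. Any residual correlation in real data would then come from other sources, such as dependence between successive samples.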

      (4) When examining relationships between SV, AV, and choice probability, the authors note that a larger coefficient for SV compared to AV is an inevitable consequence of an SSM choice process. Please explain why this is the case.

      The reviewer is correct in observing that this point was not made sufficiently clear in the main text. We have now expanded the explanation in the behavioral results section.

      The key insight is that in sequential sampling models, choices occur when accumulated evidence reaches a decision threshold. Importantly, the perceived value of each sample consists of the true underlying value plus random noise. The final sample (SV) is what pushes the accumulated evidence over the threshold, which creates a selection bias: decisions tend to occur when the noise component of SV happens to be positive and large. This means that the perceived final SV systematically overestimates the true SV, biasing upward the regression coefficient for the effect of SV on choice. In contrast, AV represents the sum of all previous sampled evidence, samples that we know did not lead to a choice. These samples are thus more likely to have had a negative or small noise component, meaning that the perceived AV systematically underestimates the true AV. This biases downwards the regression coefficient for the effect of AV on choice.

      On net, we expect that even when sample evidence is weighted equally over time in the true decision process, regression analyses will inevitably show larger coefficients for the effects of SV than for those of AV. This is a statistical artefact of the threshold-crossing mechanism, and not a reflection of differential weighting. We have incorporated this explanation into the revised manuscript to make clear why this pattern is an expected consequence of the SSM framework:

      “The larger coefficient for ∆SV compared to ∆AV is an inevitable consequence of an SSM choice process. In SSMs, a choice occurs when accumulated evidence reaches a threshold. Critically, perceived value for any given sample consists of the true underlying value plus random noise. The final sample (∆SV) is what pushes the accumulated evidence over the threshold, which creates a selection effect: decisions tend to be made when the noise component of ∆SV is relatively large and aligned with the ultimate choice, causing the perceived final ∆SV to systematically overestimate the true ∆SV. As a result, the regression coefficient for the effect of final ∆SV on choice is overestimated. In contrast, ∆AV represents the sum of all previous evidence, which includes samples that were insufficient to trigger a choice and thus more likely to have noise components that favored the non-chosen option. This means that the perceived ∆AV systematically underestimates the true ∆AV. As a result, the regression coefficient for the effect of ∆AV on choice is underestimated. This creates an inherent asymmetry between ∆SV and ∆AV: even when the true decision process weights evidence equally over time, regression analyses will show larger coefficients for ∆SV than ∆AV. For any data generated by an SSM, regressing choice probability on final ∆SV and total ∆AV would produce a larger coefficient for ∆SV due to this threshold-crossing selection effect.”
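
      The selection effect described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the model fitted in the manuscript: each sample's perceived value is a normally distributed true value difference plus independent normal noise, evidence is summed with equal weights, and a choice is made at the first threshold crossing. The threshold, noise level, and trial count are hypothetical. The sketch compares the average noise, signed in the direction of the eventual choice, for the final sample against that of all earlier samples.

```python
import numpy as np

rng = np.random.default_rng(1)
threshold, max_samples, n_trials = 3.0, 200, 5000   # hypothetical settings

final_noise, earlier_noise = [], []
for _ in range(n_trials):
    true_v = rng.normal(size=max_samples)    # true value difference of each sample
    noise = rng.normal(size=max_samples)     # noise added to each perceived sample
    evidence = np.cumsum(true_v + noise)     # equal weighting of all samples over time
    crossed = np.flatnonzero(np.abs(evidence) >= threshold)
    if crossed.size == 0:
        continue                             # no decision within max_samples; skip trial
    t = crossed[0]                           # index of the sample that triggers the choice
    choice = np.sign(evidence[t])            # +1 or -1, the chosen side
    final_noise.append(choice * noise[t])    # noise aligned with the choice, final sample
    earlier_noise.extend(choice * noise[:t]) # noise aligned with the choice, earlier samples

final_mean = float(np.mean(final_noise))
earlier_mean = float(np.mean(earlier_noise))
print(f"mean aligned noise: final = {final_mean:.2f}, earlier = {earlier_mean:.2f}")
```

      Although every sample is weighted identically in the simulated decision process, the final sample's noise is selected to favor the chosen option more strongly than the noise in earlier samples, which is the asymmetry the quoted passage attributes to threshold crossing.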

      (5) It is not clear to me why the authors single out the pre-SMA only in the abstract when IPS and dlPFC also show stronger correlations with AV and exhibit gaze modulation in the authors' final non-linear analysis. Further explanation is required in the Discussion and I would also suggest amending the Abstract because the 'Most importantly' claim will not be meaningful for the reader.

      We appreciate the reviewer’s point. In the revised manuscript, we have included several new GLMs, including the new GLM1 that looks at gaze-weighted AV, above and beyond the effect of non-gaze-weighted AV. That analysis only supports pre-SMA. We have now clarified this in the Abstract as follows:

      “Finally, we found gaze-modulated accumulated-value signals, above and beyond the non-gaze-modulated signals, in the pre-supplementary motor area (pre-SMA), providing novel evidence that visual attention has lasting effects on decision variables and suggesting that activity in the pre-SMA reflects accumulated evidence.”

      (6) Some discussion of statistical power would be warranted given that a sample of 23 is now considered small by current fMRI standards.

      We appreciate the reviewer raising this important issue. We acknowledge that our sample size of 23 subjects (with only 20 having useable eye-tracking data) is on the small side by current fMRI standards. However, we believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space and leverages theoretical predictions about where effects should occur. Our novel contribution, that gaze modulation of accumulated evidence signals was reflected in pre-SMA activity, builds naturally on established findings.

      However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells, as well as 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can make both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, which are likely due to the limited appropriate controls, and of any validation experiments. I think the study requires better proteomic controls, and some validation experiments of the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison - there is no reason to think the telomeres/centrosomes would look like mtDNA as a whole. The mito proteome is much less complex. It is going to provide a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what's different between those two repetitive elements.

      We appreciate the reviewer's point here. In fact, we selected the mitochondrial DNA as a target for just the reason that the reviewer notes. mtDNA should be spatially distinct from the nuclear targets, allowing us to determine whether we were in fact seeing spatially distinct proteins at the interorganelle (mtDNA vs. telomeres/centromeres) and intraorganelle (telomeres vs. centromeres) levels.

      But the more realistic use case of this method would be "what is at a specific genomic element"? A purely nuclear-localized control would be needed for that. Or a genomic element that has nothing interesting at it (I do not know of one).

      We have now added two studies in Figure 4 and Figure 5 detailing the use of OMAP to investigate specific genomic elements: the Hox clusters (HOXA and HOXB) and a haplotype-specific analysis of the X-chromosome inactivation centers in female murine (EY.T4) cells. The controls in these cases are more specific and in line with those suggested by the reviewer, as we (1) compare HOXA and HOXB with or without EZH2 inhibition using the same sets of probes and (2) specifically compare the region surrounding the XIC in female cells for the inactive and active X chromosomes.

      You can see this in the label-free work: non-specific, nuclear GO terms are enriched likely due to the random plus non-random labeling in the nucleus. What would a Telo vs general nucleus GSEA look like? (GSEA should be used for quantitative data, no GO). That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or Cut and Run/Tag is used.

      We performed GSEA on the enrichment scores from the SAINT output for the label-free proteomics data (Figure 1D). We also note that several of these proteins (e.g., those highlighted in Figure 2A: TERF1, CENPN, TOM70) have already been extensively validated to co-localize to these locations.

      In response to the reviewer's request for additional validation, we analyzed ChIP-seq data for several proteins to determine if they were enriched surrounding specific loci. In the case of the HoxA/B analysis, we found that HDAC3 and TCF12 were enriched at HOXB compared to HOXA, and SMARCB1 and ZC3H13 were enriched at HOXA compared to HOXB (Figure 4C). HDAC3 and TCF12 ChIP data confirmed increased peak calls at HOXB and SMARCB1 and ZC3H13 ChIP data confirmed increased peak calls at HOXA for these four selected proteins (Figure 4D).

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being looked at? Maybe compared to mitochondria, but it's hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are more seeing mito vs nucleus + Telo than the stated comparison. For example, if you have no labeling in the nucleus in the control (Figures 1C and 2C) you cannot separate background labeling from specific labeling. Same with mito vs. nuc+Telo. It is not the proper control to say what is specifically at the Telo.

      We agree with the reviewer that compared to mitochondrial targeting, there could be non-specific nuclear comparisons. We note again though that we purposefully stayed away from using the word “specifically” when describing the proteomics work developed here. The reason being that we are not atlasing a large number of targets to define specificity. Instead, we highlight in Figure 2 that we did observe differences in proteins associating with telomeres and mitochondrial DNA. That may be non-specific, and in fact, this is also why we decided to include two nuclear targets to determine what might be specifically enriched. Thus, we compared centromeric and telomeric protein enrichment as determined by OMAP and observed consistent differential enrichment of shelterin proteins at telomeres (Figure 2I) and CENP-A complex members at centromeres (Figure 2J). We could have done the relative comparisons to no-oligo controls, analogous to how CASPEX compared targeted analyses to no-sgRNA controls (PMID: 29735997). However, we found that the mitochondrial targeted samples were generally better as a comparator because (1) we have clear means to validate differences and (2) the local environment around DNA is being labeled.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      Assuming the nuclear control was the same, it is unclear how this ratio-of-ratios ([Telo/Ctrl]/[Cent/Ctrl]) experiment would be inherently different from the direct comparison between Telo and Centromere, again assuming the backgrounds are derived from the same cellular samples. More than likely, adding the extra ratios could increase the artifactual variance in the estimates, reducing the power of the comparisons, as has been seen in proteomics data using ratio-of-ratio comparisons in the past (Super-SILAC).
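
      The variance argument can be sketched numerically. The example below is a toy model, not an analysis of the OMAP data: it assumes that each log-transformed intensity carries independent normal measurement noise with the same standard deviation, and the replicate count and noise level are hypothetical. It contrasts a direct Telo vs. Cent log-ratio with a ratio-of-ratios in which each target is normalized to a separately measured control, and confirms that an identical shared control cancels exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100_000, 0.3      # hypothetical number of simulated measurements and log-scale noise SD

# Log-intensities for one protein; true log-abundances are set to zero in every sample type.
telo = rng.normal(0.0, sigma, n)
cent = rng.normal(0.0, sigma, n)
ctrl_a = rng.normal(0.0, sigma, n)   # control measured alongside the telomere samples
ctrl_b = rng.normal(0.0, sigma, n)   # control measured alongside the centromere samples

direct = telo - cent                                   # log(Telo / Cent)
ratio_of_ratios = (telo - ctrl_a) - (cent - ctrl_b)    # log([Telo/Ctrl] / [Cent/Ctrl]), separate controls
shared_ctrl = (telo - ctrl_a) - (cent - ctrl_a)        # same control value in both ratios

var_direct = direct.var()
var_ror = ratio_of_ratios.var()
print(f"variance: direct = {var_direct:.3f}, ratio-of-ratios = {var_ror:.3f}")
```

      Under these assumptions the ratio-of-ratios adds two extra noise terms, so its variance is roughly double that of the direct comparison, while a truly identical control contributes nothing because it cancels.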

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      We appreciate the reviewer's point here. To be clear, we have not made any claims about new proteins at specific loci. Instead, we validated that known telomeric and centromeric associating proteins were consistently enriched by DNA OMAP (Figure 2). We also want to emphasize that while valuable, the current paper is not an atlasing paper to define the full and specific proteomes of two genomic loci. We instead show how this method can be used to observe quantitative differences in proteins enriched at certain loci (HOXA/B work, Figure 4) and even between haplotypes (Xi/Xa work, Figure 5).

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      We appreciate the reviewer's point here and have added the following text to the discussion: “Additionally, we show that this method is also able to detect DNA-DNA contacts through biotinylation of loop anchors. Our approach functions similarly to 4C[86]. However, our approach of biotin labeling of contacts does not rely on pairwise ligation events. Thus, detection of contacts through DNA O-MAP will vary in the sampling of DNA-DNA contacts in comparison.”

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      We took the reviewer's point and have worked to scale down the DNA OMAP experiments while revising this manuscript. As noted in Figure 5, we have been able to scale this work down to plates with ~10x fewer cells than in our initial experiments. This is on top of the initial DNA OMAP work in Figures 1 and 2, as well as our additional work in Figure 4, where we used 30-60 million cells in solution, which is still 10x less material than previous work (PMID: 29735997). Thus, the newest DNA OMAP platform uses ~100x fewer cells than previous work.

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to circumvent the fact that this method is still (apparently) largely limited to repetitive elements. Other than Glopro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So, this isn't quite the advancement you are implying. Plus, the overlap with telomeric proteins and other studies should be addressed. However, that will be challenging due to the controls used here, discussed above.

      As noted above, we have added Figures 4 and 5 to address the reviewer concerns by targeting multiple non-repetitive loci (HOXA and HOXB clusters and a 4.5Mb region straddling X-inactivation center on both the active and inactive X homolog). Targeting the regions around the X-inactivation center shows the potential to perform haplotype-resolved proteome analysis of chromatin interactors.

      For the telomeric protein overlap, we tried to address this specifically in Figure 1F. We agree with the reviewer that the controls used dramatically change the proteins considered enriched. The goal of the network analysis was to show (1) that we identify proteins previously observed in telomere proteomic datasets and (2) that we gain a more complete view of proteins based on capturing more known interacting proteins than many previous methods, as was noted for the RNA OMAP platform (PMID: 39468212). For example, we observed enrichment of PRPF40A in the telomeric DNA OMAP data. From the Bioplex interactome, PRPF40A was observed to interact with TERF2IP and TERF2, suggesting that through these interactions PRPF40A may colocalize at telomeres. Similarly, we observed enrichment of SF3A1, SF3B1, and SF3B2. The SF3 proteins are known regulators of telomere maintenance (PMID: 27818134), but have not previously been observed in telomeric proteomics datasets, except now in DNA OMAP.

      We have added the following text to the Results to clarify these points:

      “To benchmark DNA O-MAP, we compared the full set of telomeric proteins to proteins observed in five established telomeric datasets (PICh, C-BERST, CAPLOCUS, CAPTURE, BioID)12,14,16,35,36 (Figure 1F). DNA O-MAP captured both previously observed telomeric interacting proteins (shelterins) as well as telomere associated proteins (ribonucleoproteins). We identified multiple heterogeneous nuclear ribonucleoproteins (hnRNPs) previously annotated as telomere-associated, including HNRNPA1 and HNRNPU. HNRNPA1 has been demonstrated to displace replication protein A (RPA) and directly interact with single-stranded telomeric DNA to regulate telomerase activity37–39. HNRNPU belongs to the telomerase-associated proteome40 where it binds the telomeric G-quadruplex to prevent RPA from recognizing chromosome ends41. We mapped DNA O-MAP enriched telomeric proteins to the BioPlex protein interactome and observed that in addition to capturing proteins from previously observed telomeric datasets (Figure 1F), DNA O-MAP enriched for interactors of previously observed telomeric proteins. Previous data found RBM17 and SNRPA1 at telomeres, and in BioPlex these proteins interact with three SF3 proteins (SF3A1, SF3B1, SF3B2). Though they were not identified in previous telomeric proteome datasets, all three of these SF3 proteins were enriched in the DNA O-MAP telomeric data. Furthermore, through interactions with G-quadruplex binding factors, these SF3 proteins are regulators of telomere maintenance (PMID: 27818134). Taken together, this data supports the effectiveness of DNA O-MAP for sensitively and selectively isolating loci-specific proteomes.”

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5).

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      Our primary claim for DNA OMAP is that it requires orders of magnitude fewer cells than previous studies. Based on comments along these lines from both reviewers, we performed DNA OMAP targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5). For the X chromosome targeting, we used ~3 million cells per condition with methods that we optimized during revision. When targeting HOXA and HOXB, we were able to identify HDAC3 and TCF12 enrichment at HOXB compared to HOXA as well as ZC3H13 and SMARB1 enrichment at HOXA compared to HOXB, which is consistent with ChIP-seq reads from ENCODE for these proteins (Figure 4C, D). Both the HOX and X chromosome work help to address limitations noted in the Gauchier et al. paper that the reviewer cites, as both show progress towards overcoming the problem described there: “the major signal-to-noise ratio problem will need to be addressed before they can fully describe the specific composition of single-copy loci”.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      We analyzed ChIP-seq reads for our HOXA and HOXB targets (Figure 4C,D), which recapitulate our findings for four of our differentially enriched proteins. We also note that with the addition of the nonrepetitive loci (Figures 4 and 5), we have performed DNA OMAP on seven different targets (telomeres, pericentromeres, mitoDNA, HOXA, HOXB, Xi, and Xa) and identified expected targets at each of these. The consistency of these data, which mirrors the consistency of the RNA implementation of OMAP (PMID: 39468212), reinforces that we can successfully enrich local proteomes at genomic loci.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      Based on this critique we have gone back through the manuscript to improve the fairness of our comparisons and expanded the limitations in our discussion section.

      To the point about fixation, Schmiedeberg et al., which the reviewer references, does describe crosslinking requiring longer interactions (~5 s). Yet, as featured in reviews, many additional studies have found that “it has been possible to perform ChIP on transcription factors whose interactions with chromatin are known from imaging studies to be highly transient” (Review PMID: 26354429). We note similar results in the proteomics analysis of Subbotin and Chait, who state that the linkage of lysine-based fixatives like formaldehyde and “glutaraldehyde to reactive amines within the cellular milieu were sufficient to preserve even labile and transient interactions” (PMID: 25172955).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOX clusters and part of the X chromosome) in the revised manuscript (Figures 4 and 5).

      Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We have made the comparisons as best as possible. In fact, we found it difficult to find examples of recent implementations of many of these methods. Purchasing the exact mass spectrometers or performing every version of chromatin proteomics would be well beyond the scope of this work. On the other hand, OMAP has already generated data for three manuscripts. We are making the claim that, using the instrumentation and methods available to us, we were able to reduce the number of cells required to analyze a given genomic locus. We then applied TMT multiplexing to further improve the throughput and perform replicate analyses. To fully validate that one protein exists at one locus and no other would require exhaustive atlasing of protein-genomic interactions, which would be well beyond the scope of this single paper. Similarly, ChIP for every target identified to assess an empirical FDR would be well beyond the scope of this work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In summary, all three reviewers raised major concerns about the limitations of the method, many of which could be resolved by more precise and transparent language about these limitations. If you choose to resubmit a revised version, you should address questions like: What scale does "individual locus" refer to? At what scale can the method map protein-DNA interactions at individual targeted loci, rather than large repetitive domains? What is the estimated false discovery rate for a set of enriched proteins? The eLife assessment for this version of the manuscript is based on reviewer concerns. Note that this assessment can be updated after receiving a response to reviewer comments.

      Reviewer #1 (Recommendations for the authors):

      (1)The first couple of paragraphs make it sound like your method would exclusively benefit from sample multiplexing with MS-based proteomics. That is a bit misleading. The other stated methods use TMT. They don't use it to compare very different genomic (or compartmental) regions, but there is no reason cberst, glopro or CasID could not.

      A good point and we have updated the manuscript to reflect this. While previous methods generally did not use TMT, they could be adapted to do so and, similar to OMAP, improved by the use of more replicates in their analyses.

      (2) Please make the colors in 1F for the dataset overlap easier to read. 2 and 4+ are too similar.

      We appreciate the comment on making the colors easier to discern. Along these lines we’ve changed the color of “2” to make it easier to distinguish from “4+”.

      (3) Label as many dots as legible in your volcano plots.

      We’ve labeled a number of proteins that are relevant to the discussion in this paper as well as some additional proteins. We feel that additional labeling would detract from the points that we are trying to make in individual figure panels about groups of proteins, rather than general remodeling of all proteins.

      (4) Figure 2E needs a divergent color scheme since it crosses 0. And is it scaled, log-transformed, or both? And compared to what then?

      Figure 2E (heatmap) is z-scaled relative protein abundance measurements based on TMTpro reporter ion signal to noise (“s/n”). We have added additional information to the legend to highlight the information that the reviewer points out here. For the color, we are unsure of what is being asked for, as above 0 is red and below 0 is blue.
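
The z-scaling described above can be illustrated with a minimal sketch. It assumes row-wise scaling (one protein across TMT channels) with the sample standard deviation; the response does not state which variant was used, and the input values below are hypothetical, not data from the manuscript.

```python
import statistics


def z_scale(values):
    """Return z-scores for one protein's abundances across channels.

    Assumption: scaling uses the sample standard deviation (n - 1).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A flat row carries no relative information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical reporter-ion signal-to-noise values for one protein.
example_row = [120.0, 80.0, 100.0, 100.0]
z_row = z_scale(example_row)
```

Because z-scores are centered on the row mean, values above the mean are positive and values below it are negative, which is why a two-color scale split at 0 is a natural choice for such a heatmap.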

      (5) Unclear what you are implying with "...only 1-2 biological replicates." I would omit or clarify.

      Fair point, we have updated the manuscript to omit this section to simplify the introduction.

      (6) H2O2 and biotin phenols might be toxic to living organisms. But so is 4% PFA and ISH. I realize you are trying to justify your new approach but you don't need to do it with exaggerated contrasts. This O-MAP is a great approach and probably more likely for people to adopt it because it's DNA ISH based. Plus, with the crosslinking, you are likely not displacing proteins via Cas9 landing.

      We appreciate the reviewer’s comments about adoption and lack of protein displacement. We’ve scaled back on the claims and added more about limitations owing to crosslinking and ISH.

      (7) How much genome does the Cent regions take up? You state 500 kb for Telos.

      In the text we delineate how large of a region the PanAlpha probes target “The genome-wide binding profile of the pan-alpha probe closely overlaps with centromeres (Figure S1) and covers approximately 35 Mb of the genome according to in silico predictions.” Additionally, we’ve added Table S4 to summarize target locus sizes for all of the included targets.

      (8) You seem to be underestimating the lysine labeling. Is that after TMT labeling and analysis? If so, you're already ignoring what couldn't be seen. I don't think it's that important but you included it, so please describe clearly why it's an issue and how much of an issue it is. How does that relate to lit values? And it's not just TMTpro, it's any lysine labeler.

      We appreciate the reviewer’s point about specifying the reasoning and the lack of clarity around overall lysine labeling. That 1.38% is the number of peptides with remainder modifications due to formaldehyde crosslinking. For overall acylation of lysines with TMT labels, we generally expect (and achieve) >97% labeling of lysines with TMT reagents, as the Kuster and Carr labs nicely demonstrated across a range of labeling conditions (PMID: 30967486).

      Decrosslinking is a critical step generally for proteomics workflows on fixed or FFPE tissues and thus we sought to explore whether we could achieve sufficiently low residual lysine alkylation to enable protein quantitation by TMTpro reagents (or any lysine labeler, as the reviewer notes). For TMTpro-based methods on peptides, this is less of a concern generally as protease cleavage frees new primary amines at the N-termini of peptides which can be labeled for quantitation. But in part since we are describing a proteomics method on fixed tissues we wanted to share these data and the potential inclusion of residual fixation modifications for readers to potentially take into consideration when performing this method.

      Reviewer #3 (Recommendations for the authors):

      Liu et al. describe an original locus labelling approach that enables the isolation of specific genomic regions and their associated proteins. I have mixed views on this work, which, in my opinion, remains preliminary at this stage. Establishing the proteome of a single chromatin region is one of the most complex challenges in chromatin biology, as extensively discussed in Gauchier et al. (2020). Any breakthrough towards this goal is of significant interest to the community, making this manuscript potentially compelling. Indeed, some data suggest that the method works for repetitive DNA to some extent. However, much of the data is not very convincing, and in the case of small DNA targets, it argues against the use of DNA-O-MAP.

      In contrast to existing methods, DNA-O-MAP combines locus-specific hybridisation in situ (using affordable oligonucleotides) with proximity biotinylation. A major advantage of this strategy over other locus-specific biotinylation methods is the possibility of extensively washing excess or non-specifically hybridised probes before the biotinylation reaction, theoretically limiting biotinylation to the target region and thus significantly enhancing the signal-to-noise ratio. Other methods involving proximity biotinylation, such as targeted dCas9, do not have this capacity, meaning biotinylation occurs not only at the locus where a small fraction of dCas9 molecules is targeted but also around non-bound dCas9 molecules (representing the vast majority of dCas9 expressed in a given cell). This aspect potentially represents an interesting advance.

      We thank the reviewer for their thoughts and critiques, which we hope have in part relieved concerns pertaining to limitation on repetitive elements. To the latter points, we confirmed this with new specificity analysis that showed labeling to be highly specific to a given probe locus (Figure S3).

      Below, I outline the significant issues:

      The manuscript implies that DNA-O-MAP has better sensitivity than earlier techniques like CAPTURE, GLOPRO, or PICh. The authors state that PICh uses one trillion cells (which I doubt is accurate), and other methods require 300 million cells, whereas DNA-O-MAP uses only 60 million cells, suggesting the latter is more feasible. However, these earlier experiments were conducted almost 15 and 6 years ago, when mass spectrometry (MS) sensitivity was considerably lower than that of current instruments. The authors cannot know whether the proteome obtained by previous methods using 60 million cells, but analysed with current MS technology, would yield results inferior to those of DNA-O-MAP. Unless the authors directly compare these methods using the same number of cells and identical MS setups, I find their argument unjustified and misleading.

      Based on the instrumentation listed, we actually do have a good idea of how sensitivity changes may have affected identifications and overall sensitivity. For example, the CASPEX data was collected on an Orbitrap Fusion Lumos, while our data was collected on an Orbitrap Fusion Eclipse. From our work characterizing these two instruments during the Eclipse development (PMID: 32250601), we do actually know that the ion optics improvements boosted sensitivity of the Eclipse used in our work compared to the Lumos by ~50%, meaning if GLOPRO was run on an Eclipse it would still require >200 million cells per replicate for input.
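
The scaling argument above can be written out as a short calculation. This is a sketch under one stated assumption: that the cell input needed for a given depth scales inversely with instrument sensitivity. The 300 million cell figure and the ~50% gain are the values quoted in this response; the function name is illustrative.

```python
def scaled_input(cells_required, sensitivity_gain):
    """Estimate the cell input needed after a fractional sensitivity gain.

    Assumption: required input is inversely proportional to sensitivity,
    so a 50% gain (0.5) divides the input by 1.5.
    """
    return cells_required / (1.0 + sensitivity_gain)


glopro_cells = 300e6   # cells per replicate reported for GLoPro
eclipse_gain = 0.5     # ~50% sensitivity gain, Eclipse vs. Lumos
estimate = scaled_input(glopro_cells, eclipse_gain)
```

Under that assumption the estimate comes to roughly 200 million cells per replicate, which is still far above the low-million inputs discussed for DNA O-MAP.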

      It is suggested that DNA-O-MAP is capable of 'multiplexing', whereas previous methods are not. This statement is also misleading. As I understand it, the targeted regions do not originate from a common pool of cells. Instead, TMT multiplexing only occurs after each group of cells has been independently labelled (Telo, Centro, Mito, control). Therefore, previous methods could also perform multiplexing with TMT. Moreover, it is unclear how each proteome was compared: one would expect many more proteins from centromeres than from telomeres (I am unsure about the number of mitochondria in these cells) since these regions are significantly larger than telomeres (possibly 10 to 100 times larger?). Have the authors attempted to normalise their proteomics data to the size (concatenated) of each target? This is particularly relevant when comparing histone enrichment at chromatin regions of differing sizes.

      We agree with the reviewers that this was overstated. In fact the GLOPRO paper notes that they performed a MYC analysis with a previous generation of TMT that could multiplex 10 samples. We have amended the manuscript to be more specific in those contexts. As stated in the methods section, “Samples were column normalized for total protein concentration”, to account for the amount of protein and size of the different targets.
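
For readers unfamiliar with column normalization, a minimal sketch follows. It assumes the common convention of scaling each TMT channel (column) so that all channels share the same total signal; the exact procedure used in the manuscript may differ, and the matrix below is hypothetical.

```python
def column_normalize(matrix):
    """Scale each column so every column sums to the mean column total.

    matrix: list of rows (proteins), each a list of channel intensities.
    Assumption: equal total signal per channel is the normalization target.
    """
    n_cols = len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n_cols)]
    target = sum(col_sums) / n_cols
    factors = [target / s for s in col_sums]
    return [[row[j] * factors[j] for j in range(n_cols)] for row in matrix]


# Hypothetical two-protein, two-channel example where channel 2
# simply contains twice as much total material as channel 1.
raw = [[10.0, 20.0],
       [30.0, 60.0]]
normalized = column_normalize(raw)
```

After this step, differences between channels reflect changes in relative composition rather than differences in how much total protein each target yielded.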

      Figure 1C shows streptavidin dots resembling telomeres. To substantiate this claim, simultaneous immunofluorescence with a telomere-specific protein (e.g., TRF1 or TRF2) is required. It is currently unknown whether all or only a subset of telomeres are targeted by DNA-O-MAP, and it is also unclear if some streptavidin foci are non-telomeric. Quantification is needed to indicate the reproducibility of the labelling (the same comment applies to the centromere probes later in the manuscript; an immunofluorescence assay with CENPB would be informative, alongside quantifications).

      We understand the reviewer’s concern about specificity and reproducibility of DNA-O-MAP. To address this we have added analysis showing the efficiency and specificity of our FISH and biotin labeling for Telomere, PanAlpha, and Mitochondria targeting oligos (Figure S3). We found that biotin deposition was highly specific to the intended targets with an average across the three probes of 98% specificity.

      Perhaps more importantly, the authors suggest that it may be possible to enrich proteins that are not necessarily present at the target locus but are instead in spatial proximity (e.g., RNA polymerase I subunits enriched upon centromere targeting). Does this not undermine the purpose of retrieving locus-specific proteomes?

      The goal of DNA OMAP is to identify a local neighborhood of proteins around a specific genomic locus, similar to GLOPRO. As we now note in the work presented in Figures 4 and 5, these neighborhoods are inherently interesting for comparison of quantitative changes that occur around a genomic locus.

      Possibly related to the previous issue, when DNA-O-MAP is used to assess DNA-DNA interactions, probes covering regions of 20-25 kb are employed. Therefore, one would expect these regions to be significantly biotinylated compared to flanking regions. However, Genome Browser screenshots indicate extensive biotinylation signals spanning several megabases around the 20-25 kb targets. If the method were highly resolutive, the target region would be primarily enriched, with possibly discrete lower enrichment at distant interacting regions. The lack of discrete enrichment suggests poor resolution, likely due to the likely large scale of proximity biotinylation. This compromises the effectiveness of DNA-O-MAP, especially if it is intended to target small loci with complex sequences. Could the authors quantify the absolute number of reads from the target region compared to those from elsewhere in the genome (both megabases around the locus and other chromosomes, where many co-enriched regions seem to exist)? This would provide insights into both enrichment and specificity.

      Thanks for this suggestion; we have included a new Figure S8 to look at normalized read depth as a function of distance from the genomic target. The resolution of DNA OMAP, like all peroxidase-mediated proximity labeling methods, is not dependent on the sequence length of the DNA region, but on the 30-40 nm of physical space around the HRP molecule that is targeted to the genomic locus.

      Minor Issues:

      (1) Page 3, second paragraph: It is unclear why probes producing a visible signal in situ necessarily translates to their ability to retrieve a specific proteome.

      We have revised the manuscript to de-emphasize the visible signal aspect of probe targeting and re-emphasize our initial point that the number of probes needed to properly target unique regions makes the use of locked nucleic acid probes cost-prohibitive. The basic point, though, is one that we and others previously showed with RNA OMAP (PMID: 39468212) and APEX/proximity labeling strategies: the ability to deposit biotin and visualize it generally translates directly to recovery of proximally labeled proteins (PMID: 26866790).

      (2) Page 3, last paragraph: "to reach a higher degree of enrichment...": Has it been demonstrated that direct protein biotinylation provides higher enrichment of relevant proteins? Certainly, there is higher enrichment of proteins, but whether they are relevant is another matter.

      Our point here was that the methods using direct protein biotinylation have higher levels of enrichment and thus require less cells than the previously mentioned PICh method, which is why we wrote the following: “In the case of GLoPro, APEX-based proximity labeling enhanced protein detection sensitivity, reducing the input required for each replicate analysis to ~300 million cells—a 10-fold reduction in cell input compared to PICh which used 3 billion cells.”

      Regarding whether these proteins are relevant, we show enrichment of known proteins that are critical to the function of their occupied genomic region at telomeres and centromeres. Additionally, we’ve added quantitative comparisons to assess relevance in our analysis of Hox and our targeted region of the X chromosome through comparisons to ChIP data at these regions. The improved enrichment that we’ve established in our initial submission as well as in the updated version also means that we can further scale down the number of cells required.

      (3) Figure 2B is misleading; it appears as though all three regions are targeted in the same cell, suggesting true multiplexing, which, I believe, is not the case.

      To avoid any potential confusion about how the samples were derived we’ve updated this figure panel to show three separate cells, each with a different region being targeted.

      (3) If I understand correctly, the 'no probe' control should primarily retrieve endogenously biotinylated proteins (carboxylases), which are mainly found in mitochondria. Why does the Pearson clustering in Supplementary Figure 2 not place this control proteome closer to the mitochondrial proteome?

      We assume that the ~10 carboxylases are biotinylated at the same levels in all cells; however, the proportion of these carboxylases relative to all enriched proteins for a given target is markedly reduced. Thus, as a proportion of the enriched proteome, we note in Figure S4 that mitochondrial DNA OMAP enriches proteins besides the carboxylases. We believe this explains why the ‘no probe’ sample can be clearly separated along PC2 in Figure 2D.

      (4) Was CENPA enriched in the centromere DNA-O-MAP? If not, have the authors scaled up (e.g., with ten times more cells) to see if the local proteome becomes deeper and detects relevant low-abundance proteins like CENPA or HJURP? This would be very informative.

      We did not observe CENPA, and we had originally contemplated the experiment the reviewer suggested, but noted that CENPA has only two tryptic peptides (>7 AA, <35AA), and they are both in the commonly phosphorylated region of the protein. Rather than scale up these experiments, we decided to attempt DNA OMAP on the non-repetitive locus experiments.

      (5) Using a few million cells, I do not see how the starting chromatin amount could range from 0.5 to 7 mg, as shown in Figures 2 and 3. How were these figures calculated? One diploid cell contains approximately 6 pg of DNA/chromatin, which means one billion cells represent about 6 mg of DNA/chromatin (a typical measurement for these methods).

      Thanks to the reviewer for catching this, that should have been the total lysate amount, not chromatin mass. We have corrected Figures 2 and 3.
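
The reviewer's back-of-the-envelope check can be reproduced with a short calculation. This sketch uses the reviewer's figure of approximately 6 pg of DNA per diploid cell; the function and constant names are illustrative only.

```python
PG_PER_DIPLOID_CELL = 6.0  # approximate DNA content per diploid cell (reviewer's figure)
MG_PER_PG = 1e-9           # 1 pg = 1e-12 g = 1e-9 mg


def dna_mass_mg(n_cells):
    """Approximate total DNA mass in milligrams for n_cells diploid cells."""
    return n_cells * PG_PER_DIPLOID_CELL * MG_PER_PG


billion_cells_mg = dna_mass_mg(1e9)      # the reviewer's reference point
three_million_cells_mg = dna_mass_mg(3e6)  # a low-million-cell input
```

One billion cells gives about 6 mg, while three million cells gives about 0.018 mg, so values of 0.5 to 7 mg cannot be chromatin mass at this input scale and are consistent with total lysate instead.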

      (6) Figure S1: There is no indication of the metrics used for the shades of red.

      We have added a gradient legend to depict this.

      (7) What is the purpose of HCl in the experiment?

      HCl treatment was done to reduce autofluorescence for imaging (PMID: 39548245).

      (8) I could not find the MS dataset on the server using the provided accession number (PDX054080).

      Thank you for pointing this out; we have confirmed that the dataset is now public and added the new datasets for the Xi/Xa and Hox studies. We also note that the accession should be “PXD054080”.

      (9) Why desthiobiotin instead of biotin?

      We have tested both; desthiobiotin was helpful to reduce adsorption to surfaces. Either biotin or desthiobiotin can be used, though, for OMAP.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity.

      (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in the blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development.

      (2) Their (admittedly smaller sample size) analyses of host-sibling gene expression showed that the local environment dominates genotype.

      All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individuals of origin). Here, you could model the sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      We appreciate this good suggestion. We performed an analysis along these lines, and found that it supported the conclusion of a lack of strong relationship between microglial and macrophage chimerism. In particular (and as we now have added to the Methods):

      “To perform an analysis of Fig. 2D that takes into account the uncertainty in the estimate of the chimeric cell proportion, we performed a binomial generalized linear mixed-effects model analysis in R using the command glmer( y~(1|indiv) + chimerism_micro, family=binomial), where y is a vector (of length 1,333) containing the genomic identity of each macrophage (either host or twin), 1|indiv models a random effect for the identity of each animal, and chimerism_micro is the microglia chimerism of the animal’s brain. The fixed effects probability of chimerism_micro was 0.795, indicating that microglial chimerism fraction was not statistically significant as a predictor for macrophage chimerism fraction. The estimate for the intercept was -0.8115 and the estimate for chimerism_micro was 0.3106, which indicates that the probability of a cell is a macrophage given the microglia chimerism fraction was only 0.57 (plogis(-0.8115+0.3106)).”

      We have added the following in the main text:

      “We investigated further by performing a statistical test that takes into account the uncertainty in the estimates of the chimeric cell proportion using a binomial framework (Methods); in this analysis, microglia chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction (Methods). This suggests that in addition to the cell’s genome, other factors such as local host environment play a role in differential recruitment, proliferation or survival of the sibling cells. (We note that macrophages often transit the fluid-filled perivascular space, with a substantially different migration history and arrival dynamics than microglia.)”

      Given this new analysis, and our original observation that the Pearson correlation was only 0.31, we believe that other factors in addition to the cell’s genome play a role in differential recruitment or survival of sibling cells.

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Figure 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      We performed the analysis along these lines and added the following in the Methods section:

      “We used the same framework to further analyze Fig. 4. We included brain region as a covariate in the binomial framework: glmer( y~(1|indiv) + brain_reg + assay, family=binomial), where y is a vector (of length 48,439) containing the genomic identity of each microglia, and assay is either “Drop-seq” or “10X”. The brain regions assayed in Fig. 4 are the cortex, hippocampus, hypothalamus, striatum, thalamus, and basal forebrain. All these brain regions were statistically significant predictors for microglia chimerism fraction (all P-values < 2×10⁻¹⁶), supporting the conclusion that chimerism varies across brain regions. We also re-analyzed Supplementary Fig. 4 (Fig. 4B in original manuscript) using the same framework and found that 18 out of 27 brain substructures were statistically significant predictors for microglia chimerism fraction.”

      We have added the following sentences in the main text:

      “We used the binomial generalized linear mixed-model framework and found that all brain regions were statistically significant predictors for microglia chimerism fraction, supporting the conclusion that chimerism varies across brain regions (Methods).

      Analysis of finer brain substructures showed a similar result (Supplementary Fig. 4; the binomial generalized linear mixed-model framework determined that 18 out of 27 brain substructures were statistically significant as predictors for microglia chimerism fraction, Methods).”

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      We like this idea, but our study is underpowered for eQTL analysis since we only have 14 data points in the correlation analysis (eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses).

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Figure 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      We agree; to help clarify this for readers, we added the following sentence at the end of the paragraph discussing Fig. 5A-C.

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due e.g. to genetic differences between siblings. We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      And in the caption of Fig. 5A-C, we have included the statistical threshold for identifying DEGs:

      “In (A) to (C), each point represents a gene; its location on the plot represents the level of expression of that gene among microglia with two different genomes in the same animal. x- and y-axes: normalized gene expression levels (number of transcripts per 100,000 transcripts). FC: fold-change of gene expression, female/male for XIST. Fold-change and P-values were calculated using the binomTest method from the edgeR package (Robinson et al., 2010). Differentially expressed genes (black dots) were defined as: FDR Q-value<0.05 and fold-change>1.5 (in either direction) and the gene must be expressed in at least 10% of at least one of the two sets of microglia being compared.”
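      The three-part DEG rule in the caption above can be read as a simple per-gene filter. The following is a minimal Python sketch of that rule only; the `GeneStats` container and its field names are hypothetical, and the fold-changes and Q-values themselves are assumed to have been computed upstream (the authors used edgeR in R).

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    """Per-gene summary for one two-genome comparison (hypothetical container)."""
    q_value: float            # FDR-adjusted P-value
    fold_change: float        # expression ratio, set A / set B (assumed > 0)
    frac_expressing_a: float  # fraction of set-A microglia expressing the gene
    frac_expressing_b: float  # fraction of set-B microglia expressing the gene


def is_deg(g, q_max=0.05, fc_min=1.5, min_frac=0.10):
    """Apply the caption's rule: FDR, fold-change in either direction, expression."""
    passes_fdr = g.q_value < q_max
    # "In either direction": more than fc_min-fold up OR more than fc_min-fold down.
    passes_fc = g.fold_change > fc_min or g.fold_change < 1.0 / fc_min
    # Expressed in at least min_frac of at least one of the two sets of cells.
    passes_expr = max(g.frac_expressing_a, g.frac_expressing_b) >= min_frac
    return passes_fdr and passes_fc and passes_expr


# Made-up values for illustration only.
print(is_deg(GeneStats(q_value=0.01, fold_change=2.0,
                       frac_expressing_a=0.20, frac_expressing_b=0.05)))
```

      Treating a ratio below 1/1.5 as the "down" direction is an interpretation of "in either direction"; a symmetric threshold on the log fold-change would be equivalent.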

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide strong evidence that all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages, and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for the brain, liver, and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual, and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to the chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study. Third, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Weaknesses:

      I find no major weaknesses, but several minor ones. First, the main text of the manuscript provides no information about the specific animals used in this study, other than sex. Some basic information about the sources of animals and their ages at the time of study would be useful within the main paper, even though more information will be available in the supplementary material.

      We moved the table containing animal information (age at time of study, sex, source, tissues analyzed) from Supplementary Table 1 into the main text as Table 1. We also added the following sentences starting on line 140:

      “Brain snRNA-seq was performed on 11 animals (6 adults, 3 neonates and 1 six months old; Table 1). All were unrelated except for CJ006 and CJ007 which are birth siblings, and CJ025 and CJ026 which are (non-birth) siblings. All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization in Massachusetts. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single cell atlas of the marmoset brain. The three neonates had died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      Second, it is not clear why only 14 pairs of animals were used for estimating the correlation of chimerism levels in microglia and macrophages. Is this lower than the total number of pairwise comparisons possible in order to avoid using non-independent samples? Some explanation would be helpful.

      Only birth siblings (twins and triplets) can be meaningfully included in this analysis. The 14 pairs of animals we used to estimate the correlation of chimerism levels in microglia and macrophages included all pairs that we could use for this analysis: eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses.
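      The count of 14 follows from one comparison per sibling genome found in each host brain. A short Python sketch of that arithmetic (variable names are illustrative):

```python
# One pairwise comparison per (host animal, sibling genome present in its brain).
hosts_with_one_sibling = 8   # brains hosting microglia from a single sibling
hosts_with_two_siblings = 3  # brains hosting microglia from two siblings

n_pairs = hosts_with_one_sibling * 1 + hosts_with_two_siblings * 2
print(n_pairs)  # 14
```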

      Finally, I think more analysis of the consistency and variability of gene expression in microglia across different regions of the brain would be valuable. Are there genetic pathways expressed similarly in host and sibling microglia, regardless of region of the brain? Are there pathways that are consistently expressed differently in host vs sibling microglia regardless of brain region?

      For brain-region differences in microglial gene expression, we are under-powered and would only be scratching the surface of a question (interesting but beyond the focus and scope of this paper) that needs deeper experimental sampling.

      For the questions about sibling-sibling differences (regardless of which sibling is host) and recurring host-sibling differences, we can do a stronger analysis, because these analyses have similar power to each other. We describe this analysis in the revised manuscript as follows:

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due e.g. to genetic differences between siblings.”

      We also, as suggested, tried to get beyond single-gene analyses to expression of programs/pathways, by performing latent factor analysis on the single-cell gene expression measurements. 

      “Following the method described in (Ling et al., 2024), we performed latent factor analysis using the probabilistic estimation of expression residuals (PEER, Stegle et al., 2010) on the gene-by-donor matrix expression of microglia. We started by creating a gene-by-cell matrix of microglia gene expression from all animals, and we normalized the matrix using SCT transform version 2 (Choudhary and Satija, 2022) with 3000 variable features. We obtained the Pearson residuals from SCT normalization and summed up the residuals across cells with the same genome to obtain a gene-by-donor matrix of expression measurements of microglia. We used this matrix as input to PEER and ran the tool with a provided number of factors from 9 to 12. For each gene-expression latent factor, to evaluate whether host/sibling identity had a consistent effect on expression levels, we performed a linear regression with host/sibling identity using glm(peer_factor_k ~ host_or_twin). For all factors, the P-values for the effect of host_or_twin were all insignificant (greater than 0.1), indicating that no PEER factor associated with host-vs-twin identity. Thus, our results found no large-scale gene expression program that was consistently expressed differently between hosts and twins.”

      We have added the text above to the Methods section, and we added the following at the end of the section on Gene-expression comparisons of host- to sibling-derived microglia (lines 264-267):

      “We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”
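      The aggregation step in the latent factor analysis (summing per-cell Pearson residuals over cells that share a genome to obtain a gene-by-donor matrix) can be sketched as follows. This is a plain-Python illustration under assumed inputs, not the authors' pipeline: the function name, the dictionary layout, and the gene and donor labels are hypothetical, and the normalization and PEER steps are not reproduced.

```python
from collections import defaultdict


def sum_residuals_by_donor(residuals, cell_donor):
    """Collapse a gene-by-cell table into a gene-by-donor table.

    residuals:  dict mapping gene -> list of per-cell residuals
    cell_donor: list of donor-genome labels, one per cell, in the same order
    returns:    dict mapping gene -> dict mapping donor -> summed residual
    """
    gene_by_donor = {}
    for gene, values in residuals.items():
        if len(values) != len(cell_donor):
            raise ValueError(f"{gene}: expected one residual per cell")
        totals = defaultdict(float)
        for value, donor in zip(values, cell_donor):
            totals[donor] += value
        gene_by_donor[gene] = dict(totals)
    return gene_by_donor


# Made-up residuals for four cells from two genomes.
residuals = {"GENE_A": [0.5, -0.25, 1.0, 0.25],
             "GENE_B": [-1.0, -0.5, 0.0, 0.5]}
cell_donor = ["donor1", "donor1", "donor2", "donor2"]
print(sum_residuals_by_donor(residuals, cell_donor))
```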

      Gene-expression pathways/factors did (within some animals) show host-twin differences in expression levels, but without a consistent host-twin direction of effect shared across the many host-twin comparisons. In particular, we used the PEER analysis described above and calculated the host-sibling expression level difference for each latent factor. Many factors differed in expression in individual cases, though none did so in all cases or with a consistent sign:

      Author response image 1.

      Difference between host and sibling expression of gene-expression latent factors for each of the 12 factors computed (using PEER) from the single-cell dataset. For a given factor, the factor expression value of the sibling-genome cells is subtracted from that of the host-genome cells and the difference is divided by the maximum of the absolute value of all elements in that factor.
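      The normalization described in this caption can be written out as a short Python sketch. The function name and the donor labels are hypothetical, and the factor values are made up.

```python
def normalized_host_sibling_diff(factor_values, host, sibling):
    """Host-minus-sibling difference for one latent factor, scaled by the
    largest absolute value that the factor takes across all donors.

    factor_values: dict mapping donor-genome label -> factor expression value
    """
    scale = max(abs(v) for v in factor_values.values())
    if scale == 0:
        raise ValueError("factor is zero for every donor; cannot normalize")
    return (factor_values[host] - factor_values[sibling]) / scale


# Made-up values of one factor across three genomes.
factor_k = {"host_genome": 0.5, "sibling_genome": -1.0, "other_genome": 2.0}
print(normalized_host_sibling_diff(factor_k, "host_genome", "sibling_genome"))  # 0.75
```

      Because both values are bounded in magnitude by the scaling constant, the normalized difference always lies between -2 and 2, which keeps factors with different dynamic ranges on a comparable axis.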

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the introduction (line 62), the authors mention that chimerism might have shaped behavior in marmosets (and perhaps been selected for). It would be helpful to see this revisited in the discussion. Is it possible that additional genetic variation in immune cells (resident and circulating) provides adaptive benefits and/or disease resistance? In the case of microglia, could the proportion of sibling cells be related (either positively or negatively) to local/regional pathology?

      We liked this suggestion and have added the following in the Discussion:

      “Chimerism could also enable interesting future analyses of whether there are adaptive benefits of chimerism in marmoset immune cells, among whom chimerism could in principle allow presentation of a wider variety of antigens for adaptive immunity. In a recent outbreak of yellow fever in Brazil in 2016-2018, marmosets were found to be less susceptible than other primates that lack immune system chimerism, including the howler monkeys (Alouatta), robust capuchins (Sapajus), and titi monkeys (Callicebus) (de Azevedo Fernandes et al., 2021). In studying future outbreaks in marmosets, one could use single-cell RNA-seq and the methods described here to study how genetically distinct immune cells (in the same animal) have differentially migrated to affected tissues and/or assumed "activated" immune cell states. Recent innovations in spatial transcriptomics with sequencing readouts (that detect SNP alleles) may also make it possible to identify any differential recruitment of genetically distinct immune cells to focal infection sites.”

      Minor comments:

      L300: delete "temporal."

      We have revised the text accordingly.

      L305: "more-restricted" should not be hyphenated.

      We have revised the text accordingly.

      L309: "from the non-cell" - delete "the."

      We have revised the text accordingly.

      L367: Louvain, not Louvaine.

      We have revised the text accordingly.

      Figure 2B can be removed - it does not add much information and takes up a lot of space.

      We have moved Figure 2B to panel J of Supplementary Fig. 1 (it is now displayed together with all other animals).

      The same can be said for Figure 4B, which is too tiny. There might be more effective ways to show this variation across animals.

      We have moved Figure 4B to Supplementary Fig. 4 and we have increased the font sizes to make the text in the figures more readable.

      Reviewer #2 (Recommendations for the authors):

      I would suggest providing some basic information about the sources of study animals within the main text. At a minimum, it would be useful to state which colonies are represented in the data, and if there is anything significant about the individual animal histories (e.g. prior exposure to surgical intervention or infectious disease). I believe this basic information should be in the main text, despite the inclusion of a broader range of information in the supplements.

      We appreciate this suggestion and revised lines 143 to 149 of the main text as follows:

      “All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain (Krienen et al., 2020; Krienen et al., 2023). The three neonates died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      I would include the species name (Callithrix jacchus) in line 48.

      On lines 47-48, we now indicate the name of the genus: “Chimerism is common, however, in the Callitrichidae family that consists of the marmosets (Callithrix) and their close relatives the tamarins (Saguinus)...”

      Then on line 65, we now indicate the species name: “Here, we analyze chimerism in the common marmoset (Callithrix jacchus) brain, liver, kidney and blood,...”

      The word "organisms" in line 59 should be "organs."

      We have modified the text accordingly.

      Lines 100-101: I would suggest this would be clearer to readers if it read: "The relative likelihoods of the original source of each cell could be strongly...".

      We have modified the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue - the fragility of meta-analytic findings - by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Weaknesses:

      (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

      (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²). Clarification in the Results/Discussion is needed, and ideally, a simulation or illustrative example could be added.

      (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

      (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.

      Reviewer #2 (Public review):

      Summary:

      The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.

      Strengths:

      The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimation. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses (identifying fragility early and adapting thresholds accordingly) could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

      Weaknesses:

      It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

      I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

      Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting as metrics for fragility and robustness evaluation of meta-analysis. The author applies these metrics to three meta-analyses of Vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think this extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The authors mentioned the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al's 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      I am very appreciative of the insightful comments you all shared, and in light of them have made several clarifications and revisions. Thank you again; I am grateful to have received such considered feedback and I hope I’ve addressed any outstanding issues. I have replied to each reviewer’s recommendations in this document sequentially for ease of scanning, and am most grateful for the summary strengths and weaknesses, which I have also incorporated into these replies. Thank you again!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript makes the important argument that many meta-analyses are inherently fragile, which aligns with prior work (e.g., PMID: 40999337). Please add the reference to the statements.

      Excellent point, thank you – I’ve expanded the discussion of fragility analysis, and its application to meta-analysis, including this reference.

      (2) The rationale and mathematical underpinnings of the proposed EOI and ROAR methods are not sufficiently explained. While the authors cite Grimes (2022, 2024b), readers are expected to rely heavily on these external sources without adequate exposition in the current paper. This limits the ability to fully evaluate the reasonableness of the methods or to reproduce the approach. I strongly recommend expanding the description of EOI and ROAR within the manuscript.

      I agree fully – I was a little remiss in this scope, as I was worried about overwhelming the reader. However, I was too sparse with detail and have now extended the text to describe the methods as intuitively as possible (see Discussion, subsection “Ellipse of Insignificance and Region of Attainable Redaction”).

      (3) In the Methods, the authors note that EOIMETA is applicable when between-study heterogeneity is low. However, the manuscript provides little guidance on how to interpret results when heterogeneity is high (e.g., larger I² values). I recommend clarifying this issue in the Results or Discussion sections, emphasizing the limitations of EOIMETA under high heterogeneity. Ideally, the authors could include either a small simulation study or an illustrative example to demonstrate the performance of the method in such settings.

      This is an excellent question, and I was remiss for not considering it better in the manuscript. Originally, the simple idea was to just pool the results for EOI, in which case heterogeneity would be an issue. But I subsequently added inverse-variance weighted methods to account for situations with increased heterogeneity, so my initial comment was not strictly correct. I’ve changed the text in several places, notably in the methods and in the discussion (see reply point 5).
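      For orientation, the standard quantities behind this exchange (inverse-variance weighting, Cochran's Q, and the I² statistic raised by the reviewers) can be sketched in a few lines of Python. This is a generic fixed-effect illustration on the log risk ratio scale with made-up trial counts; it is an assumption for illustration and not the EOIMETA implementation, whose weighting details are not given in this response.

```python
import math


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its large-sample variance for one trial."""
    y = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return y, var


def pool_inverse_variance(trials):
    """Fixed-effect pooled log RR, its standard error, Cochran's Q and I^2 (%).

    trials: list of (events_treated, n_treated, events_control, n_control)
    """
    effects = [log_risk_ratio(*t) for t in trials]
    weights = [1 / var for _, var in effects]  # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(trials) - 1
    # I^2: share of variability beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, i2


# Made-up counts for three hypothetical trials.
trials = [(10, 100, 20, 100), (30, 200, 45, 200), (8, 150, 9, 150)]
pooled, se, q, i2 = pool_inverse_variance(trials)
print(f"pooled RR = {math.exp(pooled):.3f}, Q = {q:.3f}, I^2 = {i2:.1f}%")
```

      A random-effects variant would add a between-study variance term to each trial's variance before weighting; that step is omitted here to keep the sketch short.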

      (4) While EOIMETA is introduced as a generalizable fragility metric for meta-analyses, the illustrative examples would benefit from clearer comparisons with the traditional Fragility Index (FI). Because FI is well established in the RCT literature and familiar to many readers, presenting side-by-side results (e.g., FI at the trial level versus EOIMETA at the meta-analytic level) would provide important context. Such comparisons would also highlight the added value of EOIMETA, underscoring that even when individual trials appear robust under FI, the pooled meta-analysis may remain fragile.

      This is an excellent idea! The new table is given below. Note that the traditional FI is not defined for non-significant results, and EOI is ambiguous for counts <2.
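
      For context on the trial-level FI in this comparison, the conventional procedure flips outcomes one at a time until significance is lost. The following is a minimal Python sketch of that convention, assuming a two-sided Fisher exact test at α = 0.05. It is a generic illustration, not code from the manuscript or its repository, and the function names and counts are hypothetical.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(x_min, x_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Flips of non-events to events, in the arm with fewer events, needed to reach p >= alpha."""
    if fisher_two_sided(events_a, n_a - events_a, events_b, n_b - events_b) >= alpha:
        return None  # FI is not defined for non-significant results
    if events_a > events_b:  # make arm A the arm with fewer events
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while events_a < n_a and fisher_two_sided(
        events_a, n_a - events_a, events_b, n_b - events_b
    ) < alpha:
        events_a += 1
        flips += 1
    return flips

# Hypothetical trials: (events, arm size) for each arm
print(fragility_index(1, 100, 15, 100))
print(fragility_index(5, 100, 5, 100))   # None: not significant to begin with
```

      The `None` return mirrors the note above that the traditional FI is undefined when the starting result is already non-significant.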

      (5) The Discussion currently states that the Fragility Index (FI) applies only to binary outcomes. This is not entirely accurate. While the original FI was indeed developed for dichotomous endpoints, subsequent methodological work has extended the concept to other data types, including continuous outcomes (continuous fragility index, CFI). The manuscript should acknowledge this distinction: EOIMETA presently focuses on binary outcomes at the meta-analytic level, but FI more broadly is not restricted to binary data. Adding this clarification, with appropriate citations, would improve accuracy and place EOIMETA more clearly within the broader fragility literature.

      Thank you for this catch – clarified now in the discussion:

      Reviewer #2 (Recommendations for the authors):

      (1) Typos/inconsistencies/writing clarifications: All table and figure legends and titles are missing a period at the end of each sentence. In the sentence "to be estimated by bootstrap methods. Initially, we ran...", there should be a space between "methods" and "Initially" (line 113).

      Apologies, these are now remedied.

      (2) In Table 2, the total number of patients in the meta-analysis of all 12 studies is reported as 133,262, whereas the text states 133,475 patients. Based on my calculations from Figure 2, the total appears to be 133,262. Could you please clarify this discrepancy?

      Certainly – your calculations are correct. The number in the text was a typo carried over from a very early draft in which the summation function was not run correctly and double-counted some cases. This was fixed for the figure but not the text. The text should now match; thank you for spotting this. There are some issues with figure 2, which I will address in the next few points.

      (3) Regarding this point, the meta-analysis by Zhang et al. (2019) shows some inconsistencies in the reported number of patients in the paper. According to the data provided on GitHub the total number of patients is 37671. However, Table 1 of the paper lists 38538 patients, and the main text states "5 RCTs involving 39168 patients." Similarly, for Guo et al. (2023), the main text reports that the meta-analysis included 11 RCTs with 112165 patients, whereas the table lists 111952, which appears consistent with the data available on GitHub. There is also a discrepancy in Zhang et al. (2022), which cites 61853 patients in the introduction but 61223 patients in Table 1. These inconsistencies should be clarified, as even small discrepancies in reported sample sizes can undermine the credibility of the analyses presented.

      Well-spotted – the incorrect figures are artefacts of an early draft with a double-counting summation function, and I should have spotted them and removed them prior to submission. To clarify, the correct figures from each study (which agree with github data) are given in the corrected table 1.

      Thus, there are 38,538 subjects in the Zhang et al 2019 analysis, which matches the first sheet of the github listing. The confusion comes from sheet 2, which was included only with this analysis and which breaks the totals down into events / non-events (hence the total non-events being 37,671) but keeps the old labels. This is needlessly confusing, and accordingly I have re-uploaded the data with correct headers for sheet 2. This summation problem was also apparent in the total of figure 2, which has now been replaced with a correct version. Thank you for spotting this!
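
      The relationship between the two sheets can be checked with simple arithmetic. The short Python sketch below uses only the two totals quoted above; the implied event count is derived here rather than read from the GitHub file, and the variable names are illustrative.

```python
# Totals quoted in the response for Zhang et al. (2019)
total_subjects = 38_538      # sheet 1: all subjects in the analysis
total_non_events = 37_671    # sheet 2: subjects without the event

# Events and non-events partition the subjects, so the difference
# is the implied number of events across both arms.
implied_events = total_subjects - total_non_events
print(implied_events)  # 867
```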

      (4) In line 158, who does "He" refer to? Please clarify this in more detail.

      Apologies, this was a typo and should have read “the” – now corrected.

      (5) The discrepant results of the RCT by Scragg et al. (2018) between the meta-analysis by Zhang et al. and that by Guo et al. could be presented in a table. This could be included as supplementary material or, preferably, in the main text (Results section).

      To avoid confusion, I will add a version of this to the github files for interested users to explore.

      (6) In the legend of Figure 2, a period is missing at the end of the sentence. Additionally, although it is generally understood, it would be helpful to specify that the numbers in parentheses represent the confidence intervals. Please confirm whether these are 95%, 89%, or 99% confidence intervals.

      Apologies, these are 95% CIs. Clarified now in updated legends.

      (7) The statement of "The more recent and robust methods for fragility analysis (EOI) and redaction (ROAR) have potential applications beyond fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies, as stated by the author" in line 163. Could you please provide references supporting these claims? I believe the relevant references may be included in the EOI paper, but it would be helpful to cite them here as well.

      This has recently been used in a new analysis, now cited in the introduction with a fuller description of the method for context. Please see the response to reviewer 1, point 2.

      (8) Since the study was previously published as a preprint (https://www.medrxiv.org/content/10.1101/2025.08.15.25333793v1.full-text), this should be mentioned in the manuscript.

      Added as a note now.

      (9) It would also be valuable to include a figure illustrating ROAR for the same meta-analyses presented in Figure 1 for EOI, possibly as supplementary material.

      See reply to point 10.

      (10) Finally, it would be interesting to provide plots of both EOI and ROAR for the meta-analyses of all 12 included studies. These graphs could be replicated using the code examples provided by the author in the original EOI and ROAR publications.

      These have now been added to the github repository as supplementary material.

      (11a) Replications of EOI fragility: eoicfunc.R (github): - In the code provided on GitHub, an error occurred in the "EllipseFromEquation" function within eoifunc. This was due to the PlaneGeometry package not being available for the latest version of R. I attempted several installation methods (using devtools, remotes, and GitHub, as well as direct installation from a URL). However, after adjusting the code, I was able to run the analyses. For the full cohort, including all 12 studies using the EOI approach, I obtained a Minimal Experimental Arm only recoding (xi) = 14 and a Minimal Control Arm only recoding (yi) = 15, whereas the authors reported that 5 recodings were sufficient. It appears that differences in code versions or functions might have slightly affected the results. After downgrading R and running the eoic function with PlaneGeometry successfully installed, the fragility index for the EOI approach was 15 rather than 5.

      Apologies for the issue with PlaneGeometry; I will try to fix this for future iterations. The difference you see is an artefact of running EOIFUNC on pooled data, rather than the dedicated EOIMETA function, with the chief difference being that EOIFUNC doesn’t apply the weighted inverse-variance (WIV) correction. If we simply pool events, this is the output:

      Author response image 1.

      If the reviewer uses the EOIMETA function, which employs inverse-variance weighting, then each trial is defined by a vector of events and non-events in each arm. For all 12 studies, this would be (in R code syntax, or imported from the github file):

      Author response image 2.

      Then they will obtain:

      Author response image 3.

      If the reviewer runs a simple pooled analysis with the weighted inverse-variance correction turned off, they should obtain a similar answer to a simple eoifunc call, save for the zero-count correction difference. But EOIMETA weights the studies, and this is what is reported in the main paper.
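
      To illustrate why pooling raw counts and weighting by study can give different answers, the following is a minimal Python sketch of generic fixed-effect inverse-variance weighting of log risk ratios, compared with collapsing all trials into a single 2×2 table. It is not the EOIMETA or EOIFUNC code (which is written in R); the function names and the two trials are hypothetical and chosen only for illustration.

```python
import math

def log_rr_and_var(a, b, c, d):
    """Log risk ratio and its approximate variance for one trial.

    a, b = events / non-events in the experimental arm
    c, d = events / non-events in the control arm
    """
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    var = 1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)
    return log_rr, var

def pooled_naive(trials):
    """Collapse all trials into a single 2x2 table, ignoring study membership."""
    a, b, c, d = (sum(t[i] for t in trials) for i in range(4))
    return log_rr_and_var(a, b, c, d)[0]

def pooled_inverse_variance(trials):
    """Fixed-effect estimate: weight each trial's log RR by 1 / variance."""
    stats = [log_rr_and_var(*t) for t in trials]
    weights = [1 / v for _, v in stats]
    return sum(w * y for w, (y, _) in zip(weights, stats)) / sum(weights)

# Hypothetical data: (events_exp, non_events_exp, events_ctl, non_events_ctl)
trials = [(10, 90, 20, 80), (30, 970, 30, 970)]
naive_rr = math.exp(pooled_naive(trials))
weighted_rr = math.exp(pooled_inverse_variance(trials))
print(f"naive pooled RR = {naive_rr:.3f}, inverse-variance RR = {weighted_rr:.3f}")
```

      Because the two estimators differ, any fragility measure built on the pooled effect can also differ depending on which one is used, which is consistent with the eoifunc versus eoimeta discrepancies discussed here.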

      (12) I recalculated the eoic function for Zhang et al. (2019) and found a fragility index (dmin) of 1. FECKUP Vector Length: 0.5722. Minimal Experimental Arm Recoding (xi): 0.7738. Minimal Control Arm Recoding (yi): 0.8499.

      This again appears to be an artefact of using eoifunc rather than eoimeta; with eoimeta, which uses WIV to adjust the studies for heterogeneity effects, this is the reported output:

      Author response image 4.

      (13) Using the previous code (before downgrading R and loading PlaneGeometry), I recalculated the EOI for Zhang et al. (2022) and found Minimal Experimental Arm only recoding (xi) = 55 and Minimal Control Arm only recoding (yi) = 59, results slightly closer to those reported by the authors. After properly loading PlaneGeometry, I recalculated and obtained for Zhang et al. (2022): Fragility index (dmin) = 57; FECKUP Vector Length = 39.948; Minimal Experimental Arm Recoding (xi) = 54.5436; Minimal Control Arm Recoding (yi) = 58.635.

      Again, this appears to be a difference between calling eoifunc and eoimeta; I can replicate this result using EOIFUNC:

      Author response image 5.

      But adjusting for study weighing with eoimeta:

      Author response image 6.

      (14) For Guo et al. (2022), the EOI fragility index was 17 [dmin = 17]. FECKUP Vector Length: 11.3721. Minimal Experimental Arm Recoding (xi): -15.6825. Minimal Control Arm Recoding (yi): -16.5167. However, the authors report an EOI fragility of 38. Since I was able to load PlaneGeometry properly and run eoicfunc.R (from GitHub) without errors, the discrepancies likely reflect minor coding or version inconsistencies rather than software limitations.

      These again stem from using eoifunc on simple pooled data versus eoimeta, which adjusts by study.

      (15) Replications of ROAR fragility: roarfunc.R (github): - For Guo et al. (2022), the ROAR fragility calculated using roarfunc.R was 16 [rmin (Redaction Fragility Index) = 16]. FOCK Vector Length: 15.942. Minimal Experimental Arm Redaction (xc): 15.9442. Minimal Control Arm Redaction (yc): 978.8906. In the main text, the author reports a redaction fragility of 37. What might explain these discrepancies?

      Again, this stems from EOIMETA versus EOIFUNC (and roarfunc calls without weighted adjustment). As the reviewer has observed, the fragility increases when there is no study-level adjustment, which we have now added to the discussion text.

      (16) In generic_run.R, line 6 contains a bug - it is missing a forward slash (/) between the directory path and the filename. The correct line of code should be: pathload = paste0(pathname, "/", filename, exname). The same issue occurs in generalcode.R.

      Apologies, I will correct this in the upload!
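
      The class of bug described in point 16 is easy to reproduce. The reviewer's fix is in R; the sketch below shows the same idea in Python, with hypothetical values for `pathname`, `filename`, and `exname` (the actual values in generic_run.R are not shown in this response).

```python
import posixpath

pathname = "data"        # hypothetical directory
filename = "trials"      # hypothetical file stem
exname = ".csv"          # hypothetical extension

# Plain concatenation silently fuses the directory and file name.
broken = pathname + filename + exname

# Joining with an explicit separator restores the intended path.
fixed = posixpath.join(pathname, filename + exname)

print(broken)  # datatrials.csv
print(fixed)   # data/trials.csv
```

      In R, `file.path(pathname, paste0(filename, exname))` achieves the same result without a hand-written separator.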

      (17) Theoretical framework: Is there any other method available for comparison besides the one proposed by Atal et al.? Could you include a brief literature review describing alternative approaches?

      To my knowledge, there is not – Xing et al (now referenced) covered this earlier in the year, and I have included an expanded background for this purpose. Please see reply to reviewer 1, point 1.

      (18a) There appears to be no heterogeneity in the meta-analysis in terms of effect sizes and I², likely because most values are quite large, yet the included studies address very different populations (e.g., patients with COPD, NSCLC survivors, older adults, women, and GI cancer survivors). This could have been explained more clearly, including how such diverse literature might influence fragility indices or whether there is a logical rationale for combining these studies. Could you perform a sensitivity analysis or provide a conceptual explanation of how the heterogeneity - or lack thereof - across these trials may affect the fragility indices? Although I² values are small, the conceptual heterogeneity among studies suggests that the pooled results may be comparing fundamentally different clinical contexts, which requires clarification.

      I think this is a very pertinent point. I am unsure as to why these authors combined such diverse populations without any consideration of whether they were comparable, but this is a common problem in meta-analysis. I have added the following to the discussion to address this problem:

      “The use of vitamin D meta-analyses in this work was chosen as illustrative rather than specific, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). The three studies cited in this work report relatively low heterogeneity in their meta-analysis in both effect sizes and I² values, but it is worth noting that the included studies addressed very different populations, including patients with chronic obstructive pulmonary disease, non-small cell lung cancer survivors, women-only cohorts, older adults, and gastrointestinal cancer survivors. These groups have presumably different risk factors for cancer deaths, and why the authors of these studies combined the cohorts with fundamentally different clinical contexts is unclear. Why the heterogeneity appeared so relatively low in different groups is also a curious feature. This goes beyond the scope of the current work, but serves as an example of the reality that meta-analysis is only as strong as its underlying data and methodological rigor in comparing like-with-like, and the conclusions drawn from them must always be seen in context.”

      Reviewer #3 (Recommendations for the authors):

      (1) Line 156, acronym FI not defined.

      Apologies, this is now defined at the outset as “fragility index”.

      (2) Line 158, typo "He"?

      Apologies again, this was a typo and was supposed to read “the”, fixed now.

      (3) Across the manuscript, I think the "re-coding" phrasing may confuse clinical readers. Maybe rephrasing to "flipping event classification" or "flipping group" would be better.

      Excellent point – this has now been modified at the outset.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

      We now demonstrate in Fig. S9 efficient degron-mediated depletion of both NUF2 and SPC24 in cell-cycle arrested cells by Western blotting. We show similar data for siRNA knockdowns. Our siRNA knockdown experiments include a “siDEATH” control that induces cytotoxicity by targeting several essential genes. In Fig. S6a we now show that siDEATH transfection results in strong cytotoxicity and cell death in cycling as well as cell-cycle arrested G1/S and G2/M populations, indicating efficient protein depletion. Additionally, in Fig. S6b we now show depletion of NCAPH2 protein by siRNA knockdown in cycling as well as cell-cycle arrested cell populations by Western blot analysis. We mention these results on page 11 and page 13.

      Reviewer #2 (Public review):

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation, such as components of the condensin and cohesin complexes, are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      We discussed our hit selection criteria on page 8 and in the Methods section. Some of the concerns regarding a bias towards non-essential genes are alleviated by the fact that our screen is limited to a relatively short duration of 72 hours rather than the longer timepoints that are generally used to assess essentiality in pooled CRISPR-KO screens, allowing us to identify genes that may be essential if eliminated permanently. In support of this notion, we identify subunits of the essential condensin and cohesin complexes as hits with only limited effect on cell viability. For example, the Z-score for the change in cell number upon NCAPH2 knockout was -0.26, indicating only a mild reduction compared to the average cell number across all targets.

      Other confounding effects on hit selection due to micronuclei formation, cell cycle effects etc. are minimized as we closely monitor micronuclei formation and cell viability in our screen. Finally, aneuploidy is similarly not a confounding factor in hit identification since, as we previously demonstrated, the Ripley’s K-based clustering score is robust to changes in spot number (Keikhosravi, A., et al. 2025).
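
      As background on why a Ripley's K-type statistic can be relatively insensitive to spot number, the following is a minimal Python sketch of a naive K estimator for 2D points without edge correction. It is not the Clustering Score of Keikhosravi et al.; the function name, coordinates, field area, and radius are hypothetical.

```python
import math
from itertools import combinations

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction) for 2D points.

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is <= r. Dividing by n * (n - 1) normalises for the
    number of points, so the estimate reflects spacing rather than count.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1 for p, q in combinations(points, 2) if math.dist(p, q) <= radius
    )
    # Each unordered pair counts twice in the ordered-pair sum.
    return area * 2 * close_pairs / (n * (n - 1))

# Hypothetical spot coordinates in a 10 x 10 field
clustered = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8)]
dispersed = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0)]

k_clustered = ripley_k(clustered, radius=1.0, area=100.0)
k_dispersed = ripley_k(dispersed, radius=1.0, area=100.0)
csr_reference = math.pi * 1.0 ** 2  # expected K under complete spatial randomness
print(k_clustered, k_dispersed, csr_reference)
```

      Values above the random reference indicate clustering at that radius and values below it indicate dispersion; the normalisation by the number of point pairs is what makes the comparison less dependent on how many spots are present.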

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      We appreciate these points. Given the presence of one centromere on each chromosome, we used centromeres as surrogate landmarks of higher-order nuclear genome organization and considered centromere patterns as a general indicator of overall genome organization. While the relationship of centromere patterns to other genome features is poorly understood in mammalian cells, a link is suggested by observations in other organisms. For example, in yeast, the clustering of centromeres reflects the overall Rabl configuration of chromosomes. Having said that, we agree that our extrapolation to overall genome organization is somewhat speculative, and we have toned down these conclusions throughout the manuscript.

      We agree that one of the most interesting questions emerging from our study is whether centromere clustering has a functional role. In follow-up studies we will use some of the key regulators identified in these screens to perturb the native centromere distribution and assay for various cellular responses, including changes in gene expression and genome integrity. These studies will be the subject of future publications.

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-Hs, a subunit of the condensin II complex that during G1 promotes CENP-A deposition. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide a more mechanistic insight.

      We thank the reviewer for this comment. We now clarify the relationship of these proteins to centromeres in more detail on page 12. While they all have some relationship to centromeres, as would be expected if they contributed to centromere clustering, they represent multiple distinct pathways and processes.

      The observed effects on clustering are unlikely to be due to aneuploidy, as only very limited aneuploidy is observed in our cells and because the Ripley’s K measurement of centromere clustering is robust to changes in chromosome copy number. Follow-up studies using live cell imaging approaches are currently in progress to address some of these mechanistic questions.

      Finally, the additive effects observed could reflect mild mis-segregation effects that are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

      We rephrased the text on page 14 based on the reviewer’s recommendations.

      Reviewer #3 (Public review):

      Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      We now include in Fig. S11 examples of disordered mitotic nuclei observed in the absence of NUF2 or SPC24.

      I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

      We have modified this description on page 4.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Proper characterisation of the cell lines used in the manuscript. Tagged proteins have been known to affect protein levels compared to the parental cell, and where this is the case (or not), it needs to be transparently shown in the manuscript.

      The cell lines to conditionally deplete NCAPH2 and KI67 have previously been published, and they have been characterized to show normal expression levels of the tagged protein (Takagi et al., 2018). We also show quantification of Western blots to compare protein levels of tagged SPC24 and NUF2 with those of the untagged proteins in the parental cell line (Fig. S8e-f) and discuss these results on page 11 and page 12.

      (2) Demonstration of protein depletion in the degron cell lines.

      We showed efficient protein depletion in the degron cell lines (Fig. S8c and S8d). In addition, we now show in Fig. S9 depletion of SPC24 and NUF2 in cells arrested at G1/S and G2/M.

      (3) The study examines centromere clustering, but not genome architecture. While it is understood that a complete investigation of genome architecture is beyond the scope of the current study, the interpretation does not match the data. The authors are suggested to pay attention to this point throughout the manuscript and consider their findings in terms of centromere clustering rather than genome architecture, including changing the title accordingly.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and a link to overall genome organization has been suggested in some organisms such as yeast, we have retained the wording in a few select instances, including the title. We also make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      Reviewer #1 (Recommendations for the authors):

      (1) Controls of depletion by western blot in synchronized cells (siRNAs and degrons) are lacking.

      We now show Western blots demonstrating efficient depletion of the target proteins in degron (Fig. S9) and siRNA treated cell-cycle arrested cells (Fig. S6b).

      It would have been very nice to discuss the implications of these findings further. For example, does centromere clustering change gene expression or the repression of pericentromeric heterochromatin? Is centromere clustering associated with specific diseases? How does global chromatin organization affect gene expression, genome stability, etc.? Although some of these aspects are unknown, a discussion about them would have been nice.

      We appreciate these interesting points. These questions are the subject of our ongoing follow-up studies. We now discuss possible consequences of centromere re-organization on gene expression and genome stability on page 18.

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) Clarify Scope and Avoid Overinterpretation

      (a) The study exclusively investigates centromere positioning, without addressing broader aspects of genome architecture.

      (b) There is no established link presented between centromere positioning and higher-order genome organisation.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and observations in yeast suggest such a link, we have retained the wording in a few select instances. We make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      (c) The exclusion criteria used in the screen should be clearly explained, including the implications of selecting only non-essential or redundant genes.

      We discuss on page 8 and in the Methods section the exclusion criteria used in the screen, including the implications for identifying essential genes.

      (d) The authors should discuss why the identified proteins significantly affect centromere clustering but do not impact cell cycle progression.

      We now discuss this topic briefly on page 9. While some hits are expected to affect both cell-cycle progression and centromere clustering (Fig. S4c), it is not a priori expected that all hits would affect both.

      (2) Supplementary Figure 1

      This figure appears unnecessary. The co-localisation between CENP-C and CENP-A is well established in the literature, and the scoring provided does not add essential new information.

      The data was included in response to repeated questions from a centromere expert. We prefer to retain this data for completeness.

      (3) Differential Hits between Cell Lines 

      For hits that behave differently across cell lines, expression data should be provided. Are the genes equally expressed in both cell types? What is the level of depletion achieved?

      It is possible that cell-type specific hits arise due to differences in expression. Cell-type specific hits may also arise for multiple other reasons, including cancer vs. non-cancer origin, hTERT-immortalization, cell growth properties, variation in the underlying DNA sequences of the Cas9 target loci, and the initial state of centromere clustering, to name a few. Each of these possibilities requires additional experiments to identify the exact reason for the cell-type specificity of a given factor. A full analysis of the reasons for cell-type specificity is, however, beyond the scope of the current study.

      (4) Efficiency of Cell Cycle-Specific Degradation

      Degradation efficiency likely varies across cell cycle stages. The authors should provide Western blots showing the extent of protein depletion at each cell cycle block.

      We provide Western blot data in Fig. S9 to demonstrate efficient knockdown of proteins in G1/S and G2/M arrested cells.

      (5) Figure S6 - Validation of New Cell Lines

      Genotyping data for the newly generated cell lines should be included, along with Western blots using protein-specific antibodies (not just the tag), compared to the parental cell line.

      We provide in Fig. S7c-d genotyping data and in Fig. S8e-f Western blot data to compare levels of tagged and untagged proteins.

      (6) Figure S7 - G2/M Block Efficiency

      The G2/M block appears suboptimal after 20 hours in RO-3306, with only ~50% of cells in G2/M and just 21-27% for Ki-67, where most cells remain in S phase. This raises concerns about the interpretation of mitotic depletion effects. It is possible that cells never progressed from G1 or completed S phase without Ki-67. Prior studies (van Schaik et al., 2022; Stamatiou et al., 2024) have shown delayed and uneven replication of centromeric/pericentromeric regions upon Ki-67 depletion during S phase, which could affect the readout. Live-cell imaging would be a more robust approach to confirm mitotic status.

      For KI67 after RO-3306 treatment, 73% and 67% of cells were arrested at the G2/M boundary in the presence or absence of KI67, respectively (Fig. S10a-b). Upon release from G2/M arrest, the proportion of G1 cells increased from 6-13% to 28-60% for all four factors tested (Fig. S10b and d). Please note that our results are not directly dependent on release efficiency, since we use single-cell staging (Fig. 3b) and selectively analyze only G1 populations (Fig. 5c).

      We are currently working towards live cell imaging, but this requires development and characterization of additional cell lines which is beyond the scope of this study.

      Statistical analyses of cell cycle phase distributions should also be included.

      We include statistical analyses of cell cycle phase distributions in Fig. S4c and Fig. S10c-d by performing t-tests with FDR corrections to compare the percentage of cells in G1, S or G2 in the presence and absence of each factor tested.
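
      The response does not state which FDR procedure was used. A common choice is Benjamini-Hochberg, sketched below in Python purely as an assumption; the function name and the raw p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, e.g. one t-test per cell-cycle phase
raw = [0.01, 0.04, 0.03]
adj = benjamini_hochberg(raw)
print(adj)
```

      Each adjusted value is then compared with the chosen FDR threshold in place of the raw p-value.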

      (7) Aneuploidy Assessment

      Aneuploidy scores for the four key proteins should be provided, ideally using centromere-specific FISH probes.

      While an aneuploidy score for each hit would be an interesting piece of information, we showed in a previous publication that the Ripley’s K-based Clustering Score method used here is robust to aneuploidy (Keikhosravi et al., 2025), and aneuploidy would thus not lead to spurious identification of these proteins in our screen.

      (8) Add-Back Experiment (Page 14)

      While the add-back experiment is conceptually strong, its execution could be improved. It should be performed on synchronised cells: deplete the protein in G2/M, arrest in thymidine, then release into G1 without the protein to observe the unclustering phenotype.

      Re-expression should occur during the block, followed by release and analysis in the next G1 phase. This would better demonstrate whether clustering defects from the previous division can be rescued.

      We have attempted these types of long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (9) Statistical Analyses

      Several figures lack statistical analysis, which is essential for data interpretation:

      (a) Figure 1B-E

      (b) Figure 3I

      (c) Figure 4B

      (d) Figure 5B, C, G

      (e) Supplementary Figures S4B and S7

      Statistical analyses were performed for a) Fig. 1b-e, b) Fig. 3i, c) Fig. 4b, d) Fig. 5b-c and the details of the test are mentioned in the corresponding figure legends. We also include statistical tests for Fig. 5g, S5b and S7c-d.

      Minor Comments:

      (1) Page 9: "Reassuringly, in line with known centromere-nucleoli association (Bury, Moodie et al. 2020, van Schaik, Manzo et al. 2022)..."

      The citation "van Schaik, Manzo et al. 2022" is incorrect and should be revised.

      We have removed this reference.

      (2) Page 10:

      "...were grouped into six categories: regulators of chromatin structure, kinetochore proteins, nucleolar proteins, nuclear pore complex components..."

      The authors should note that NUP160, listed as a nuclear pore complex hit, is also a kinetochore component during mitosis and may be linked to mitotic defects.

      We now mention this on page 10.

      (3) Page 12:

      "Progression through S phase was equally efficient in the presence or absence of KI67."

      While bulk S phase progression may appear unaffected, refined analyses (e.g., Repli-seq, EdU patterning) have shown delayed replication of centromeric/pericentromeric regions upon Ki-67 depletion. This should be acknowledged, especially given the study's focus on centromeres (see Schaik et al., 2022; Stamatiou et al., 2024).

      Our statement was meant to describe the results we observed in this study. We indicate that overall progression is not affected, but subtle effects may persist, and we cite the relevant references on page 13.

      (4) Page 12:

      "KI67 is a well-known marker of cell proliferation..."

      The first study demonstrating the dependency of chromosome periphery on Ki-67 was Booth et al., 2014, which should be cited.

      This citation has been added.

      Reviewer #3 (Recommendations for the authors):

      (1) On page 14, paragraph 1, the authors suggest that NCAPH2 and SPC24 act independently on centromere clustering. I'm not convinced that this is the right interpretation of the data. Rather, the lack of an additive phenotype following NCAPH2 and SPC24 dual depletion suggests to me that these two proteins are acting in the same pathway.

      We show that knockdown of NCAPH2 and SPC24 results in opposite effects on centromere clustering. However, knockdown of SPC24 in NCAPH2-AID cells produces an intermediate level of clustering compared to depletion of NCAPH2 or SPC24 knockdown alone. This indicates additive effects. We have modified our description of these results on p. 14.

      (2) The analysis and experimental design in Figure 5g could be improved. For one, I would add statistical comparisons like the other figure panels. Second, the authors would ideally perform AID depletion in a synchronized G2 population before washout during the subsequent G1. This design might make some of the more subtle changes (e.g., KI67-AID) more obvious.

      We now include statistical analysis in Fig. 5g. We have attempted long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (3) In the discussion, the authors allude to centromere clustering data from the NDC80 complex, HMGA1, and other HMGs but fail to direct the reader to where they may find the data. If these data are in Tables S4 and S5, perhaps the authors could make these tables more reader-friendly?

      For each target, the mean Z-score of two biological replicates, based on the Clustering Score, is given in column H of Tables S4 and S5.

      (4) In my opinion, the term 'clustering score' comes across a bit ambiguous. In most cases, this term appears to refer to the distance between centromeric foci but is used occasionally to refer to the number of centromeric spots. For example, on page 9, paragraph 1, line 3, cluster/clustering is used three times but with slightly different meanings. Perhaps the authors can consider using the word 'clustering' to indicate the number of spots, 'dispersion' to indicate distance between centromeres, and 'radial distribution' to indicate distance from the nuclear center? Or other ways to improve the consistency of the descriptive terms.

      We apologize for not being clear. The Clustering Score is a very specific parameter derived from use of a Ripley’s K clustering algorithm as described in Materials and Methods. We now ensure that the term is used correctly throughout and that the other terms are also used consistently.
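
      For readers unfamiliar with the statistic, a minimal sketch of a naive Ripley's K estimate is given here. It is illustrative only: it assumes 2D point coordinates in a region of known area, applies no edge correction, and does not reproduce the Clustering Score defined in Materials and Methods. The function name `ripley_k` and its parameters are hypothetical.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate for 2D points (no edge correction).

    points: sequence of (x, y) tuples, e.g. spot centroids (assumed input).
    radius: distance scale r at which clustering is evaluated.
    area:   area of the observation window, in the same units squared.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    # Count ordered pairs (i, j), i != j, separated by at most `radius`.
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                close_pairs += 1
    # Normalise by the number of ordered pairs and scale by the window area.
    return area * close_pairs / (n * (n - 1))
```

      Under complete spatial randomness, K(r) is approximately πr², so values well above πr² at a given radius suggest clustering at that scale, and values well below it suggest dispersion.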

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic, but do not extend those findings substantially.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary - resulting directly from SuM neuronal activity - or secondary, reflecting a stress-like state induced by prolonged activation of SuM and related circuits. This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      We appreciate the reviewer's concern regarding the timing of CNO administration relative to behavioral testing. The 30-minute interval was selected on the basis of previous studies[1, 2]. This window ensures stable and specific neuronal manipulation while minimizing off-target effects, and it was applied consistently across all experiments. We acknowledge that a shorter interval (~15 min) can be sufficient to produce a biological effect in vivo[3, 4]. We repeated the chemogenetic tests 2-3 times to obtain reliable data for statistical analysis. However, we cannot exclude potential side effects of prolonged chemogenetic activation of the SuM, because chemogenetic manipulation has poorer temporal resolution than optogenetic manipulation. We agree that employing techniques with higher temporal resolution, such as optogenetics, in future studies would provide an excellent complement to these findings.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      We agree that the use of multiple stressors (foot shock and CSDS) adds complexity to the interpretation. Our rationale was to test the generality of the SuM response and the role of SANs across different stress modalities (acute vs. chronic). The key finding is that while both vSub and dSub projections to the SuM were activated by the acute stressor of foot shock (Figure 5N-R), only the vSub-SuM pathway showed a significant increase in calcium activity specifically during the anxiety-provoking transition from the closed to the open arms of the EZM (Figure 5I-M). This dissociation suggests a selective role for the vSub-SuM circuit in encoding anxiety-related information, beyond a general response to stress.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      We thank the reviewer for highlighting the important studies linking the SuM to locomotion. We acknowledge this known function and carefully considered it in our analyses. Non-selective activation of the entire SuM did not affect the total distance traveled in the open field or elevated zero maze (Supplemental Figure 2 B-C). Although locomotion in the OF and EZM was affected when SANs were targeted, we also compared the distance traveled in the central area of the OF to reduce, to some extent, the influence of locomotion on our estimate of anxiety-related avoidance of the central area (Figure 4 I). We agree that future work delineating the specific subpopulations within the SuM that regulate locomotion versus anxiety would be highly valuable.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      We acknowledge that the primary focus on OF and EZM tests is a limitation in fully characterizing the behavioral profile of SAN manipulation. These tests were selected as they are well-validated, standard assays for anxiety-like behavior in rodents[5–10]. However, we also included the reward-seeking test, where activation of SANs significantly suppressed sucrose consumption (Figure 4L), suggesting a broader impact on motivational state that is often linked to anxiety. We fully agree with the reviewer that employing a richer behavioral battery (such as tests for social avoidance, conditioned place aversion, or Pavlovian fear conditioning) in future studies will be essential to comprehensively define the functional scope of SuM SANs and to conclusively dissociate their role from that of fear memory engrams.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connections of changes induced by a chronic stressor to Fos induction following acute foot shock are unclear and do not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      Thank you for this important comment. We agree that directly linking acute foot shock-induced cFos expression with chronic social defeat stress (CSDS) electrophysiological changes may create an interpretive gap. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We did not intend to imply that the same neuronal population responds identically to both stressors.

      To address this, we have clarified in the text that the purpose of Figure 1 is to show that SuM is responsive to diverse stressors, rather than to establish a direct mechanistic link between acute and chronic activation patterns. The baseline for SAN studies in Figure 3 is established through the TRAP2 tagging protocol following foot shock, independent of the CSDS model. We acknowledge that future studies should compare SAN recruitment across acute vs. chronic stressors to better define their functional overlap.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data or images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      Please see Supplemental Figure 2 for the infection area of AAV.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination.

      We thank the reviewer for pointing out the axis scaling issue. We have modified the y-axis to start from 0. Because the SuM has been reported to play a role in wakefulness in rodents, some basal neuronal activation after i.p. injection of 4-OHT is to be expected.

      Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      Thank you for the question. We have replaced the reactivation chance graph with a new reactivation percentage graph showing the proportion of SANs that were reactivated by subsequent sucrose reward or stress. We used social stress rather than a second foot shock to test how general the response of foot-shock-tagged neurons is. The lower cFos expression after sucrose exposure suggests, first, that the SuM may not be involved in reward regulation, a point on which we agree with the reviewer; and second, that these SANs are more likely to modulate anxiety-like behavior than reward.

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      We agree that we did not provide enough data to confirm that SuM SANs have no regulatory effect on fear memory. The relevant statement has been removed to avoid any further misunderstanding.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      Thank you for your kind reminder. Corticosterone/cortisol, the primary stress hormone, is a well-established biomarker whose levels are elevated in response to stress and in anxiety states[11, 12]. Some studies have also reported that administering corticosterone can produce anxiety-like behaviors in rodents[13–16]. We collected the blood samples at the same time point for Figure 4 C-D. We agree that the statement at line 236 was an overstatement and have modified it.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      We thank the reviewer for this important point. To address the concern regarding the in vivo behavioral encoding specificity of the vSub-SuM pathway, we further analyzed the in vivo fiber photometry data. The new analysis revealed that calcium activity in vSub-SuM projection neurons exhibited bidirectional, instantaneous, and specific changes during transitions between the open and closed arms of the elevated plus maze: their activity significantly and immediately decreased when mice moved from the open arm to the closed arm (new results shown in Supplemental Figure 5), and conversely, significantly and immediately increased upon transitioning from the closed to the open arm. However, under the same behavioral events, dSub-SuM projection neurons showed no significant change in activity. We hope this finding strengthens the case for a role of the vSub-SuM pathway in encoding anxiety-like behavior.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      Thank you for the suggestion. We have revised line 190 to read more cautiously: “In this study, we combined multiple methods to determine whether the SuM is a brain region that involve in modulating anxiety.”

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      We discussed the limitations of the DREADD method in the first part of our response.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but is functionally untested in presented experiments. The data do indicate activation of the SuM neuron by foot shock, which has been previously reported (Escobedo et al eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      Thank you for your valuable comments. Foot-shock-activated neurons may play a role in modulating anxiety-like behaviors, emotional memory (fear memory), or both. We realize that we did not fully test all aspects of anxiety and memory, which led to some overstatements in the manuscript. It is more appropriate to focus on “anxiety avoidance”, as indicated by the reduced open-arm exploration in the EZM/EPM.

      Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

      We thank the reviewer for raising these important points. We agree that further clarification is warranted. In our study, we compared SAN reactivation across different stimuli: foot shock (acute physical stress), social stress (chronic psychosocial stress), and sucrose reward (non-aversive positive stimulus). As shown in Figure 3, SANs in the supramammillary nucleus (SuM) were significantly reactivated by social stress but not by sucrose reward. Moreover, the c-Fos response in SuM was markedly higher after foot shock compared to home cage controls (Figure 1). While we did not test all possible aversive states (e.g., pain, sickness), our data support that SuM SANs are preferentially recruited by stressors rather than by reward or neutral conditions. We acknowledge that the overlap across stress modalities is not complete, which may reflect differences in stress intensity, duration, or circuit engagement. Future work will systematically compare SAN recruitment across diverse aversive and non-aversive states to further define their selectivity.

      The term “stress-activated neurons” (SANs) here refers to neurons that are reliably activated by at least one type of stressor and can be reactivated by subsequent stress exposure. The partial overlap across stressors likely reflects the diversity of stress responses and the possibility that distinct subpopulations within SuM may encode different aspects of aversive experience. Importantly, chemogenetic activation of SANs was sufficient to induce anxiety-like behavior and elevate corticosterone (Figure 4), supporting their functional role in stress-related behavioral and physiological outputs. We have revised the manuscript to clarify that SANs represent a stress-responsive ensemble rather than a uniform population activated identically by all stressors.

      We appreciate the reviewer’s conceptual caution. In the revised manuscript, we intentionally avoided using the term “engram” to describe SANs. Our focus is on a stress-activated neuronal ensemble that drives anxiety-like behavior, not on memory recall per se. We refer to SANs as an “ensemble” or “population” rather than an engram, consistent with the TRAP-based labeling approach used to capture neurons activated during a specific experience. We agree that “engram” is best reserved for memory-encoding cells and will ensure this distinction remains clear throughout the text.

      Reviewer #3 (Public review):

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

      We appreciate the reviewer’s thoughtful critique regarding the specificity of SuM’s role in anxiety and the interpretation of our findings. We acknowledge that SuM has broad functions, including regulating exploration and hippocampal theta. However, our data show that general SuM activation increases anxiety-like measures (reduced open-arm time in EZM, decreased center exploration in OF) without altering total locomotion (Fig. 2, Suppl. Fig. 2). The locomotor reduction in SAN activation experiments (Suppl. Fig. 2F–G) was observed alongside clear anxiety-like behavioral changes (e.g. suppressed reward seeking), suggesting that the effects are not solely due to motor suppression. We agree that the assays we used to estimate anxiety-like behavior are based on the movement of mice during testing, and that this is a limitation when linking the data to anxiety. It is therefore more appropriate to interpret the results as modulation of anxiety-like behavior (anxiety-related avoidance) rather than of anxiety itself. We have revised the manuscript to use more precise wording and avoid overstatement.

      Our fiber photometry data (Fig. 5) show that vSub–SuM projection neurons increase activity specifically when mice enter open arms of the EZM—a behavioral transition associated with anxiety—whereas dSub–SuM projections do not. This activity correlates with anxiety-related behavior, not merely with movement or stress per se.

      We also agree that the term “engram” may be misleading in this context. In the manuscript, we refer to SANs as a “stress-activated neuronal ensemble” rather than an anxiety engram. Our data indicate that these neurons are recruited by stress and that their reactivation increases anxiety-related avoidance of the open arms. We have revised the text to avoid conceptual overreach and to clarify that SuM SANs likely contribute to a state of sustained anxiety/avoidance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from noting that the subjects were male in the abstract and discussion of the limitations of the exclusion of females.

      Thank you for the suggestion. We have included the full statistical detail in a separate sheet as Table 1. Also, we have modified the title of the manuscript to reflect the sex of the mice.

      Reviewer #1 (Recommendations for the authors):

      (1) In line 211, the authors state, "we recorded neuronal action potentials via multichannel extracellular recording while the mice were moving in the EPM, a traditional type of maze used to test anxiety in rodents,". However, it is unclear what data is presented in the paper, that is, extracellular recordings from SuM in mice on the elevated plus maze.

      We have deleted the description of the multichannel recordings in the EPM because those data had already been removed.

      Minor corrections to the text and figures.

      (2) For bar plots, perhaps clarify how the data is presented. For example, in Figure 4, "The data in B, D, E and I-L are presented as the means ± SEMs," but this does not appear to be plotted as a mean with SEM error bars because the error bars cover all the values.

      Corrected.

      (3) In Figure 5, the white text for EGFP in panel B is very difficult to see.

      Corrected.

      (4) For Figure 5D, it would be helpful to more clearly specify which neurons in SuM were recorded from. Was it SANs or all SuM neurons?

      We did whole-cell recording on all SuM neurons.

      (5) Fos2A-iCreERT2 is mislabeled as "Fos2A-iCreERT" in the methods.

      Corrected.

      (6) The sentence at line 139 "To make sure foot shock induced anxiety won't last until manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7," is unclear as written.

      Thank you for pointing this out. We have modified the sentence to make it clearer: “To make sure mice are on similar basal condition while applying chemo-genetic manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7 (Figure 4 A). The mice that experienced foot shocks showed decreases in the exploration time in the open arms on day 2. However, acute stress-induced anxiety was not detected on day 7 (Figure 4 B), which allow us to compare the reactivation of SANs produced anxiety-like behavior between groups at the same baseline.”

      (7) The details of the viral injections used for ex vivo electrophysiology are not sufficient to understand the experiment and the implications of the data. Which neurons (SANs?) are recorded from, what percent of those had inputs, were the sub-neurons globally labeled or just SANs?

      We performed whole-cell recordings from SuM neurons in general to test whether they receive input from glutamatergic neurons in the Sub; as shown in Figure 5B, the SuM-projecting neurons in the Sub exclusively express vglut1. Given this aim, we did not retain neurons that failed to respond to light stimulation and therefore cannot calculate the percentage of neurons receiving input. We have added text to the Methods stating that recordings were made from SuM neurons in general.

      (8) The scale used in Figure 6C renders that data unreadable. 120 to 40% changes in body weight are well beyond the variability in the data.

      We have modified the axis (90 to 110%) to show the body weight change more clearly.

      (9) The dose of CNO used, 5 mg/kg, is high, and using lower doses or other DREADD ligands is worth considering.

      Thank you for your valuable comment. We are aware that lower doses of CNO, or other DREADD ligands reported to have much higher affinity and fewer side effects, are now in use. The 5 mg/kg dose was adopted from earlier DREADD studies that reported no obvious side effects in mice[17], and we likewise saw none, e.g., on locomotion (S Figure 2B), in our experiments; we therefore kept this dose throughout the project for consistency across cohorts. We are switching to DCZ in follow-up experiments to avoid any potential side effects of CNO.

      Reviewer #2 (Recommendations for the authors):

      This is a strong manuscript that provides important insights into the role of the supramammillary nucleus (SuM) and its inputs from the ventral subiculum in regulating anxiety. The combination of behavioral, imaging, electrophysiological, and circuit manipulation approaches is impressive, and the distinction the authors propose between anxiety-related and fear-related circuits is conceptually important.

      There are, however, some points that I think need clarification. The authors emphasize that the hippocampus is essential for fear memory recall, yet they do not directly evaluate whether the SuM-hippocampal pathway might contribute differentially to anxiety versus fear memory. Addressing this would help to explain where the dissociation between the two processes arises.

      Thank you for the suggestion. We realized that we did not collect enough data to exclude a role of these SANs in memory, especially fear memory, which, as mentioned above, is formed through strong emotional training. The data and relevant discussion have been removed to avoid misunderstanding and overstatement.

      I am also not fully convinced about the definition of the "stress-activated neurons" (SANs). The overlap across repeated stress exposures is quite modest (around 20%), which suggests that this population may not be strictly stress-specific but rather a dynamic subset that is preferentially, though not exclusively, engaged by stress. Related to this, the use of the term "engram" raises conceptual questions. Since the classic engram refers to an ensemble encoding and recalling a specific memory, it is not obvious whether it is appropriate to apply the term to a neuronal population that appears to represent a persistent emotional state. The authors should consider justifying this choice of terminology more carefully or adopting a different term.

      Thank you for your important comments. We agree that the SANs in this manuscript are more likely a dynamic subset than an “engram” engaged exclusively by foot-shock stress. That is why we use “stress-activated neurons” rather than “engram” to describe this neuronal ensemble. To avoid further confusion, we have reduced the use of “engram” across the manuscript.

      Some parts of the text also need more precision. For example, the statement in lines 63-65 that "few studies have explored emotion-related engram cells" is potentially misleading, as most engram studies focus on memories with a strong emotional component. The rationale for this claim should be clarified.

      This sentence has been deleted because it was misleading and not needed to link the text.

      In Figure 1, the choice of methods is also puzzling: cFos immunostaining is used after shock delivery, while electrophysiology is used for the CSDS paradigm. It would be helpful to explain why different readouts were chosen for different stress models, and whether this may affect the comparability of the results.

      Thank you for this important comment. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We chose different methods because acute stress produces a transient effect, whereas chronic stress produces a long-lasting effect. To our knowledge, cFos is a well-established marker of strong neuronal activation, but it has a short lifespan (~4-6 hours) and therefore suits the acute paradigm better. In vivo recording allows us to compare neuronal activity before and after chronic experiments within subjects and can reveal cumulative effects, which cFos cannot. To address this, we have clarified the purpose of Figure 1 in the text (Lines 112-113): “To investigate if SuM would be responsive to diverse stressors, we next examined whether chronic stress, which different mechanism underlying…”

      Finally, some additional details would strengthen the presentation. The discussion of corticosterone and other physiological markers could be expanded to indicate whether these effects were robust across stress paradigms. Similarly, the relatively modest overlap between SANs activated by different stressors could be framed more explicitly as part of a broader principle of flexible ensemble recruitment in anxiety-related circuits.

      Thank you for your suggestion. We have added more discussion about the corticosterone and the flexibility of SANs in the manuscript. See Line 267-270: “The serum corticosterone concentration can be used as a marker of stress-induced change in the peripheral blood. Previous studies showed serum corticosterone can be increased by various stress stimulation [39–42]; meanwhile, intentionally supplementing the diet with corticosterone can induce anxiety-like behaviors in rodents[43].” and Line 275-281: “However, the reactivation rate of SANs caused by different stressor was relatively lower than the initial activation rate caused by foot shock (Figure 3). This suggests that stress-activated neuronal clusters may have more flexible recruitment principles, with only a small number of neurons potentially encoding emotional information, while most other neurons remain involved in encoding other neural activities. Studies in other field, particularly studies of memory engram, has shown that the sets of neurons activated during learning are dynamic and exhibit high flexibility [44, 45].”

      Overall, the work is of high quality and provides a valuable contribution to the field, but addressing these points would help sharpen the mechanistic claims and ensure that the conceptual framework is as clear and precise as the experimental data.

      Reviewer #3 (Recommendations for the authors):

      (1) Since increased SuM activity is hypothesized to mediate the effects of stress on anxiety-like behavior, a logical step would be to test for necessity by silencing the stress-activated SuM cells.

      We agree this is a logical and valuable experiment. While our current study focused primarily on the sufficiency of SuM/SAN activation to induce anxiety-like behavior, we acknowledge that inhibition experiments would provide critical complementary evidence for necessity. We have added a statement in the Discussion noting that “future studies should examine whether silencing SuM SANs, either during stress exposure or during anxiety testing, can prevent or reduce stress-induced anxiety”. This will help establish a more complete causal role.

      (2) Discuss what is meant by "anxiety engram" and what features of anxiety the labeled cells might encode.

      We concur that “stress-activated neuron (SAN)” is a more precise descriptor than “engram” in this context. We have revised the text to avoid the potentially misleading term “engram” and instead refer to a “stress-activated neuron”. The labeled cells are preferentially reactivated by stress (not reward), and their activation promotes both behavioral avoidance and physiological stress markers (corticosterone). They likely contribute to the maintenance of an anxious state under perceived threat, rather than encoding discrete threat cues or memories.

      (3) A more nuanced analysis of behavioral correlates of SuM activity and/or the behavioral effects of SuM manipulations would strengthen this paper.

      To provide a more nuanced understanding of the behavioral correlates, we have performed additional analyses on our fiber photometry data (now presented in Supplemental Figure 6) and have also planned additional experiments for a future study to deepen our understanding.

      References:

      (1) Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, et al. Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Sci Rep. 2019;9.

      (2) Koike H, Demars MP, Short JA, Nabel EM, Akbarian S, Baxter MG, et al. Chemogenetic Inactivation of Dorsal Anterior Cingulate Cortex Neurons Disrupts Attentional Behavior in Mouse. Neuropsychopharmacology. 2016;41:1014–1023.

      (3) Guettier J-M, Gautam D, Scarselli M, Ruiz De Azua I, Li JH, Rosemond E, et al. A chemical-genetic approach to study G protein regulation of cell function in vivo. Proceedings of the National Academy of Sciences. 2009;106:19197–19202.

      (4) Wess J, Nakajima K, Jain S. Novel designer receptors to probe GPCR signaling and physiology. Trends Pharmacol Sci. 2013;34:385–392.

      (5) Kraeuter AK, Guest PC, Sarnyai Z. The Elevated Plus Maze Test for Measuring Anxiety-Like Behavior in Rodents. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 69–74.

      (6) Kraeuter AK, Guest PC, Sarnyai Z. The Open Field Test for Measuring Locomotor Activity and Anxiety-Like Behavior. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 99–103.

      (7) Wall PM, Messier C. Methodological and conceptual issues in the use of the elevated plus-maze as a psychological measurement instrument of animal anxiety-like behavior. Neurosci Biobehav Rev. 2001;25:275–286.

      (8) Carobrez AP, Bertoglio LJ. Ethological and temporal analyses of anxiety-like behavior: The elevated plus-maze model 20 years on. Neurosci Biobehav Rev. 2005;29:1193–1205.

      (9) Seibenhener ML, Wooten MC. Use of the open field maze to measure locomotor and anxiety-like behavior in mice. Journal of Visualized Experiments. 2015. 6 February 2015. https://doi.org/10.3791/52434.

      (10) Prut L, Belzung C. The open field as a paradigm to measure the effects of drugs on anxiety-like behaviors: A review. Eur J Pharmacol. 2003;463:3–33.

      (11) Chen Y, Zhou X, Chu B, Xie Q, Liu Z, Luo D, et al. Restraint Stress, Foot Shock and Corticosterone Differentially Alter Autophagy in the Rat Hippocampus, Basolateral Amygdala and Prefrontal Cortex. Neurochem Res. 2024;49:492–506.

      (12) Hassell JE, Nguyen KT, Gates CA, Lowry CA. The Impact of Stressor Exposure and Glucocorticoids on Anxiety and Fear. Curr. Top. Behav. Neurosci., vol. 43, Springer; 2019. p. 271–321.

      (13) Peng B, Xu Q, Liu J, Guo S, Borgland SL, Liu S. Corticosterone attenuates reward-seeking behavior and increases anxiety via D2 receptor signaling in ventral tegmental area dopamine neurons. Journal of Neuroscience. 2021;41:1566–1581.

      (14) Myers B, Greenwood-Van Meerveld B. Elevated corticosterone in the amygdala leads to persistant increases in anxiety-like behavior and pain sensitivity. Behavioural Brain Research. 2010;214:465–469.

      (15) Demuyser T, Deneyer L, Bentea E, Albertini G, Van Liefferinge J, Merckx E, et al. In-depth behavioral characterization of the corticosterone mouse model and the critical involvement of housing conditions. Physiol Behav. 2016;156:199–207.

      (16) Shoji H, Maeda Y, Miyakawa T. Chronic corticosterone exposure causes anxiety- and depression-related behaviors with altered gut microbial and brain metabolomic profiles in adult male C57BL/6J mice. Molecular Brain. 2024;17.

      (17) Manvich DF, Webster KA, Foster SL, Farrell MS, Ritchie JC, Porter JH, et al. The DREADD agonist clozapine N-oxide (CNO) is reverse-metabolized to clozapine and produces clozapine-like interoceptive stimulus effects in rats and mice. Sci Rep. 2018;8.

    1. Abstract: Vector-borne diseases pose a persistent and increasing challenge to human, animal, and agricultural systems globally. Mathematical modeling frameworks incorporating vector trait responses are powerful tools to assess risk and predict vector-borne disease impacts. Developing these frameworks and the reliability of their predictions hinge on the availability of experimentally derived vector trait data for model parameterization and inference of the biological mechanisms underpinning transmission. Trait experiments have generated data for many known and potential vector species, but the terminology used across studies is inconsistent, and accompanying publications may share data with insufficient detail for reuse or synthesis. The lack of data standardization can lead to information loss and prohibits analytical comprehensiveness. Here, we present MIReVTD, a Minimum Information standard for Reporting Vector Trait Data. Our reporting checklist balances completeness and labor-intensiveness with the goal of making these important experimental data easier to find and reuse, without onerous effort for scientists generating the data. To illustrate the standard, we provide an example reproducing results from an Aedes aegypti mosquito study.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag020), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2:

      I read the manuscript with interest, as I wholeheartedly agree there is a strong need for harmonization in reporting quantitative measurements of vector traits, especially for the subsequent development of mathematical models. The paper is well written, and the examples are very helpful, particularly the one shown in Figure 1, which advocates for sharing individual (possibly raw) observations. I have some very minor comments and suggestions. Given the broad readership of the journal, I feel the Introduction would benefit from some definitions of what the authors mean by vector and vector-borne diseases, with some examples (WNV, DENV, … up to you). It is not very clear to me how the authors' current proposal aligns with what was already proposed in Wu et al. 2022 (ref 21). It seems like some sort of extension? Could you please elaborate on this? Regarding latitude and longitude, I think the coordinate reference system should also be standardized (WGS, no UTM or others). You might provide some examples of online repositories (line 187). Some (like GitHub) might not be perpetually available, unlike (hopefully) others such as Zenodo or the Supplementary Materials accompanying the paper. The latter might be preferable in my opinion. Figure 1. Please provide the equation of the TPC. Please note that Figure 2 currently does not seem to be cited in the main text (perhaps it should be on line 248?). What does "Dataset: 572" mean? As VecTraits currently seems to be the best (and only?) example of what the authors are proposing, perhaps it should be mentioned in the Abstract as well.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected that lamina processing would also be affected. We silenced C2 while recording from the lamina neuron L2 and found no significant difference in the L2 response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full-field ON and OFF flashes for controls (grey, N=8 flies, 59 cells) or while genetically silencing C2 using shibire<sup>ts</sup> (magenta, N=4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not really feel comfortable in claiming that C2 and C3 have a specific role in motion processing only, even if it was predominantly affecting medulla neurons. To our knowledge, how peripheral visual circuitry contributes to any other visual behaviors, such as object detection, including the pursuit of mating partners, or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for the T4 C2 and C3 blocks. Also, I predict that C2 & C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      We apologize: the plots in the manuscript showed the mean across all cells, but the statistics were done more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n=6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
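      The two-stage summary described above (average across cells within each fly, then take the mean and standard error across flies) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `per_fly_summary`, the fly labels, and the response values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def per_fly_summary(responses):
    """Summarize (fly_id, value) pairs, one pair per recorded cell.

    Cells are first averaged within each fly; the mean and SEM are then
    computed across the per-fly means, so N is the number of flies.
    """
    by_fly = defaultdict(list)
    for fly_id, value in responses:
        by_fly[fly_id].append(value)

    fly_means = [mean(values) for values in by_fly.values()]
    n_flies = len(fly_means)
    # SEM across flies; undefined when only one fly was recorded.
    sem = stdev(fly_means) / sqrt(n_flies) if n_flies > 1 else float("nan")
    return mean(fly_means), sem, n_flies


# Hypothetical data: fly1 contributes two cells, fly2 and fly3 one cell each.
cells = [("fly1", 1.0), ("fly1", 3.0), ("fly2", 4.0), ("fly3", 6.0)]
grand_mean, sem, n_flies = per_fly_summary(cells)
```

      With this grouping, a fly with many recorded cells does not dominate the mean, and the error bar reflects fly-to-fly variability rather than cell-to-cell variability.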

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback in direction-selective computations is not a prerequisite for its emergence, and the title could be re-adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not assume large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we have now plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and post-synaptic counts for C3 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      Author response image 3.

      Relative pre- and post-synaptic counts for C2 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; rather, Mi1 is found to be one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 &C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating more clearly from the connectomic analysis, the potential T5 mediators would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions derive from the set of experiments they performed in a logical manner. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors, why the T4&T5 responses after C2&C3 silencing differ between stationary and moving stimuli, and finally why C2 and not C3 had an effect on T5 DS responses, as the connectivity suggests C3 outputting to two out of the four major T5 cholinergic inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      For the differential involvement of C2/C3 for T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): We mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion-computation. We now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even in a scenario where one would measure the temporal and spatial properties of “all” neurons that are connected to C2 and C3, this would likely not reveal the full mechanisms linking the single neurons to DS computation, but would require silencing specific connections, or specific molecular components of the connection, or could be complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is for example not understood why the temporal frequency tuning of T4/T5 substantially differs from the temporal frequency tuning of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect not only because it is referred to as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While some discussions later address feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence, and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons. (Schnaitmann et al., 2018, Heath et al., 2020, Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies" One or many?, but more important, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors. Why did you use twice independent lines and not checkers for the white noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond with a better signal-to-noise ratio as compared to checkerboards. Flickering bars also appear to be more suited to isolate the spatial surround of neurons. This type of stimulus has been successfully used in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al. 2017; Leong et al., 2016, Salazar-Gatzimas et al. 2016; Fisher et al. 2015, …).

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state how here. I would need to go to the methods.

      We added the sentence: “C2 was silenced by expression of UAS-shibire<sup>ts</sup> (UAS-shi<sup>ts</sup>) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: ‘Work in blowflies has found a severe impact of GABAergic signaling for DS in LPTCs downstream of T4 and T5 cells, using application of picrotoxin to the whole brain (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.’

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to better the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows their participation in motion computations, which persists post their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications in medulla & cell body in medulla and axonal targeting to lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in and the role of Mi1 more extensively in lines 36, 120.

      We find it hard to speculate on the specific feedback loops that C2 and C3 are involved in from their widespread input and output connectivity. If we had, we would have wanted to support this by functional measurements of this specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z. Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study has not performed Rdl knockdown experiments in T4 cells; hence, it needs to be better clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing.”

      We added a summary plot of the behavioral screen data to Supplementary figure 1, and rephrased previous line 131.

      (7) The first result section title excludes C3 neurons, even though in lines 172-179 they are addressed; therefore, the C3 inclusion is suggested as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone we cannot make conclusive claims about C3, as it was also weakly visible in one of our genetic controls in the intersectional strategy that we took (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1 which were however also weakly present in the gad1-p65AD control”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones, but simple Flp-out clones, and the genotype and reference for this were given in the methods: UAS-FRT-CD2y+-RFT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite (Wong et al. 2002) in the results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, instead of C2, C3 should be written.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split C3 as in Supplementary Figure 2). It would enhance the credibility of the work and even be appropriate afterwards to use the word "requirement" if this split-C2 line was used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for behavioral experiments in Figure 7. We see Figure 1 (and with that, Silies et al. 2013) as a first-pass screen, from which we obtained candidates, which we then tested more thoroughly throughout the remaining manuscript with more specific lines. We are no longer using the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2&C3 silencing. Rephrasing to "C2 and C3 affect direction-selective.." would be better suited.

      We replaced “required” with “contribute to”

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. The explanation gap widens as the text proceeds to examine the layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is implied in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 that also verifies the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, it should be clarified in which medulla layer the Mi1-C2 connections form. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3 should be explained, given that Supplementary Figure 2b implies equal responses in M9/10, where Tm1 and Tm4 also receive output from C3.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance or lack thereof is not visible at the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the methods now saying: We show significant differences in figures using asterisks (p<0.05 *, p<0.01 **, p<0.001 ***). Non-significant differences are not further indicated.

      Please note that based on another reviewer comment, we also adapted the analysis of the kernels. This changed the statistics to be significant for the timing of the ON peak response (Figure 3e).
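The asterisk convention stated in this response can be written down as a small lookup. The following is a minimal sketch, not the authors' plotting code: the function name `significance_label` is hypothetical, and only the three thresholds (p<0.05, p<0.01, p<0.001) and the rule that non-significant comparisons get no mark are taken from the text.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the asterisk label described in the response.

    Thresholds follow the quoted methods sentence; the function itself
    is a hypothetical helper for illustration.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    # Non-significant comparisons are left unmarked in the figures.
    return ""
```

A figure script could call this once per comparison and skip the annotation whenever the returned string is empty.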

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work out the specific contributions of C2 (and C3) to all neurons they connect to, but rather to focus on the compound effect for motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibire<sup>ts</sup> was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire[ts] because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and the effect of shi[ts]-mediated silencing by pre-heating the flies (as established by Joesch et al. 2010) was not long-lasting enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced excessive movements of the brain, and the T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e.

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion; we cited the wrong figure in that sentence. Here, we wanted to point to the increased response of T4 to the ON-step upon C2 silencing, which was quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to mention in the discussion why it is the case that C2 and not C3 had an effect on T5 DS responses. C2 outputs to Tm1, whereas C3 to Tm1 and Tm4, based on Figure 2b, with Tm1 and Tm4 being one of the four major cholinergic T5 inputs. Hence, it would be natural to think that C3 and not C2 would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences were concluding a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (but for Mi1) so this would not be accurate.

      We have been mentioning C3 to T4 connection in both the results and discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both results and discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.'
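The two calculations described in this response can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the circular shift used to align each neuron to its preferred direction (PD) is an assumed implementation of the quoted sentence, and the normalized vector sum is one common way to obtain the direction and strength shown in a compass plot (it assumes non-negative responses).

```python
import numpy as np


def align_to_preferred_direction(responses):
    """Average tuning curves after aligning each neuron to its PD.

    responses: array of shape (n_neurons, n_directions), e.g. 8 directions.
    The PD of each neuron is taken as the direction with the maximal
    response, as in the quoted sentence; the circular shift that moves
    the PD to index 0 is an assumption of this sketch.
    """
    responses = np.asarray(responses, dtype=float)
    pd_index = np.argmax(responses, axis=1)
    aligned = np.stack(
        [np.roll(row, -shift) for row, shift in zip(responses, pd_index)]
    )
    return aligned.mean(axis=0), pd_index


def tuning_vector(response, directions_deg):
    """Return (angle in degrees, length) of a normalized vector sum.

    This is one common definition of a tuning vector for compass plots;
    it is not confirmed to be the one used in the manuscript.
    """
    response = np.asarray(response, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    vec = np.sum(response * np.exp(1j * theta)) / response.sum()
    return np.rad2deg(np.angle(vec)) % 360.0, np.abs(vec)
```

In this sketch a polar plot would show each row of `responses` directly, whereas a compass plot would show one `tuning_vector` arrow per neuron.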

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, downstream, not upstream, connectivity of C2 and C3 will matter, which is not identical. C2 for example connects to Mi1, L1, and L4, whereas C3 does not. On the other hand, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” whenever something is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar” and changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (42) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea, which we had tried. However, we did not see a difference between control and C2 silencing for the temporal frequency tuning of T4/T5. As velocity is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to be able to draw such a link, we would also state that our behavioral data are a bit different: we did not look at temporal frequency tuning per se, and overall, it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tunings (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      (43) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question and we can only speculate. Mi1 responses are enhanced upon C2 silencing, and T4 responses to full-field flashes are also enhanced. Likely, these motion-independent responses are also seen when the edge travels in the non-preferred direction, whereas this non-motion response would likely be masked by the motion response to the preferred direction. The phenotype seen in T5 is likely inherited from medulla neurons, e.g. Tm1, to which C2 connects. How the delay of the Mi1 response upon C2 silencing may specifically affect ND responses, we don’t know.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately do not have an answer for. Please note that our visual stimulus is a single edge sweeping across the eye, which might not elicit equally strong responses at each position of the eye or at each time during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tuning, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition.

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment we reanalysed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3. 

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figure 3d and 4b to show positive times. The kernels of C2 and C3 stayed at a positive level. Going back through the data we found a severe decrease in GCaMP signal in the first 2 seconds of the recording. We reanalyzed the kernels by ignoring the first seconds. All kernels now go back to zero. The shape of the kernels did not change but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.
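The reanalysis described here (dropping the start of each recording and showing the kernel at positive as well as negative times) can be illustrated with a plain stimulus-response cross-correlation. This is a minimal sketch under assumptions, not the authors' pipeline: the function name, the sampling-rate argument `fs`, the default lag window, and the use of simple reverse correlation are all hypothetical choices. Only the idea of ignoring the first 2 seconds and extending the time axis to positive times comes from the text.

```python
import numpy as np


def reverse_correlation_kernel(stimulus, response, fs, skip_s=2.0, max_lag_s=1.0):
    """Estimate a temporal kernel by cross-correlating stimulus and response.

    Negative times mean the stimulus preceded the response (the causal
    part); positive times correspond to "prediction of future stimuli"
    and reflect correlations within the stimulus itself.
    """
    start = int(round(skip_s * fs))  # discard the initial signal decay
    s = np.asarray(stimulus, dtype=float)[start:]
    r = np.asarray(response, dtype=float)[start:]
    s = s - s.mean()
    r = r - r.mean()
    n = len(s)
    n_lag = int(round(max_lag_s * fs))
    lags = np.arange(-n_lag, n_lag + 1)
    kernel = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag < 0:
            # stimulus at time t + lag (earlier) paired with response at t
            kernel[i] = np.dot(s[: n + lag], r[-lag:]) / (n + lag)
        elif lag > 0:
            # stimulus at time t + lag (later) paired with response at t
            kernel[i] = np.dot(s[lag:], r[: n - lag]) / (n - lag)
        else:
            kernel[i] = np.dot(s, r) / n
    return lags / fs, kernel
```

For a white-noise stimulus the positive-time part of such a kernel should hover around zero, which is the check the reviewer asked to make visible.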

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”.

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. Finally, we also performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating a novel SARS-CoV-2 vaccine that was substantially superior in naive mice in pre-immune mice as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) or bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.
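      As a small illustration of how a geometric mean titer and a log10 fold change relate to each other, the following sketch may be useful. It is not the analysis code used in the study; the titer values and the variable names (`titers_eabr`, `titers_conv`) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import geometric_mean


def fold_change(gmt_a, gmt_b):
    """Return (fold change, log10 fold change) of group A relative to group B."""
    ratio = gmt_a / gmt_b
    return ratio, math.log10(ratio)


# Hypothetical neutralization titers, for illustration only (not study data).
titers_eabr = [400.0, 800.0, 1600.0]
titers_conv = [100.0, 200.0, 400.0]

# Titers are roughly log-normally distributed, so groups are usually
# summarized by the geometric mean rather than the arithmetic mean.
gmt_eabr = geometric_mean(titers_eabr)
gmt_conv = geometric_mean(titers_conv)

ratio, log_ratio = fold_change(gmt_eabr, gmt_conv)
print(f"GMT A = {gmt_eabr:.0f}, GMT B = {gmt_conv:.0f}")
print(f"fold change = {ratio:.1f}, log10 fold change = {log_ratio:.2f}")
```

      In this toy case a difference of about 0.6 units on a log10 axis corresponds to a 4-fold difference in geometric mean titer, which is the kind of conversion a table of values makes unnecessary.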

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. Reviewer #3 (Public review):

      Summary:

      In the paper "Deep mutational scanning reveals pharmacologically relevant insights into TYK2 signaling and disease", the authors perform a comprehensive deep mutational scan of the kinase TYK2, a protein of pharmacological interest due to its central role in multiple immune-related phenotypes. The study assesses two key functional phenotypes: protein abundance and IFN-α-dependent signaling. The signaling assays were conducted across a dose-response range under various inhibitor conditions, allowing for an in-depth characterization of TYK2 activity and regulation. Both the experimental design and data analysis were executed with rigor and transparency, yielding a dataset that appears highly reliable. The authors provide strong evidence and a scientifically grounded interpretation of their results.

      The paper presents the results of a deep mutational scan based on two assays: an IFN-α-stimulated signaling assay and a protein abundance assay. These measurements are further supported by variant classifications from AlphaMissense and ClinVar, providing a framework for functional interpretation. Building on these data, the authors propose four potential pharmacological applications of their screening system at the end of the first results section.

      First, they demonstrate that the combined analysis of abundance and IFN-α signaling identifies potential allosteric sites, focusing on variants with normal protein stability but reduced signaling activity. Through this approach, they detect two previously uncharacterized allosteric regions (Results Section 2).

      Second, they explore how the screen can be used to predict variant-specific drug responses or resistance mechanisms (Results Section 3). This is achieved through assays involving two different inhibitors, which reveal both resistance- and potentiation-associated variants.

      Third, they assess the relative functional consequences of ligand and inhibitor dosing by performing IFN-α and inhibitor dose-response experiments (1, 10, and 100 U/mL IFN-α; IC99 and IC75 inhibitor concentrations; Results Section 3).

      Finally, the authors investigate how specific human variants, such as P1104A and I684S, may inform therapeutic modality selection (Results Section 4). Although these variants exhibit no detectable effect on IFN-α signaling within this experimental system, they substantially impact protein abundance. By integrating data from the UK Biobank, the authors further demonstrate that protective effects against autoimmune disease are associated with altered protein abundance rather than differences in IFN-α signaling, highlighting the distinct mechanistic basis of TYK2's clinical relevance.

      Strengths:

      Overall, we found this paper rigorous, well-written, and easy to follow. As such, we think this is an exceptional example of a deep mutational scanning manuscript, and this dataset will be invaluable to the field. We particularly appreciate that the authors could explore sensitivity to inhibitor concentration across multiple doses of the inhibitor.

      Weaknesses:

      Despite the authors' rigorous experimentation and thoughtful interpretation, the study leaves several important mechanistic questions unresolved, as is common in any study. While the data provide clear functional patterns, the underlying biophysical and biochemical explanations remain insufficiently explored. For instance, in point 1, the identification of two novel allosteric sites is intriguing, yet the paper does not elaborate on the structural basis or mechanistic rationale for their regulatory effects. In point 2, resistance and potentiation variants are described for two distinct inhibitors, but it remains unclear why certain variants respond specifically to one compound and not the other. In point 3, higher inhibitor concentrations appear to diminish allosteric interactions, though the reasons why some sites are affected while others are not are left unexplained. Finally, in point 4, the observation that protein abundance, but not IFN-α signaling, correlates with autoimmune protection is compelling but mechanistically ambiguous. These gaps do not detract from the technical excellence of the work; rather, they highlight opportunities for future studies to clarify the molecular and pharmacological mechanisms underlying TYK2 regulation and to deepen the translational insights drawn from this comprehensive mutational scan. We hope that the authors could provide more direction and mechanistic context in the discussion section to guide readers toward these next steps.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
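      The logic of this control can be sketched with a toy example. The code below is only a minimal illustration under stated assumptions, not the model fitted in the study: it uses a single linear pupil regressor (the actual analysis also included the stimulus and the surrounding population), simulated trial-wise residual activity, and hypothetical names such as `regress_out` and `neuron_a`. It estimates the fraction of variance explained by pupil and compares the noise correlation before and after removing the pupil-related component.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regress_out(y, x):
    """Remove the least-squares linear fit of y on x.

    Returns (residuals, R^2), where R^2 is the fraction of the
    variance of y explained by x.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [b - my - slope * (a - mx) for a, b in zip(x, y)]
    ss_tot = sum((b - my) ** 2 for b in y)
    ss_res = sum(r * r for r in resid)
    return resid, 1.0 - ss_res / ss_tot


# Simulated data (assumption): two neurons share a strong common input
# and are only weakly modulated by pupil size.
random.seed(0)
n_trials = 5000
pupil = [random.gauss(0, 1) for _ in range(n_trials)]
shared = [random.gauss(0, 1) for _ in range(n_trials)]
neuron_a = [0.2 * p + s + random.gauss(0, 1) for p, s in zip(pupil, shared)]
neuron_b = [0.2 * p + s + random.gauss(0, 1) for p, s in zip(pupil, shared)]

nc_before = pearson(neuron_a, neuron_b)
resid_a, r2_a = regress_out(neuron_a, pupil)
resid_b, r2_b = regress_out(neuron_b, pupil)
nc_after = pearson(resid_a, resid_b)

print(f"variance explained by pupil: {r2_a:.3f}, {r2_b:.3f}")
print(f"NC before: {nc_before:.3f}, NC after removing pupil: {nc_after:.3f}")
```

      When the pupil regressor explains only a few percent of the variance, as in this simulation, removing it changes the noise correlation very little, which is the pattern the control analysis is meant to test for.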

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, while mice were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased (i.e., to more than 1/3 of the contribution from the stimulus) could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.
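      The argument above can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters (three model neurons, four stimuli, a shared gain with standard deviation 0.3, unit-variance independent noise), not an analysis from the manuscript. Neurons A and B share tuning, neuron C prefers a different stimulus, and there are no connections between any of them; the only shared signal is a trial-by-trial multiplicative gain.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
n_stimuli, trials_per_stim = 4, 500

# Mean response of each model neuron to each stimulus (assumed values).
tuning = {
    "A": [10.0, 1.0, 1.0, 1.0],
    "B": [10.0, 1.0, 1.0, 1.0],  # cotuned with A
    "C": [1.0, 1.0, 10.0, 1.0],  # prefers a different stimulus
}

residuals = {name: [] for name in tuning}
for stim in range(n_stimuli):
    # One gain value per trial, shared by all neurons (e.g. behavioral state).
    gains = [max(0.0, 1.0 + 0.3 * random.gauss(0, 1))
             for _ in range(trials_per_stim)]
    for name, curve in tuning.items():
        responses = [g * curve[stim] + random.gauss(0, 1) for g in gains]
        # Noise correlations are computed on responses with the
        # stimulus-specific mean removed.
        stim_mean = sum(responses) / trials_per_stim
        residuals[name].extend(r - stim_mean for r in responses)

nc_cotuned = pearson(residuals["A"], residuals["B"])
nc_different = pearson(residuals["A"], residuals["C"])
print(f"NC, cotuned pair (A, B):          {nc_cotuned:.2f}")
print(f"NC, differently tuned pair (A, C): {nc_different:.2f}")
```

      In this sketch the cotuned pair shows a markedly higher noise correlation than the differently tuned pair even though the model contains no connectivity at all, which is why the contribution of shared behavioral modulation needs to be quantified.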

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem addressed (or is addressed only indirectly; I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at any time at all. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are outside the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found the pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus over those that are not responding (see Lin et al, Neuron 2015 for a similar point). As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. Following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature of neuron pairs showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion. We now clearly state in the Abstract and Introduction that the study was carried out in mice.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs for that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, which is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 
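
      The distinction drawn above between the number of high NC pairs (Fig. 4C) and their average magnitude (Fig. 4D) can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function name `high_nc_summary`, the threshold of 0.25, the group labels, and the toy values are all hypothetical, and whether the Fig. 4D average is taken over high NC pairs only is an assumption.

```python
from collections import defaultdict

def high_nc_summary(pairs, threshold):
    """Summarise in-group neuron pairs of one V1-HVA pair.

    pairs: iterable of (tuning_group, nc) tuples (hypothetical input format).
    Returns {group: (fraction of all high-NC pairs in this group,
                     mean NC of the high-NC pairs in this group)}.
    """
    high = defaultdict(list)
    for group, nc in pairs:
        if nc > threshold:          # keep only "high NC" pairs
            high[group].append(nc)
    total = sum(len(values) for values in high.values())
    return {g: (len(v) / total, sum(v) / len(v)) for g, v in high.items()}

# Toy data: group G1 has more high-NC pairs than G2, but similar magnitudes.
toy_pairs = [("G1", 0.30), ("G1", 0.35), ("G1", 0.32),
             ("G2", 0.31), ("G1", 0.05), ("G2", 0.02)]
summary = high_nc_summary(toy_pairs, threshold=0.25)  # threshold is an assumption
```

      In this toy case the two groups differ strongly in their share of high NC pairs while the mean NC of those pairs is nearly the same, which is the pattern the two panels are meant to separate.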

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message clearer.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility: "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data was acquired is in my mind missing and if that is the case it is crucial that this be addressed.

      At each step of data analysis, we performed control analysis to assess the fidelity of the conclusion. For example, on the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
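
      For readers unfamiliar with the hierarchical bootstrap recommended in the comment above, the idea can be sketched as follows. This is a generic illustration and was not part of the analysis in the manuscript: animals are resampled with replacement, then values are resampled within each drawn animal. The function names, the animal labels, and the toy values are hypothetical.

```python
import random

def hierarchical_bootstrap_means(data_by_animal, n_boot=1000, seed=0):
    """data_by_animal: dict mapping animal id -> list of per-neuron values.

    Each replicate resamples animals with replacement, then resamples
    values within every drawn animal, and records the pooled mean.
    """
    rng = random.Random(seed)
    animals = list(data_by_animal)
    means = []
    for _ in range(n_boot):
        pooled = []
        for animal in rng.choices(animals, k=len(animals)):
            values = data_by_animal[animal]
            pooled.extend(rng.choices(values, k=len(values)))
        means.append(sum(pooled) / len(pooled))
    return means

def percentile_interval(samples, lo=2.5, hi=97.5):
    """Simple percentile interval (nearest-rank) over bootstrap replicates."""
    ordered = sorted(samples)
    def pick(p):
        return ordered[round(p / 100 * (len(ordered) - 1))]
    return pick(lo), pick(hi)

# Toy data: three hypothetical animals with unequal numbers of neurons.
toy = {"mouse1": [0.10, 0.12, 0.08],
       "mouse2": [0.20, 0.22],
       "mouse3": [0.05, 0.07, 0.06, 0.04]}
boot = hierarchical_bootstrap_means(toy, n_boot=500, seed=1)
ci = percentile_interval(boot)
```

      Because whole animals are resampled at the top level, the spread of `boot` reflects animal-to-animal variability and not only the much larger number of neurons.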

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
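
      The consistency check described above can be sketched as follows, under stated assumptions. The cluster centers are taken as given (no GMM is fitted here), the greedy matching rule is an assumption because the exact matching procedure is not described in this response, and the function names and toy 4-dimensional centers (standing in for the 45-dimensional vectors) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; constant vectors are treated as uncorrelated (0.0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def match_centers(centers_a, centers_b):
    """Greedily pair each center of fit A with the most correlated unused
    center of fit B. Returns a list of (index_a, index_b, correlation)."""
    unused = set(range(len(centers_b)))
    matches = []
    for i, center in enumerate(centers_a):
        j = max(unused, key=lambda k: pearson(center, centers_b[k]))
        matches.append((i, j, pearson(center, centers_b[j])))
        unused.remove(j)
    return matches

# Toy centers from two hypothetical fits on different random subsets of neurons.
fit_a = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 1, 3]]
fit_b = [[4.1, 2.9, 2.2, 0.8], [0.9, 3.2, 1.1, 2.8], [1.2, 1.9, 3.1, 3.9]]
matches = match_centers(fit_a, fit_b)
```

      Pearson correlation compares the shape of two center vectors and ignores offset and overall scale, whereas mean squared error would also penalize those differences.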

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections' which I presume mean correlations is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion. We have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion. We have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
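
      As a reference point, the two quantities are commonly computed as sketched below. This is a textbook-style illustration and may differ in detail from the definition in the revised Methods (for example in normalization or in how stimuli are pooled). The function names and toy responses are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def signal_correlation(resp_a, resp_b):
    """Correlate trial-averaged responses across stimuli.
    resp_x[s][t] is the response to stimulus s on trial t."""
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    return pearson(mean_a, mean_b)

def noise_correlation(resp_a, resp_b):
    """Correlate trial-to-trial residuals after subtracting each stimulus mean.
    Trials must be simultaneously recorded and aligned between the two neurons."""
    res_a, res_b = [], []
    for trials_a, trials_b in zip(resp_a, resp_b):
        mean_a = sum(trials_a) / len(trials_a)
        mean_b = sum(trials_b) / len(trials_b)
        res_a.extend(t - mean_a for t in trials_a)
        res_b.extend(t - mean_b for t in trials_b)
    return pearson(res_a, res_b)

# Toy data: two neurons, three stimuli, four trials each.
neuron_a = [[1.0, 2.0, 1.0, 2.0], [5.0, 6.0, 5.0, 6.0], [9.0, 10.0, 9.0, 10.0]]
neuron_b = [[2.0, 1.0, 2.0, 1.0], [4.0, 3.0, 4.0, 3.0], [8.0, 7.0, 8.0, 7.0]]
sc = signal_correlation(neuron_a, neuron_b)
nc = noise_correlation(neuron_a, neuron_b)
```

      The toy pair has similar tuning (high signal correlation) but opposite trial-to-trial fluctuations (negative noise correlation), which shows that the two measures are computed from different parts of the response and need not agree.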

      Number of visual stimulation trials is not stated in Methods. Only stated in figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) is of the same nature every time. This enables us to use the same mapping procedure regardless of which reporters the mice express (and in what pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:
      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDHR23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA positive R/G cones. Larger fields of view were not collected as we prioritized the highest resolution possible and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB are consistent with our previous results (Piedade 2020); we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures but a monolayer culture. We did not observe any differences in total cell number between the differing transfections. As such, we do not feel proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment three times, quantifying five different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs/cell, we simply included all CP measurements in our analysis. All our CP measurements (actin or cdhr1a or pcdh15), were measured in the presence of a counter stain, WGA, prph2, gnb1 or PNA to ensure proper measurements (landmark) and association with proper cell type. Our new figure 7 now includes cone OS counter staining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin or cdhr1a or pcdh15), were done in the presence of a counter stain, WGA, prph2, gnb1 or PNA to ensure proper measurements and association with proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA followed by Tukey's post hoc test to control for multiple pairwise comparisons. This new analysis did not substantially alter the statistical outcomes of the study.
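      The independence concern raised above (many OS or CP measurements per fish, but only n=5 fish per group) is commonly handled by collapsing measurements to one value per individual before any group comparison. The following is a minimal sketch of that aggregation step only, not the authors' actual pipeline; the record layout, the function name `per_fish_means`, and all numbers are hypothetical and purely illustrative.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse (genotype, fish_id, length) records to one mean per fish.

    Returns a dict mapping genotype -> list of per-fish means, so that the
    unit of analysis is the individual fish rather than the single cell.
    """
    by_fish = defaultdict(list)
    for genotype, fish_id, length in measurements:
        by_fish[(genotype, fish_id)].append(length)

    by_genotype = defaultdict(list)
    # Sort keys so the order of per-fish means is deterministic.
    for (genotype, _fish_id), lengths in sorted(by_fish.items()):
        by_genotype[genotype].append(mean(lengths))
    return dict(by_genotype)


# Hypothetical measurements (arbitrary units); not data from the study.
records = [
    ("WT", "fish1", 10.0), ("WT", "fish1", 12.0),
    ("WT", "fish2", 11.0), ("WT", "fish2", 13.0),
    ("mut", "fish3", 8.0), ("mut", "fish3", 9.0),
    ("mut", "fish4", 7.0),
]
fish_means = per_fish_means(records)
```

      Any ANOVA and post hoc test would then be run on the per-fish means, so the sample size entering the test equals the number of fish per group.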

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree; this point has been addressed in our revised manuscript. We have also included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.
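      The definition of synergy used in this comment (both single mutants shift the measurement in the same direction, and the double mutant exceeds the sum of the two) can be written as a small check. This is only a sketch of the reviewer's stated criterion; the function name, the labels, and the numeric effect sizes are hypothetical, apart from the roughly 50% reduction in double mutants mentioned in the text.

```python
def classify_interaction(effect_a, effect_b, effect_double):
    """Classify a genetic interaction from changes relative to WT.

    Each argument is (mutant value - WT value) / WT value, so a negative
    number means a reduction relative to WT.
    """
    # Synergy requires all three effects to be non-zero and share a sign.
    same_direction = effect_a * effect_b > 0 and effect_a * effect_double > 0
    if not same_direction:
        return "not_same_direction"
    # Beyond-additive: the double mutant exceeds the sum of the singles.
    if abs(effect_double) > abs(effect_a + effect_b):
        return "synergistic"
    return "additive_or_less"


# Pattern described for 10 dpf: one single mutant increased, the other
# unchanged, the double reduced by about 50%. The +0.1 is illustrative.
pattern_10dpf = classify_interaction(0.1, 0.0, -0.5)
```

      Under this criterion the 10 dpf pattern is classified as "not_same_direction", which is the reviewer's reason for questioning the term "synergistic".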

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the discussion accordingly in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based Cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy the authors find that like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely oppose PCDH15b which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K652 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see our responses to reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs is most likely to represent their interaction with newly forming discs rather than connect with mature discs which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counter stain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added to our discussion, highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model because the mutation is lethal by 15dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      Changed pcdh15b color to blue

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdhr1b in cdhr1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on co-staining for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The Materials and Methods section now specifies that all of our graphs depict a mean value.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compare the participants’ reaction times before and during connected practice, revealing a significant reduction in reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s D values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects, respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons, respectively. Note that these changes did not alter the conclusions of the study.
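      To make the unpaired post-hoc comparison concrete, the Mann-Whitney U statistic can be computed by counting, over all cross-group pairs, how often a value from one group exceeds a value from the other. The sketch below computes only the statistic (no p-value) and is not the authors' analysis code; the function name and the sample values are hypothetical. In practice an established routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic for sample x against sample y.

    U counts the pairs (xi, yj) with xi > yj; tied pairs contribute 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical vigor-like scores for two unpaired groups.
group_fast = [1.2, 1.1, 1.3]
group_slow = [0.9, 1.0, 1.1]

u_fast = mann_whitney_u(group_fast, group_slow)
u_slow = mann_whitney_u(group_slow, group_fast)
```

      The two statistics always sum to the number of cross-group pairs, which is a quick sanity check on any implementation.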

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an overrepresentation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of the participants’ vigor throughout the experiment.
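      As a rough illustration of the quantities discussed here, the sketch below labels the two partners of a dyad as fast and slow from their solo vigor scores and computes ∆-vigor as their difference. It assumes that ∆-vigor is simply VigorFast minus VigorSlow, which is our reading of the text rather than a stated definition; the function name and all scores are hypothetical.

```python
def delta_vigor(vigor_a, vigor_b):
    """Return (vigor_fast, vigor_slow, delta) for one dyad.

    Assumes the partner with the higher solo vigor score is the fast one
    and that delta-vigor is the plain difference between the two scores.
    """
    fast = max(vigor_a, vigor_b)
    slow = min(vigor_a, vigor_b)
    return fast, slow, fast - slow


# Hypothetical solo vigor scores, centered around 1; not study data.
dyads = [(1.05, 0.95), (1.30, 0.80), (1.00, 1.10)]
deltas = [delta_vigor(a, b)[2] for a, b in dyads]

# Number of dyads below the 0.2 threshold mentioned by the reviewer.
n_small = sum(1 for d in deltas if d < 0.2)
```

      Because most scores cluster near 1, most randomly formed pairs yield a small difference, which is consistent with the skew described above.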

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior are different. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, as would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction error remains small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of the condition on the reaction time for both the fast and slow groups (in both cases: W<sub>2</sub> > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, the reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertain term associated with the slow partners should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Conceptually and numerically, as reported in the sensitivity analysis, this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partner in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their nonparametric equivalent for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes of vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, Wilcoxon and Mann-Whitney tests for paired and unpaired post-hocs. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which previously appeared slightly improved in a restricted number of connected conditions and now appears mostly unaffected.
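      As a minimal illustration of the paired main-effect test named above (a sketch, not the authors' analysis code), the Friedman statistic can be computed by ranking the conditions within each participant and comparing the rank sums across conditions. The function names `average_ranks` and `friedman_statistic` and the example durations are hypothetical; the sketch omits the tie correction and the p-value, and it also returns Kendall's W as an effect size on the assumption that W = Q / (n(k - 1)).

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(table):
    """table[i][j] is the measurement of participant i in condition j.

    Returns (Q, W): the Friedman chi-square statistic (no tie correction)
    and Kendall's coefficient of concordance.
    """
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    w = q / (n * (k - 1))
    return q, w


# Hypothetical movement durations (s): 3 participants x 3 conditions.
durations = [
    [0.90, 1.00, 1.20],
    [0.80, 1.10, 1.30],
    [1.00, 1.05, 1.40],
]
q_stat, kendall_w = friedman_statistic(durations)
```

      In practice a statistics library would supply the p-value and the tie correction; the point here is only that the test operates on within-participant ranks, which is why it does not rely on the variance assumptions of an ANOVA.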

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Fig. S.6H,I suggest slightly slower movements for the fast partners, though not significant. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time [12] and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507) that the percentage of inter-individual variability in vigor is estimated using sums of squares as estimates of inter- and intra-individual variability.
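      One way to read this sum-of-squares decomposition is sketched below. It is an assumption about the general approach, not the authors' exact procedure (which is given in their statistical analysis section): the total sum of squares of the vigor scores is split into a between-participant part and a within-participant part, and the inter-individual share is the ratio of the former to the total. The function name and the example scores are hypothetical.

```python
def inter_individual_share(scores):
    """scores maps a participant id to that participant's vigor scores
    (one per condition). Returns SS_between / (SS_between + SS_within)."""
    all_values = [v for values in scores.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # spread of participant means around the grand mean
    ss_within = 0.0   # spread of each participant's scores around their own mean
    for values in scores.values():
        mean_i = sum(values) / len(values)
        ss_between += len(values) * (mean_i - grand_mean) ** 2
        ss_within += sum((v - mean_i) ** 2 for v in values)

    return ss_between / (ss_between + ss_within)


# Hypothetical vigor scores for three participants across three conditions.
example = {
    "p1": [0.8, 0.9, 0.85],
    "p2": [1.2, 1.1, 1.15],
    "p3": [1.0, 1.0, 1.05],
}
share = inter_individual_share(example)
```

      A share close to 1 means that most of the variability in vigor comes from differences between individuals rather than from changes within an individual across conditions.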

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed as the absolute value of the interaction torque as a function of time averaged over the entire movement.
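      The computation described here can be sketched as a time average of the rectified torque. The sketch below assumes sampled torque values with their timestamps and uses trapezoidal integration so that non-uniform sampling is handled; the function name, units, and sample values are hypothetical and are not taken from the manuscript.

```python
def mean_abs_interaction_torque(times, torques):
    """Time-averaged absolute interaction torque over one movement.

    times:   sample times in seconds, strictly increasing
    torques: interaction torque at each sample (e.g., in N.m)
    """
    if len(times) != len(torques) or len(times) < 2:
        raise ValueError("need at least two samples and matching lengths")

    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Trapezoid rule applied to the rectified signal |tau(t)|.
        area += 0.5 * (abs(torques[i - 1]) + abs(torques[i])) * dt

    return area / (times[-1] - times[0])


# Hypothetical samples from movement onset to movement end.
t = [0.0, 0.25, 0.5, 0.75, 1.0]
tau = [0.0, 0.4, -0.2, 0.1, 0.0]
effort = mean_abs_interaction_torque(t, tau)
```

      Taking the absolute value before averaging prevents torques of opposite sign from cancelling, so the measure reflects how hard the partners pulled on each other regardless of direction.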

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61, see p.5, l.132–133).

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Therefore, Figs. 3E and S6B display the movement time of all individual participants, together with the corresponding 20 individual regression lines, as in Fig. 2B. The solid black line represents the average across all individuals, and the averaged behaviors of dyads are not included. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18,S.19), and the fast partner’s cost of time (Fig. S20). This analysis reveals the predominant roles of the estimated slow partner movement patterns in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was not well chosen given the context of the paper, and we have therefore replaced it with “null-field” condition. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), concomitantly with improved movement velocity. We have also expanded the Discussion on the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. Therefore, we have removed the abbreviated names of the models and, when possible without impacting the readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have therefore modified Fig. 1A (p.2) to avoid any potential confusion by using the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead the reader into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor to the same vigor as another individual, so it doesn’t appear stable to me in such conditions but adaptable.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.
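      The distinction between stable rankings and changing absolute durations can be made concrete with a rank correlation. The sketch below is an illustration only: the responses above report Pearson correlations for the stability analysis, and Spearman's rank correlation is swapped in here because it measures rank preservation directly. All function names and durations are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean movement durations (s) for four participants.
null_field = [0.80, 1.00, 1.20, 0.90]
resistive = [1.10, 1.40, 1.60, 1.30]  # everyone slows down under resistance

rank_stability = spearman(null_field, resistive)
```

      In this toy example every participant moves more slowly under resistance, yet the ordering of participants is unchanged, which is the sense of stability described above.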

      That said, and as pointed out by the reviewer, our results show that (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and the possible resulting consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward- and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, Aug. 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, Feb. 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, Sept. 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? Evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, Dec. 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, May 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor: Neuroeconomics of Movement Control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Gowrishankar, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, pp. a009282–a009282, Mar. 2012.

    1. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference, an approximation strategy to perform FLMM (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor-signal associations evolves across an ordered functional domain such as within-trial timepoints. Standard scalar outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets) and yield regression coefficient estimates that are only interpretable with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables and produces hypothesis tests in clustered settings. cFLMM can be applied to datasets that define covariates in terms of the same flexible representations of covariates used in encoding models; this is a modeling choice rather than a methodological characteristic.
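
      To make the distinction between a fixed coefficient and a functional coefficient concrete, the following is a minimal simulation sketch. It is not fastFMM code: the data, the ramping coefficient `beta_true`, and the helper `ols_slope` are all hypothetical, and random effects are omitted for brevity. It contrasts one coefficient shared across all within-trial timepoints with one coefficient per timepoint.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_timepoints = 400, 50

# Hypothetical ground truth: the predictor-signal association ramps from 0 to 2
# across the within-trial timepoints s = 0, ..., 49.
beta_true = np.linspace(0.0, 2.0, n_timepoints)

# X[i, s]: time-varying predictor; Y[i, s]: signal on trial i at timepoint s.
X = rng.normal(size=(n_trials, n_timepoints))
Y = beta_true * X + rng.normal(scale=0.5, size=(n_trials, n_timepoints))


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


# Scalar-outcome style fit: a single coefficient shared by every timepoint.
beta_pooled = ols_slope(X.ravel(), Y.ravel())

# Functional style fit: one coefficient per within-trial timepoint.
beta_pointwise = np.array(
    [ols_slope(X[:, s], Y[:, s]) for s in range(n_timepoints)]
)
```

      In this simulated setting the pooled coefficient lands near the average of the true curve, while the pointwise coefficients track the ramp itself. The time-varying predictor is present in both fits; only the second lets its association with the signal change over the domain.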

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by effects specific to each within-trial timepoint, so the first step of our estimation strategy fits separate pointwise mixed models. However, models from every within-trial timepoint are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provides further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.

      Other functional regression estimation strategies jointly fit the entire model with a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). However, these methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and Reviewer 3 (3A) apply basis expansions of the covariates, and the resulting model does not capture how signal–covariate associations evolve across some functional domain. Although the first stage in the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those that would be obtained from functional regression estimation strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).
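
      The distinction between the pointwise first step and the later pooling across timepoints can be illustrated with a small sketch. This is a deliberately simplified stand-in, not the FUI algorithm: it uses ordinary least squares without random effects, replaces the penalized smoother with a centred moving average, and omits the interval construction entirely. All data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_timepoints = 200, 60
s = np.linspace(0.0, 1.0, n_timepoints)
beta_true = np.sin(np.pi * s)  # hypothetical smooth functional coefficient

X = rng.normal(size=(n_trials, n_timepoints))
Y = beta_true * X + rng.normal(size=(n_trials, n_timepoints))

# Step 1 (pointwise): a separate regression at each within-trial timepoint.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
beta_raw = (Xc * Yc).sum(axis=0) / (Xc ** 2).sum(axis=0)

# Step 2 (smoothing): borrow strength from neighbouring timepoints.
# A centred moving average stands in here for a penalized smoother.
window = 7
kernel = np.ones(window) / window
padded = np.pad(beta_raw, window // 2, mode="edge")
beta_smooth = np.convolve(padded, kernel, mode="valid")

# Mean squared error of each estimate against the simulated truth.
mse_raw = float(np.mean((beta_raw - beta_true) ** 2))
mse_smooth = float(np.mean((beta_smooth - beta_true) ** 2))
```

      Even in this toy version, the smoothed estimate uses information from every timepoint although each initial fit was local. The target remains the functional coefficient; only the estimation route is pointwise.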

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.

      In our paper, we provide examples of cFLMM models with simple signal–covariate relationships. The coefficient estimates quantify the expected change in signal given a one-unit change in the original predictors. Let 𝑌(𝑠) be the outcome and 𝑋(𝑠) be some covariate at within-trial timepoint 𝑠. For brevity, we will suppress subject/trial indices and random effects in the following notation. The coefficient at timepoint 𝑠 can be captured by the generic mean contrast

      𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 1] − 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 0].

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 1] − 𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 0]

      for all pairs of timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. While simple lagged or offset outcome–predictor associations can be incorporated as covariates in cFLMM, the approach does not capture associations across all pairs of within-trial timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. Encoding models also do not target the above estimand. Instead, a full function-on-function regression could estimate the above. This topic can be incorporated into our revision and may be a future line of inquiry.
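
      The two estimands can be compared numerically in a small simulation. The sketch below is illustrative only: it assumes a binary covariate drawn independently at each timepoint (so that raw conditional means are easy to read) and a hypothetical signal that responds to the current covariate value and, more weakly, to the previous one. The helper `contrast` is not part of any package.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_timepoints = 5000, 6

# Binary covariate X(s), drawn independently at each timepoint (an assumption
# made only to keep the empirical contrasts simple to interpret).
X = rng.integers(0, 2, size=(n_trials, n_timepoints))

# Hypothetical signal: Y(s) responds to X(s) and, with a lag, to X(s - 1).
Y = rng.normal(scale=0.5, size=(n_trials, n_timepoints))
Y += 1.0 * X
Y[:, 1:] += 0.5 * X[:, :-1]


def contrast(s1, s2):
    """Empirical E[Y(s1) | X(s2) = 1] - E[Y(s1) | X(s2) = 0]."""
    on = X[:, s2] == 1
    return Y[on, s1].mean() - Y[~on, s1].mean()


# Full surface over all pairs (s1, s2): the function-on-function target.
surface = np.array(
    [[contrast(s1, s2) for s2 in range(n_timepoints)]
     for s1 in range(n_timepoints)]
)

# The concurrent estimand is the diagonal s1 = s2 of that surface.
concurrent = np.diag(surface)

# The simulated lag-one effect sits on the first sub-diagonal (s1 = s2 + 1).
lag_one = np.diag(surface, k=-1)
```

      Here the concurrent contrast recovers the same-timepoint effect, and a single lag could be added as an extra covariate, but the remaining entries of the surface are only available from a model that targets every pair of timepoints.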

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, then received a reward of either water or strawberry milkshake in the reward delivery zone (Machen et al., 2026). Mice received different rewards between sessions but the same reward within all trials of a given session. This design complicated the analysis, as the reward type produced prominent differences in average latency (water: 3.3 seconds, milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide an ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2026). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question. It does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing as it is unclear how to propagate uncertainty from the time warping model into final hypothesis tests.

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude between mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion clearer and thank Reviewer 2 for their feedback on interpretation.

      Reviewer #3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the common response sufficiently contrasts the settings where each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that we were not clear. This was mentioned by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate the change in signal–reward association as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” or “fast” means “mouse with below or above average latency”. Please refer to our response to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because the estimates capture the difference in mean signal from strawberry milkshake and water. Equations (7) and (8) in the section “Materials and methods” and sub-section “Variable trial length analysis” provide additional details on the fixed effect coefficients. Based on this confusion, we will convert the two 1 x 4 sub-plots of 3B and 3C into two 2 x 2 sub-plots to avoid unintended direct comparisons.
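
      The reading of a coefficient as a contrast follows from a general property of least squares with a binary indicator. The sketch below checks it on simulated data. It is an illustration under simplifying assumptions (no random effects, hypothetical variable names and effect sizes), not a reproduction of the models in Equations (7) and (8).

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_timepoints = 120, 40

# Hypothetical reward indicator per trial: 1 = milkshake, 0 = water.
milkshake = rng.integers(0, 2, size=n_trials)
signal = rng.normal(size=(n_trials, n_timepoints))
signal[milkshake == 1] += np.linspace(0.0, 1.0, n_timepoints)

# Pointwise least squares of the signal on an intercept and the indicator.
# Passing the full (trials x timepoints) signal fits every timepoint at once.
design = np.column_stack([np.ones(n_trials), milkshake])
coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
beta_reward = coef[1]  # one coefficient per within-trial timepoint

# The same quantity computed directly as a difference in group means.
mean_diff = (
    signal[milkshake == 1].mean(axis=0) - signal[milkshake == 0].mean(axis=0)
)
```

      At every timepoint the indicator's coefficient equals the milkshake-minus-water difference in mean signal, which is why a curve of such coefficients can be plotted and read as a contrast along the functional domain.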

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the common response), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
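
      The statement quoted from Box 1 can be checked with a short sketch. Everything here is hypothetical (the covariates `trial_number` and `velocity`, the effect sizes, and the helper `pointwise_fit`), and ordinary least squares again stands in for the mixed models. The sketch shows that repeating a trial-level covariate across timepoints reproduces the non-concurrent fit, and that the same layout accepts a functional covariate alongside it.

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_timepoints = 150, 30

trial_number = np.arange(n_trials, dtype=float)       # constant over the domain
velocity = rng.normal(size=(n_trials, n_timepoints))  # functional covariate
signal = (0.01 * trial_number[:, None] + 0.8 * velocity
          + rng.normal(size=(n_trials, n_timepoints)))


def pointwise_fit(covariates, y):
    """Fit y[:, t] on an intercept plus each covariate's column t, for all t."""
    n, n_t = y.shape
    betas = np.empty((len(covariates) + 1, n_t))
    for t in range(n_t):
        design = np.column_stack([np.ones(n)] + [c[:, t] for c in covariates])
        betas[:, t], *_ = np.linalg.lstsq(design, y[:, t], rcond=None)
    return betas


# Non-concurrent layout: one design matrix shared by every timepoint.
design_nc = np.column_stack([np.ones(n_trials), trial_number])
fit_nc, *_ = np.linalg.lstsq(design_nc, signal, rcond=None)

# Concurrent layout: the scalar covariate is repeated across timepoints.
trial_number_2d = np.broadcast_to(
    trial_number[:, None], (n_trials, n_timepoints)
)
fit_c = pointwise_fit([trial_number_2d], signal)

# Concurrent layout with a constant and a functional covariate together.
fit_mixed = pointwise_fit([trial_number_2d, velocity], signal)
```

      The first two fits agree to numerical precision, while the third returns one coefficient curve per covariate, including the one that never varies within a trial.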

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a non-concurrent model, which underscores both the unmet need for cFLMMs and the difficulty of fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the common response clarifies how cFLMM compares to existing approaches and fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials within the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C, and 3A, the common response will be incorporated into the revision.

      Per 2B, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2025a).

      When discussing the Machen et al. (2026) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per Reviewer 2 (2C), the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. https://doi.org/10.1080/10618600.2021.1950006, doi: 10.1080/10618600.2021.1950006, PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. https://www.nature.com/articles/s41586-019-1261-9, doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. https://www.science.org/doi/abs/10.1126/science.abq6740, doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025 Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. ArXiv. 2025 Jun; p. arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size (HLA allele calls from 366 individuals, 102 unrelated) and 219 individuals with HLA data, whole genome SNP data, and involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling. This would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes should therefore not be relevant in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.
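
      One possible reading of the two acceptance criteria can be written as a small filter. This is a hypothetical sketch, not the pipeline used in the study: the function names, the choice to judge homozygosity at the first field, and the choice to mark both haplotypes unresolved when the second criterion fails are all assumptions, and the allele names are illustrative inputs only.

```python
def first_field(allele):
    """Return the first typing field, e.g. 'A*02:01' -> 'A*02'."""
    return allele.split(":")[0]


def resolve_haplotypes(imputed_pair, sequenced_pair):
    """Apply both acceptance criteria at one locus for one individual.

    imputed_pair: alleles imputed for the two phased haplotypes.
    sequenced_pair: the two allele calls from targeted sequencing.
    Returns a list holding the accepted allele, or None, per haplotype.
    """
    seq_fields = {first_field(a) for a in sequenced_pair}
    # Assumption: homozygosity is judged at the first field.
    homozygous = len(seq_fields) == 1

    # Criterion 1: first-field match to at least one sequenced allele.
    resolved = [
        a if first_field(a) in seq_fields else None for a in imputed_pair
    ]

    # Criterion 2: both haplotypes may carry the same allele only when the
    # sequenced calls are homozygous. Assumption: otherwise neither is kept.
    both_kept = resolved[0] is not None and resolved[1] is not None
    if (both_kept
            and first_field(resolved[0]) == first_field(resolved[1])
            and not homozygous):
        resolved = [None, None]
    return resolved


# Illustrative calls for a heterozygous individual.
consistent = resolve_haplotypes(("A*02:01", "A*30:02"), ("A*02:01", "A*30:01"))
duplicated = resolve_haplotypes(("A*02:01", "A*02:01"), ("A*02:01", "A*30:01"))
mismatch = resolve_haplotypes(("A*02:01", "A*68:01"), ("A*02:01", "A*30:01"))

# Share of haplotypes reported as resolved in the response above.
resolved_fraction = 698 / 722
```

      Under these assumptions, the first call keeps both haplotypes, the second rejects both because a heterozygous individual cannot carry the same allele twice, and the third keeps only the haplotype that matches a sequenced call. The reported counts correspond to roughly 97% of haplotypes being resolved.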

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty in modeling selection at the HLA region, a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this: demographic processes affect all regions of the genome, so if the HLA region maintains elevated IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes and a consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS computational biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging, as social structures and norms often confound the power of studying populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when choice drives mate choice. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven the MHC evolution, but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see the paper from Croy et al. 2020, which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.
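
      As a rough illustration of the quantity discussed above (not the authors' analysis), the chance that offspring of two randomly drawn partners are homozygous at a single locus can be sketched under Hardy-Weinberg proportions as the sum of squared allele frequencies. The function name and the allele frequencies below are hypothetical and chosen only to show how higher background diversity lowers expected homozygosity.

```python
def expected_homozygosity(allele_freqs):
    """Probability that offspring of random mating are homozygous at one locus.

    Assumes Hardy-Weinberg proportions, so P(homozygous) = sum of p_i ** 2.
    """
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in allele_freqs)


# Hypothetical frequencies for illustration only (not Himba data).
few_common_alleles = [0.5, 0.3, 0.2]
many_rare_alleles = [0.1] * 10

print(expected_homozygosity(few_common_alleles))
print(expected_homozygosity(many_rare_alleles))
```

      With many alleles at low frequency, the expected homozygosity falls sharply, which is the intuition behind the argument that an unrelated partner is already likely to be dissimilar.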

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset: some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this explicitly in section (n), Statistical Models, of the Methods.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in section (n), ‘Statistical Models’, of the Methods. No pairs were found to be closer than 3rd degree relatives. No arranged marriages were related at 3rd degree or closer; 1 love match marriage and 2 informal partnerships discovered through pedigree analysis were found to be 3rd degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision, but we restricted random couples to have an age gap within the range observed in actual, known partnerships (the woman is at most 16 years older than the man and at most 53 years younger than the man). We included this criterion to ensure that random couples represented the best approximation of realistic background partners. Our age gap criterion was quite permissive due to the large range observed in our actual pairs, and we do not expect that it significantly impacted our results.
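
      The age-gap restriction described above can be sketched as a simple filter over all female-male pairings. This is a minimal sketch, not the authors' code: the individual IDs and ages are hypothetical, and the relatedness criterion that the response also mentions is omitted for brevity.

```python
from itertools import product

# Bounds quoted in the response: the woman is at most 16 years older
# and at most 53 years younger than the man.
MAX_WOMAN_OLDER = 16
MAX_WOMAN_YOUNGER = 53


def plausible_pairs(women, men):
    """Return (woman_id, man_id) pairs whose age gap lies in the observed range.

    `women` and `men` map hypothetical individual IDs to ages in years.
    """
    pairs = []
    for (w_id, w_age), (m_id, m_age) in product(women.items(), men.items()):
        gap = w_age - m_age  # positive when the woman is older
        if -MAX_WOMAN_YOUNGER <= gap <= MAX_WOMAN_OLDER:
            pairs.append((w_id, m_id))
    return pairs


# Hypothetical individuals for illustration only.
women = {"F1": 20, "F2": 60}
men = {"M1": 25, "M2": 80}
print(plausible_pairs(women, men))
```

      Because the permitted range is wide, only extreme pairings are dropped, which fits the authors' expectation that the restriction has little effect on the background distribution.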

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.
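
      To make the interpretation above concrete, the sketch below treats the background (random-pair) mean as the intercept and expresses each partnership type as an offset from it, together with the chosen-versus-arranged contrast. It is a deliberately simplified stand-in: it ignores the female ID and male ID random effects and any significance testing, and the category labels and scores are hypothetical.

```python
from statistics import mean


def effects_relative_to_background(scores):
    """Express each partnership type's mean as an offset from the background mean.

    `scores` maps a category label to a list of similarity scores; the
    "random" category plays the role of the model intercept.
    """
    intercept = mean(scores["random"])
    effects = {
        label: mean(values) - intercept
        for label, values in scores.items()
        if label != "random"
    }
    return intercept, effects


# Hypothetical similarity scores for illustration only.
scores = {
    "random": [0.10, 0.20, 0.30],
    "arranged": [0.15, 0.25],
    "chosen": [0.20, 0.30],
}
intercept, effects = effects_relative_to_background(scores)
contrast = effects["chosen"] - effects["arranged"]
print(intercept, effects, contrast)
```

      An effect near zero for a partnership type corresponds to that type being indistinguishable from random pairing on the metric, while the contrast compares the two partnership types directly.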

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate HLA similarity/log odds of homozygous offspring analyses for class I and class II in our revision.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020), and will add an explicit limitations section incorporating suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample size used for allelic diversity estimation is comparable to previous studies in African populations (Figure 1), and our dataset is uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals, a combination that enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

      Thanks to the referees' input and more work, we think our revised manuscript now meets the high standard of eLife.

      Recommendations for the authors:

      The importance of the circular swimming chirality for the observed phenomenon could be further emphasized by actually using the word "chiral" or "chirality" in the text. Also, indicating what would change if swimming were counterclockwise rather than clockwise would help the reader understand the key significance of chirality.

      We thank the reviewer for this insightful suggestion. We agree that the chirality of the surface interaction is central to the observed phenomenon and should be explicitly highlighted to improve the reader's understanding.

      In response, we have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming. We clarify that in such a case, the hydrodynamic interaction would cause cells to veer left, resulting in up-gradient accumulation along the left sidewall rather than the right. We believe these additions significantly improve the clarity of the underlying physical mechanism.

      Reviewer #1 (Recommendations for the authors):

      I still have several comments that the authors may want to consider for the last version.

      - The run and tumble behavior of the cells at the surface remains puzzling and would need some more explanation in the text. Tumbles with no significant reorientation angle amount largely to smooth swimmers. How can a model based on run-and-tumbles be used to explain the difference between LSW and RSW?

      We apologize for the lack of clarity regarding the surface run-and-tumble behavior. While it is true that surface tumbles often result in smaller reorientation angles compared to bulk swimming, they are not negligible and play a critical role in the observed asymmetry. As shown in the tumble angle distributions (Fig. 2E and 2F), the probability of a tumble angle exceeding π/2 is approximately 9% for sidewall trajectories and 30% for the middle area. This tumbling behavior leads to differences between the left sidewall (LSW) and right sidewall (RSW) in two key ways:

      First, as detailed in our geometric analysis (Fig. 6), running cells following stable clockwise circular paths are geometrically favored to reach the RSW. Because cells moving up-gradient (towards the RSW) experience suppressed tumbling, they maintain these stable circular trajectories and accumulate effectively. Conversely, cells moving down-gradient (towards the LSW) experience enhanced tumbling. These frequent interruptions distort the circular trajectories required to reach the LSW, resulting in fewer bacteria entering the LSW compared to the RSW.

      Second, once at the wall, the difference in tumbling frequency dictates retention. The majority of LSW cells are swimming down-gradient (LSW-DG) and thus tumble more frequently, increasing their probability of escaping the wall. The majority of RSW cells are swimming up-gradient (RSW-UG), suppressing tumbles and increasing their residence time at the wall.

      The relevant clarifications have been included in the last paragraph of “Results” in the manuscript.
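
      The tumble-angle percentages quoted above (roughly 9% for sidewall trajectories and 30% for the middle area) are fractions of tumbles whose reorientation angle exceeds π/2. A minimal sketch of that calculation follows; the function name and the angle lists are hypothetical and do not reproduce the measured distributions in Fig. 2E and 2F.

```python
import math


def fraction_large_tumbles(angles_rad, threshold=math.pi / 2):
    """Fraction of tumble reorientation angles (in radians) above `threshold`."""
    if not angles_rad:
        raise ValueError("need at least one tumble angle")
    large = sum(1 for angle in angles_rad if abs(angle) > threshold)
    return large / len(angles_rad)


# Hypothetical tumble angles for illustration only.
sidewall = [0.2, 0.4, 0.1, 0.3, 2.0, 0.5, 0.6, 0.2, 0.3, 0.1]
middle = [0.5, 1.8, 2.4, 0.3, 1.0, 0.7, 2.0, 0.9, 0.4, 1.2]

print(fraction_large_tumbles(sidewall))
print(fraction_large_tumbles(middle))
```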

      - Figure 5B would need more explanation. I still don't understand the different behaviors for the right and left side walls at small widths. Is it noise really or a more complex behavior? Since most of these calculations are based precisely on the shape of these curves it would be useful to discuss them in more detail.

      We apologize for the lack of clarity. The behavior observed at small widths in Figure 5B is not noise; rather, it reflects the idealized nature of our simulation model.

      In the simulation, bacteria were modeled as active particles without explicit steric exclusion for the flagella and cell body. Consequently, simulated cells retain the ability to reorient and turn freely even in very narrow lanes (w ≤ 6 μm), allowing the geometric sorting mechanism (which favors the RSW) to function efficiently even at small widths. This is why the simulation shows a distinct difference between LSW and RSW proportions in this regime.

      In the experimental reality, however, the finite size of the bacterial body and flagella creates steric hindrance. In narrow channels, this physical constraint restricts the cells' ability to turn, thereby disrupting the circular swimming mechanism required to sort cells into the RSW. As a result, experimental data shows that the proportions of LSW and RSW cells tend to equalize in narrow channels (e.g., w = 6 μm in Fig. 4B), leading to a lower chemotactic drift velocity than predicted by the simulation.

      We have added a discussion regarding these steric effects and the deviation at narrow widths to the Results section (the penultimate paragraph of subsection "Simulation of E. coli chemotaxis within lane confinement") in the revised manuscript.

      - The importance of the chirality of the circular trajectories, although essential, remains insufficiently mentioned in the text.

      We have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming.

      - It would be useful to color-code the trajectories of Figure 1B and alike with time.

      Thank you for the suggestion. Now the trajectories in Fig. 1B have been redrawn. Distinct colors denote individual trajectories, with color intensity darkening to indicate time progression.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Lenz and colleagues describes a detailed examination of the epigenetic changes and alterations in subnuclear arrangement associated with the activation of a unique var gene associated with placental malaria in the human malaria parasite Plasmodium falciparum. The var gene family has been heavily studied over the last couple of decades due to its importance in the pathogenesis of malaria, its role in immune avoidance, and the unique transcriptional regulation that it displays. Aspects of how mutually exclusive expression is regulated have been described by several groups and are now known to include histone modifications, subnuclear chromosomal arrangement, and in the case of var2csa, regulation at the level of translation. Here the authors apply several methods to confirm previous observations and to consider a possible role for DNA methylation. They demonstrate that the histone mark H3K9me3 is found at the promoters of silent genes, var2csa moves away from other var gene clusters when activated, and while DNA methylation is detectable at var genes, it does not seem to correlate with transcriptional activation/silencing. Overall, the data and approach appear sound.

      Strengths:

      The authors employ the latest methods for epigenetic analysis of histone marks, transcriptomic analysis, DNA methylation, and chromosome conformation. They also use strong selection pressure to be able to examine the gene var2csa in its active and silent state. This is likely the only paper that has used all these methods in parallel to examine var gene regulation. Thus, the paper provides readers with confidence in the interpretation of independent methods that address a similar subject.

      We thank the reviewer for this positive assessment. We appreciate the recognition that our study combines complementary approaches including histone mark profiling, transcriptomic analysis, DNA methylation mapping, and chromosome conformation capture in parallel to the use of strong population selection that enables a controlled comparison of var2csa in active versus silent states. We agree that the convergence of independent methods strengthens confidence in the interpretation.

      Weaknesses:

      The primary weakness of the paper is that none of the conclusions are novel and the overall conclusions do not shed much new light on the topic of var gene regulation or antigenic variation in malaria parasites. The paper is largely confirmatory. The roles of H3K9me3 and subnuclear localization in var gene regulation are well established by many groups (including for var2csa), albeit in some cases using alternative methods. The only truly unique aspect of the manuscript is the description of 5mC at var2csa when the gene is transcriptionally active or silent. Here the authors demonstrate that the mark has no clear role in transcriptional activation or silencing, however, this will not be surprising to many in the field who have previously cast doubt on a regulatory role for this modification.

      While we agree that some individual features of var gene regulation, including H3K9me3 enrichment, have been described previously, our study integrates for the first time several layers of gene regulation at the clinically important var2csa locus using phenotypically homogeneous placental-binding parasite populations. As expected, var2csa activation coincided with a loss of H3K9me3 at the locus. However, using high-resolution chromatin conformation capture (to our knowledge, this experiment had never been applied to phenotypically homogeneous parasite populations), we quantified the repositioning of var2csa relative to heterochromatic telomeric clusters. We further assessed DNA methylation in this framework and show that 5-methylcytosine is broadly present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in the transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      As stated in our response to Reviewer 1, our study combines, for the first time, complementary approaches, including transcriptomic analysis, histone mark profiling, DNA methylation mapping, and chromosome conformation capture, together with strong population selection to enable a controlled comparison of var2csa in active versus silent states.

      Weaknesses:

      No major new finding is reported. The strength of the evidence presented is mostly solid, although certain elements, e.g., the role of 5mC in transcriptional regulation of var2csa, appear preliminary and incomplete.

      While we agree that no major new finding is reported, we were able to use, for the first time, a high-resolution chromatin conformation capture method to quantify the repositioning of var2csa relative to heterochromatic telomeric clusters. We also show that 5-methylcytosine is present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate for the first time transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) In the second paragraph of the introduction, the authors state "....such as the shielding of the parasite antigens expressed on pRBC surfaces by other cells and the evasion of splenic clearance (8)." What does "other cells" mean here?

      We thank the reviewer for this comment. We have clarified the cell type in the text.

      (2) In their interpretation of the Hi-C data, the authors conclude that the var2csa expressing parasites display "tighter heterochromatin control of var gene regions" and "interactions around other silent var genes were increased" and "an overall compaction of telomere ends and var gene-containing intrachromosomal regions". While the data appear to show that this is true when they compare the two parasite populations, I am concerned that the authors might be misinterpreting the data. It is important to note that the NF54CSAh line is heavily selected to be nearly entirely homogeneous for var gene expression while the NF54 line is exceptionally heterogeneous. This is shown in Figure 1G. Thus, any chromosomal arrangement specific for var gene expression in the unselected NF54 population will be similarly heterogeneous and therefore could appear less tight. In other words, interactions around silent var genes and overall compaction of telomere ends might be identical between individual parasites within these populations, but appear tighter or more compact in the var2csa expressing line simply because it is a homogeneous population. Perhaps this is what the authors meant to convey, however as currently written, it seems that they conclude the expression of var2csa results in a unique change in chromosome organization. A better comparison would be two populations homogeneously expressing different var genes, one expressing var2csa and one expressing an alternative var gene. Such lines can be generated through clonal isolation or selection for binding to a different host receptor.

      We thank the reviewer for this comment. The reviewer is correct, and we have revised the Discussion section of the manuscript to clarify this issue.

      (3) The title of the last section of the Results is "Distribution of DNA methylation influences gene expression overall but does not mediate transcriptional activation and switching in antigenic variation". This is an overstatement. The authors show that DNA methylation is absent at var gene promoter regions and enriched in coding regions, but they provide no evidence that it "influences gene expression overall". This is speculation. Lastly, when the authors examined 5mC occupancy across genes, did they normalize for GC content of the DNA sequences? GC content is known to increase dramatically in coding regions (particularly in var genes) and thus could explain the distribution of this mark. If the authors corrected for this, they should directly state this in the results section. If they did not, they should explain why they don't think this property of the P. falciparum genome explains the distribution of 5mC.

      There is often a misconception in the field that DNA methylation is primarily confined to CpG islands in promoter regions and functions mainly as a repressor of transcription. However, in contrast to promoter methylation, methylation within gene bodies is generally associated with higher levels of gene expression, suggesting a role in facilitating transcription elongation. Gene-body methylation can also repress internal promoters, thereby preventing spurious transcription initiation within the gene. In addition, it has been shown to influence alternative splicing by affecting RNA polymerase II elongation kinetics.

      We propose that, in Plasmodium, DNA methylation may be associated with priming genes for transcriptional activity rather than repressing transcription. Specifically, higher methylation levels may facilitate recruitment of the RNA polymerase II transcriptional machinery to enable transcription. In Figure 4B, we observe higher levels of DNA methylation in the first exon of highly expressed genes in both the NF54 and NF54CSAh lines. Interestingly, we also detect high levels of methylation across most introns of the var genes, introns that must be transcribed, cannot be degraded, and are essential for var gene regulation, suggesting a possible sequence-recognition function. We have edited the manuscript to improve clarity.

      (4) In the legend to Figure 3D, the authors state that the centromeres are shown in blue, however in the figure they appear to be grey while var2csa is blue.

      We have revised the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):

      I recommend using the term "transcription" rather than "expression" when discussing events at the gene level.

      We have revised the manuscript accordingly.

      I also recommend using the term "adhesion" to describe the physical interaction between infected erythrocytes and adhesion receptors rather than "adherence", which should be reserved to describe non-physical affinity (e.g., beliefs, faith).

      We have revised the manuscript accordingly.

      Important new evidence regarding transcriptional regulation of var genes in general and var2csa in particular should be discussed and cited.

      We have revised the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

      We are grateful for the time and effort from the reviewers and editors in providing fair and constructive comments that have helped to improve the manuscript. Our point-by-point response is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places.

      For example, i) the Figure 1 AlphaFold3 models do not include motor domains whereas nearly all of the rest of the data involve constructs with the motor domains;

      We appreciate the reviewer’s comment regarding the absence of the motor domains in the AlphaFold3 models shown in Figure 1. These domains were intentionally excluded to improve visual clarity and to better highlight the interaction between the TPR domains and CC1 in the inhibited kinesin-1 conformation. We felt that this simplified presentation in the main figure helps readers focus on the key mechanistic advance introduced in this work at the outset of the paper. For completeness, we have provided full-length kinesin-1 AlphaFold3 models that include the motor domains in the Supplementary Information (Fig. S1), and they are described in detail in the main text. In addition, we have added a note to the Figure 1 legend to explicitly direct readers to these full-length models.

      ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation - this is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used.

      Thank you. Chemical crosslinking is typically important for obtaining high-quality negative-stain TEM grids of kinesin-1 complexes and has been employed in all prior EM studies by our group and others. While this was described in the Methods, we agree that it should also be stated explicitly in the Results. Accordingly, we have added a sentence to the Results section noting that the proteins were stabilized using the amine-to-amine crosslinker BS3 (“Proteins were also stabilised using the amine-to-amine crosslinker BS3 that was important for achieving reproducibly high-quality samples for imaging.”).

      Please see point below for acknowledgement of risks of using crosslinker.

      Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec?

      We had considered this; however, cross-linking mass spectrometry (XL-MS) has been applied extensively to essentially identical kinesin-1 complexes by Tan et al. (eLife 2023). That work provided important insights into the overall architecture of the complex, including the new head–CC1 interactions. However, as fully acknowledged by the authors, significant ambiguity remained with respect to the positioning of the TPR domains, with many cross-links that could not be straightforwardly rationalized in a single model. These unresolved aspects provided part of the motivation for the present study, as highlighted in the Introduction.

      We believe that this ambiguity likely reflects an underlying conformational equilibrium of the kinesin-1 complex (e.g. opening/closing transitions) and/or dynamic docking and undocking of the TPR domains, and lysine-rich features of the TPR domains (most notably the loops that connect the TPR alpha helices) which may make them prone to lock in non-native states, which limits the interpretability of static cross-linking data in this system. In this context therefore, we feel that XL-MS has already been thoroughly explored for kinesin-1 and that its practical limitations in resolving these TPR interactions have been reached.

      This consideration was a primary motivation for pursuing cross-linker-free, solution-based approaches, particularly HDX-MS, which we argue provide the most relevant new insights into the assembly and conformational dynamics of the complex. To make this rationale clearer, we have added an explicit note in the HDX-MS section emphasizing that this is a cross-linker-free method. The added text reads:

      “To determine how the local structural changes from adaptor binding and shoulder dislocation affected the dynamics of kinesin-1 complexes in solution, as directly and least invasively as possible, and without the risk of cross-linker artefacts.”

      In general, the information content of some of the figure panels can also be improved with more annotations (e.g. angular relationship between views in Figure 1B, approximate interpretations of the various blobs in Fig 3F, and more thought given to what the reader should extract from the representative micrographs in several figures - inclusion of the raw data is welcome but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere).

      We appreciate these suggestions. We have modified the figures throughout the manuscript in line with the reviewer’s points. Raw data is now provided at higher magnification throughout so the reader can better distinguish individual particles, angular relationships have been added and further annotations provided on 2D class averages. We do not want the reader to draw too many conclusions from images of single closed particles (with the exception of open vs closed in Fig S7) as these require averaging and 2D classification to obtain meaningful insights, and so we have not added zoom panels in these cases. Figure 3F has been annotated as requested.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use a clever construct design - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

      We are grateful for the reviewer’s comments. We agree that the weaknesses the reviewer has outlined define the limitations of the study and establish important priorities for future work, including molecular dynamics simulations. An important prerequisite for the latter is a starting model that one has confidence in. We think that our study and earlier work now provide a good experimentally supported foundation for using AF3-generated assemblies for this purpose, by ourselves and others.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      We agree with the reviewer’s point. Conformational heterogeneity is a significant challenge, and the model has been developed from multiple complementary approaches. A higher-resolution cryoEM study remains a priority but is challenging because of the size, shape and flexibility of the particle; we hope that some of the approaches used here (e.g. nanobody TPR stabilisation, ElbowLock) will provide a path to achieve this.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      We agree that this is a limitation. We strongly suspect that the TPR domains are dynamic, and we are working to overcome experimental challenges to resolve this important outstanding question. We have expanded the discussion section to better highlight this important priority.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

      We agree that this is a limitation; addressing it will be an important priority for future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of places where the text could be more precise or clear, or the figures could be designed to be more informative:

      (1) The word "unitarily" is used in several places, and I don't know what it means in this context.

      We have changed the phrasing throughout the manuscript to avoid this term. We were attempting to draw a contrast with presumed cooperative multivalent interactions in the context of the kinesin-1 tetramer but agree that this choice of word doesn’t quite achieve that.

      (2) On page 5 the phrase "We focused on the ElbowLock background" is introduced and needs to be explained more clearly.

      Thank you. We have amended the text to read “This KIF5C construct contains a short 5 amino acid deletion that restricts flexibility around the elbow and helps maintain particles in their lambda conformation, providing homogenous samples, and facilitating subsequent analysis (34).”

      (3) On page 6, the phrase "To improve the resolution of our images, we turned to single-particle cryoEM analysis" is imprecise - what do the authors mean by the resolution of the images? Cryo-EM data does not always guarantee a higher resolution structure, but it offers the possibility of visualising finer structural features. This is probably what is meant here, but needs to be stated more precisely.

      We have amended the text to ‘visualise finer structural details’ as suggested.

      (4) Page 7 - "suggesting that TPR domains had loosely dissociated from the core" - I don't think the evidence points to dissociation of KLCs from the complex, but the phrase "loosely dissociated" implies this - would benefit from rephrasing.

      We have changed this to ‘undocked’ for consistency with other descriptions in the manuscript.

      (5) Was the effect of the CC-Di insertion (ΔTDS) detectable by AlphaFold prediction? It would be interesting to include this, partly for completeness and partly because a slightly imperfect and maybe a more dynamic coiled-coil in this region of the molecule may be important in supporting the conformational changes required for activation.

      Thank you for this suggestion. Modelling of the deltaTDS complex indeed shows displacement of the TPR domains. In the standard 5 output models, the TPR domains now occupy a variety of different positions, all with essentially zero confidence (high position error). Consistent with biochemical data, the CCDi insertion is modelled with no overall disruption to the architecture or length of CC1, as expected. We think that this is a valuable addition to the study and have included it as a new supplementary figure (Fig S5), with the main text reading:

      …. “Supporting this, models of ΔTDS complexes using AF3 showed the expected seamless insertion of CCDi into CC1, with displacement of the TPR domains to a variety of different positions, in 5 models, all with high position error with respect to KHC (Fig S5).”

      (6) Figure S1 has two sections designated (C) in the legend.

      Corrected

      (7) Figure S3 - given the resolution and level of interpretation of the 3D reconstructions, it is not relevant to include an FSC curve, but other standard information, such as angular distribution and any evidence of variability from 3D classifications (and how many particles per 3D class) should be included for all structures.

      Thank you, a complete workflow for all complexes has now been provided in Figure S8 with the information requested. In each case there were typically two ‘good’ classes. For ElbowLock, this included one without a prominent shoulder, consistent with 2D classification and quantification. We assume this may reflect a docking/undocking equilibrium. For the deltaTDS and KinTag particles, neither class showed the shoulder feature. The main text has been modified to reflect this and reads “For ElbowLock complexes, this resulted in classes with and without a prominent shoulder, in agreement with 2D classification. For ElbowLock-ΔTDS and ElbowLock-KinTag complexes, no prominent shoulder containing classes were observed.”

      Reviewer #2 (Recommendations for the authors):

      Overall, the figures would benefit from more labels for clarity, some examples and suggestions below:

      (1) Figure 1A - Connect motors to the rest of the structure e.g., wiggly lines.

      Corrected.

      (2) Figure 1B - Add arrows and angles to indicate different views of the model.

      Corrected.

      (3) Figure 1B - Label TPR1-6 (e.g., inset zoom in).

      Corrected.

      (4) Figure 2D and 3D - Label the lack of a shoulder in all averages (perhaps with an arrow instead of a circle to not obscure density), include an example average which shows prominent shoulder density.

      Corrected. Full sets of classes showing shoulder-like features for deltaTDS and KinTag complexes are now shown in Figure S4.

      (5) Figure 3D: Label motor domains and elbow as in other figures.

      Corrected.

      (6) Methods: Include more information on how EM classes were compared to AF projections (e.g., Figure 1D). Was this done visually or computationally? Likewise, more information is needed on how classes were judged to have prominent/weak shoulder density (Figure 2D). In the figure legend, there is a statement that "Full sets of classes are provided in Fig. S4" but this is absent in the supplement.

      Thank you. This information has been added to the methods.

      “For comparison to the AF3 model, simulated density was generated using the molmap command in ChimeraX (73) filtering to 15 Å, and projections were generated/selected automatically using the Reference Based Auto Selected 2D function in CryoSPARC”.

      Full sets of classes are now provided in Figure S4.

      (7) Figure 1-3 - Raw micrographs are a very useful inclusion but would benefit from being a more zoomed-in view (e.g., Figure S5 scale). Particularly useful for 3C, where the mixture of open and closed would be good to see.

      Higher zoom micrographs have been provided throughout.

      (8) Figure 5D: Panels too small to see the result, suggest making full width and moving E below.

      Thank you. We have expanded the panel and moved the model to a new Figure 6.

      (9) Figure S1: PAE plot convincing, but pLDDT colour models needed.

      A representative model coloured for pLDDT has been added to Figure S1. Most of the structure sits within the light blue confident range (90 > pLDDT > 70) with the exception of the disordered regions and neck coil.

      (10) Figure 5B: Reason for the variable inputs?

      The reviewer raises an interesting point. The slightly reduced expression of deltaElbow and slightly increased expression of ElbowLock is a consistent feature of these experiments. We note that this effect is in the ‘opposite direction’ to the impact on binding to MAP7 and so does not affect our conclusions from the experiment. However, we wonder whether opening and closing of the complex may impact on turnover of kinesin proteins, which could have implications for their normal homeostasis and possible degradation after transport in polarised cells. We are considering how to explore this going forwards. We have added a note to the results section to highlight this interesting observation to the reader.

      “We also noted slightly elevated expression of ElbowLock complexes and slightly lower expression of DeltaElbow complexes, suggesting that opening/closing of the complex could impact on kinesin-1 turnover”

      (11) Figure legend 5B: Insufficient detail, the end result is stated, but the three separate gels are not described.

      Legend has been expanded.

      (12) Figure 3F: Currently somewhat problematic. It is unclear if the models are in the same view, and so comparison is difficult. Figure 1C (bottom right) shows class averages with a clear, separate CC density, so the relatively featureless model in this region is puzzling. A statement on how the three model views are related to each other, if aligned with each other, would be useful.

      We appreciate the reviewer’s point. Models were aligned in Chimera, using the fit in map command. Because of the limited features of the models, presumably due to flexibility, achieving a good alignment for all three models was challenging, but we think that showing the 180-degree rotations is probably about the best we can achieve here.

      (13) The following statement is too strong: "Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length 'side' views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features which enabled us to identify CC1 confidently (Fig. 1D)". Given that the negative-stain EM data were collected primarily to validate the AlphaFold model, the assignment of CC1 should be described as consistent with rather than confidently identified from the class averages. The resolution of the EM data does not independently support such an assignment, and the wording needs to be softened.

      We appreciate the reviewer’s point and have softened the wording as suggested. The paragraph now reads:

      “To visualise finer structural details, we turned to single-particle cryoEM analysis of frozen-hydrated samples. We were unable to obtain optimal samples suitable for determining the complete structure. Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length ‘side’ views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features (Fig. 1D). The motor domains were poorly resolved in these classes, suggesting that the head assembly is somewhat flexible relative to the coiled coil/TPR body. A comparison to low-pass filtered back-projections from the AF3 model (without motor domains) revealed density at a position concurrent with the docked TPR domains (Fig. 1D).”

      (14) There is a typo in the figure legend of Figure 3 - (E) and (F) should be (F) and (G).

      Corrected

      Reviewer #3 (Recommendations for the authors):

      I recommend the following additions:

      (1) Figure 1 labeling - In panel A, please label the "linker domain" and the "KLC subunits" explicitly to help orient the reader. In panel B, please mark the "TPR shoulder" corresponding to the docked TPR domains on CC1; this will help the reader connect parts B and C.

      Thank you, we have modified Figure 1A with this additional information.

      (2) The TPR docking site (TDS) is a central structural element, and its sequence boundaries are provided in the Methods. It would help to visualize this directly in Figure 2A or in an inset.

      We hope that the reviewer agrees that the zoomed in model in Figure 5A (alongside MAP7) provides a sufficiently detailed view of the structural interface to highlight the orientation of TPR1 with respect to CC1. The side chain contacts in the model are very plausible and confidently predicted (and can be straightforwardly reproduced in AF3 using the sequence information provided in the methods), but as our study has not explored this interaction at the single residue level, we would prefer not to imply this to the reader at this stage.

      (3) The authors' model of cargo-induced TPR dislocation is convincing. However, the Discussion could benefit from a clarification on whether both KLC-TPR domains are expected to be bound simultaneously or if a dynamic exchange occurs, as the EM data suggest potential asymmetry.

      Thank you, please see point 5 below where we have modified the discussion to reflect the reviewer’s thoughtful comments.

      (4) The HDX-MS analysis is comprehensive, but the authors may want to briefly comment on the coverage of low-signal regions (especially within CC2-CC3) to enhance clarity.

      We have added an additional supplementary figure (S10) showing sequence coverage. Overall, this is 88% but with some lower coverage around KHC-CC0 (neck) and the acidic linker that connects the KLC coiled-coil to the TPR. We have added a note to the main text to reflect this.

      “Sequence coverage was high (overall 88%) with the exception of KHC-CC0 (neck coil) and the acidic-linker region that connects the KLC coiled-coil to the TPR domains where coverage was lower”

      (5) In the Discussion, the proposed interplay between MAP7 and cargo adaptors is intriguing, especially considering the results from Anna Akhmanova's lab showing that MAP7 activates kinesin-1 processivity. Do the authors suggest that competition for CC1 is mutually exclusive or sequential? The answer has mechanistic implications.

      We have been considering these questions for some time, and the short answer is that we don’t fully understand the dynamics yet. However, we appreciate the reviewer’s prompt to clarify our thinking on this. We have attempted to do this in a revised discussion section where we more explicitly outline these outstanding questions.

    1. Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (e.g., Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      Comments on the revised manuscript:

      The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.

      In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree - like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this - there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.

    2. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Li and colleagues examines how defensive responses to visual threats during foraging are modulated by both reward level and social hierarchy. Using a naturalistic paradigm, the authors test how the availability of water or sucrose, with sucrose being more rewarding than water, shapes escape behavior in mice exposed to looming stimuli of different intensities, which are used to probe perceived threat level and defensive responses. In parallel, the study compares dominant and subordinate animals to assess how social rank biases the trade off between reward seeking and threat avoidance. By combining detailed behavioral analyses with computational modeling, the work addresses how reward level and social context jointly influence escape decisions in an ethologically relevant setting.

      Across the different experimental conditions, perceived threat level is the main determinant of behavior. The authors show that looming stimuli associated with higher threat (contrast) consistently elicit faster and more robust escape responses than lower threat stimuli. This effect is particularly evident during early exposures, when animals are highly vigilant and have not yet habituated to the looming stimulus (learned that it is not dangerous). Later they described that as animals gain experience and habituate, behavior becomes more flexible, and reward level begins to exert a graded modulation of the escape response. Importantly, the authors show that under high threat conditions increasing reward value leads to more frequent and faster escape rather than greater reward pursuit. This finding is particularly relevant, as it suggests that highly valued rewards can heighten vigilance and thereby enhance responsiveness to threat, highlighting that reward does not simply compete with defensive behavior but can also reshape it depending on the perceived level of danger, in contrast to low threat conditions, where threat can be more easily outweighed by reward. Thus, an important conceptual contribution of the study is the introduction of vigilance as a useful framework to interpret these effects. Vigilance is treated as a behavioral state reflecting heightened attention to potential danger. In line with what is known from natural foraging, mice initially maintain high vigilance when confronted with an innate threat. This perspective helps clarify a finding that might otherwise appear counterintuitive. One might expect higher rewards to motivate animals to tolerate risk, explore more, and habituate faster in any scenario. Instead, the data suggest that highly rewarding outcomes can elevate vigilance, making animals more responsive to threat and leading to faster or more frequent escape under high threat conditions. 
      In this sense, reward does not simply compete with threat but can also amplify sensitivity to it, depending on the internal state of the animal.

      The social results are particularly interesting in this context as well. Dominant mice consistently prioritize avoidance over reward, showing stronger escape responses and slower habituation than subordinates. This behavior is well captured by the vigilance framework proposed by the authors: dominant animals appear to maintain higher vigilance, which biases decisions toward threat avoidance. The authors further suggest that stable social relationships sustain high vigilance and slow habituation, framing this as an evolutionarily conserved strategy that may enhance survival. This interpretation provides a valuable perspective on how social structure shapes defensive behavior beyond immediate physical interactions. At the same time, there are important limitations to this interpretation. All experiments were conducted in male mice, and it is possible that the relationship between social hierarchy, vigilance, and defensive behavior would differ substantially in females. In addition, the idea that stable social relationships maintain elevated vigilance does not straightforwardly align with broader views of social stability as protective for mental health and as a buffer against anxiety and stress. These points do not undermine the findings but suggest that the social effects described here should be interpreted with caution and within the specific context of the task and sex studied.

      We thank the reviewer for raising this important point. In the context of repeated looming exposure, slower habituation reflects more sustained vigilance over time. Compared to individually housed mice, group-housed mice exhibit slower habituation (Lenz et al., 2022), and pair-housed mice showed even slower habituation in our current work. Importantly, this pattern does not indicate that pair-housed mice have higher overall vigilance than individually housed animals. Although individually housed mice habituate more quickly, they display higher initial vigilance, as reflected by their increased probability of escaping in response to looming stimuli (Lenz et al., 2022). Thus, pair-housed mice exhibited reduced defensive responses compared to individually housed animals, consistent with a social buffering effect.

      Furthermore, in a separate study (Rank- and Threat-Dependent Social Modulation of Innate Defensive Behaviors; Li, Gao, Li, 2026, eLife 15:RP109571), we directly compared responses to looming stimuli when mice were tested alone versus in the presence of a social partner and observed clear evidence of social buffering.

      Another important limitation is that the neural mechanisms underlying these effects remain speculative. The manuscript includes an extensive discussion of candidate circuits, particularly involving the superior colliculus and downstream structures, but this section is necessarily based on prior literature rather than on data presented in the study. Given the complexity of the circuits involved in integrating internal state, reward, social context, and vigilance, the current work should be viewed as providing a strong behavioral and conceptual framework rather than direct insight into underlying neural mechanisms.

      We fully agree that the proposed neural mechanisms remain speculative and that the circuits involved in integrating internal state, reward, and social context are likely far more complex. We have revised the manuscript to acknowledge this limitation.

      Methodologically, the behavioral paradigm is well suited for studying escape decisions in socially housed animals, and the machine learning based classification of defensive responses is a clear strength. The computational model provides a useful formalization of how threat level, reward level, and vigilance interact and may be valuable for other laboratories studying escape, approach avoidance, or conflict situations, particularly as a way to classify behavioral outcomes after pose estimation. More generally, the work will be of interest to the neuroethology community for its detailed characterization of escape behavior under naturalistic conditions.

      Given the ethological nature of the study and the high inter individual variability reported by the authors, clarity and precision in the methods are especially important for reproducibility. While the revised manuscript addresses many earlier concerns, some aspects remain slightly difficult to follow. For example, the main text states that animals were not water deprived to avoid differences in internal state, whereas parts of the methods describe conditions in which animals were water deprived, suggesting that internal state manipulation may differ across experiments. Clearer separation and explanation of these conditions would further strengthen confidence in the work.

      To improve clarity, we have revised the Methods section to clearly distinguish between experimental conditions that involved water deprivation and those that did not.

      Overall, this study provides a rich and thoughtful analysis of how reward level and social hierarchy modulate defensive behavior through changes in vigilance. It offers a useful conceptual advance for thinking about escape behavior in naturalistic settings and lays a solid foundation for future work aimed at linking these behavioral states to underlying neural circuits.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions (Figure 3E). However, even under the same high-contrast condition, reaction times were significantly shorter in the water condition compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Upward-directed attention includes rearing, up-stretching, and upward head orientation, which will be clarified in the Methods section. To address concerns about statistical validity, we will quantify these behaviors across the first 10 trials rather than limiting the analysis to the first two.

      As for the dominance-related results, we interpret them as reflecting both enhanced vigilance and reduced reward-seeking behavior. Time spent in the reward zone is not a measure of vigilance but an indicator of reward-seeking motivation. We will clarify this in the revised manuscript.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (e.g., Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      In Figure 3B, the difference between water and sucrose conditions did not reach statistical significance (p = 0.08). We plan to collect additional data to determine whether this is due to limited statistical power. It is also possible that some behavioral readouts are more sensitive to the differences between water and sucrose conditions. For example, Figure 3F shows that escape speed was significantly higher in the sucrose than in the water condition under high-contrast stimulation.

      Thank you for pointing this out. To control for the potential confounds related to internal state, mice were not water-deprived under any of the three conditions in Figures 3A-3H. We will clarify this in the main text and Methods. For Figures 3I-3M, which compare decision-making under no-reward and water conditions, we will conduct additional experiments using non-deprived mice in the water condition.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 15 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.
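
      The definitions above can be made concrete with a short Python sketch. It is an illustration only, not the authors' analysis code: the function names, the use of x/y positions in cm sampled at a constant frame rate, the safe zone modelled as `x <= safe_x`, and the synthetic trajectory are all assumptions introduced here. Only the thresholds (10 cm, 10 cm/s, 15 cm, 20 s) come from the text.

```python
import numpy as np


def first_flight_onset(t, xy, min_speed=10.0, min_dist=10.0):
    """Time (s) at which the first flight state begins, or None.

    A flight state is taken as a run of consecutive samples moving faster than
    min_speed (cm/s) whose cumulative path length exceeds min_dist (cm).
    t is measured from stimulus onset; xy holds positions in cm.
    """
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    step = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # cm moved per interval
    speed = step / np.diff(t)                            # cm/s per interval
    start, travelled = None, 0.0
    for i in range(len(step)):
        if speed[i] > min_speed:
            if start is None:
                start, travelled = i, 0.0
            travelled += step[i]
            if travelled > min_dist:
                return float(t[start])
        else:
            start = None  # the run was interrupted; wait for a new one
    return None


def hiding_latency(t, xy, safe_x):
    """Time (s) of first arrival in the safe zone, assumed here to be x <= safe_x."""
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    hits = np.flatnonzero(xy[:, 0] <= safe_x)
    return float(t[hits[0]]) if hits.size else None


def reward_zone_duration(t, xy, port_xy, radius=15.0, window=20.0):
    """Seconds spent within radius of the reward port during the window after onset."""
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    dt = float(np.median(np.diff(t)))  # assumes a roughly constant frame rate
    dist = np.linalg.norm(xy - np.asarray(port_xy, dtype=float), axis=1)
    inside = (dist <= radius) & (t >= 0.0) & (t < window)
    return float(inside.sum() * dt)


# Synthetic trial: the mouse waits near the port for 1 s, then runs home at 50 cm/s.
t = np.arange(0.0, 20.0, 0.1)
x = np.clip(90.0 - 50.0 * np.clip(t - 1.0, 0.0, None), 0.0, None)
xy = np.column_stack([x, np.zeros_like(x)])

latency_to_flee = first_flight_onset(t, xy)
latency_to_hide = hiding_latency(t, xy, safe_x=5.0)
time_at_reward = reward_zone_duration(t, xy, port_xy=(100.0, 0.0))
```

      In this synthetic trial the latency to flee is about 1 s and the hiding latency is a little under 3 s, which shows why the two measures are distinct: the first marks the decision to leave and the second includes the travel time to the safe zone.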

      All definitions and additional details on behavioral quantification will be included in the revised Methods section.

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      We appreciate the comment and agree that further clarification is needed. We will provide a more detailed description of the model fitting procedure in the revised Methods section. Specifically, the drift rate parameter (r), which reflects the perceived reward value, was constrained to zero in the no-reward condition. To enable statistical comparison across conditions, we will report uncertainty measures for all fit parameters.
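
      One common way to attach uncertainty to a fitted parameter is a percentile bootstrap over trials. The Python sketch below illustrates the idea under stated assumptions: it is not the authors' fitting procedure, `bootstrap_ci` is a hypothetical helper, the latencies are made-up numbers, and `np.mean` stands in for whatever routine actually fits the model parameters.

```python
import numpy as np


def bootstrap_ci(data, fit, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for fit(data)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample trials with replacement and refit on each resample.
        resample = rng.choice(data, size=data.size, replace=True)
        estimates[b] = fit(resample)
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(fit(data)), float(lo), float(hi)


# Hypothetical latencies to flee (s); np.mean stands in for the real model fit.
latencies = np.array([0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95])
estimate, ci_low, ci_high = bootstrap_ci(latencies, np.mean)
```

      A parameter that is fixed by constraint, such as r in the no-reward condition, has no sampling variability, so an interval of this kind is only meaningful for the free parameters.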

      Comments on the revised manuscript:

      The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.

      In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree - like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this - there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.

      We agree that vigilance is not directly observable as a single variable. Our intent was not to claim that foraging speed and foraging interval provide a direct measure of vigilance, but rather to suggest that they may serve as indirect behavioral correlates.

      We also considered an alternative interpretation: these two measures could reflect perceived reward value under high-threat conditions across distinct reward types. If that were the case, animals would be expected to exhibit shorter intervals and faster speeds across no reward, water, and sucrose conditions. However, our data do not support this interpretation (Figures 3L and 3M), suggesting that these measures are more likely correlated with vigilance. 

      Furthermore, it is unlikely that changes in foraging interval and speed are driven by altered threat sensitivity, as animals could not see the threat during most of the foraging bout and only encountered it at the end.

      Regarding the conclusion that the presence of reward increases direct escape behaviors, our interpretation is that increased reward value reduces habituation, thereby maintaining higher vigilance during the late phase. This was discussed in the second-to-last paragraph of the "Economic and social modulations of innate decision-making under threat" subsection in the Discussion.

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually, using an elegant automated tunnel (see videos for clarity).

      The additional changes made to the paper clarify the work done. While there are some limitations (male mice, weird stimulus), the general results are interesting and a valuable addition to the experimental literature. The main claim of the paper is that the different rewards (none, water, sucrose) did not change the escape properties early in learning but did so late; in particular, in the late (already experienced) conditions, reward value (assuming sucrose > water > no reward) interacted with the salience of the looming stimulus (light gray, dark gray) (Panels 3D, 3G, 3K, 3N).

      For readers, I want to note that one of the most interesting results is actually in Figure S2, where they find that a looming stimulus behind the mouse still makes a mouse run to the nest. In these conditions, the mouse runs past the looming stimulus to get to safety! (I also do love the video of the mouse running around the barriers like a snake to get home.)

      I have a few minor clarification questions and a few notes that I think would be useful additions for authors and readers to think about.

      Dominance: What does the mouse social science literature say about the "test tube" test? What can we conclude from this test? This would be useful when trying to understand what is causing the dominance/submissive difference in responses. Figure 4 shows that the dominant mice are more risk-averse than the submissive mice. Is "dominance" in the test-tube actually a measure of risk-seeking? Is the issue that the submissive mice don't think they can get back to the food-site easily, so they are less willing to sacrifice the current (if dangerous) foraging opportunity? Is the issue that the submissive mice can't get back to the nest? As I understand it, the nest was always available to all the mice, so I suspect inability to get to the nest is an unlikely hypothesis. Is the issue that the submissive mice also don't feel safe in the nest?

      The tube test is a widely used assay in the rodent social behavior literature to assess dominance hierarchies, operationally defined by the ability of one animal to force its opponent to retreat from a narrow tube. Importantly, this assay does not directly measure risk-seeking or anxiety-related traits, but rather competitive outcomes during social conflict. Furthermore, our data indicate that the behavioral responses of subordinate mice to looming stimuli are primarily driven by the visual threat itself rather than by social avoidance. This point was elaborated in the second paragraph of the “Social modulation of innate decision-making” subsection in the Results section.

      Limitations of the study: There is an acknowledged limitation to male mice, and the limitations of the small data sets that are typical of such experiments. In addition, however, it is also worth noting the strangeness of the looming stimulus, which is revealed clearly in the videos. The stimulus is a repeating growing circle, growing in a single location within the environment. The stimulus repeats 10 times, once per second. This is not what an attacking hawk or owl would look like. (I now have this image of an owl diving down, and then teleporting up and diving down again.) Note - I am fine with this stimulus. It produces an interesting experiment and interesting results. I do not think the authors need to change anything in their paper, but readers need to recognize that this is not a "looming predator".

      These "limitations" are better seen as "caveats" when folding these results in with the rest of the literature that has gone before and the literature to come. (Generally, I do not believe that science works by studies making discoveries that change how we think about problems - instead, science works by studies adding to the literature that we integrate in with the rest of the literature.) Thus, these caveats should not be taken as problems with the study or as fixes that need to be done. Instead, they are notes for future researchers to notice if differences are found in any future studies.

      Thus, my only suggestion is that I think authors could write a more careful paper by using the past and subjunctive tense appropriately. Experimental observations should be in past tense, as in "the influence of reward was context-dependent and emerged in the late phase" instead of "the influence of reward is context-dependent and emerges in the late phase" - it emerged in the late phase this once - it might not in future experiments, not due to any fault in this experiment nor due to replicability problems, but rather due to unexpected differences between this and those future experiments. At which point, it will be up to those future experiments to determine the difference. Similarly, large conclusions should be in the subjunctive tense, as in "these data suggest that threat intensity is likely to be the primary determinant of decision making" rather than "threat intensity is the primary determinant of decision making", because those are hypotheses not facts.

      We thank the reviewer for the helpful suggestions and have revised the Abstract accordingly.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how mice make defensive decisions when exposed to visual threats and how those decisions are influenced by reward value and social hierarchy. Using a naturalistic foraging setup and looming stimuli, the authors show that higher threat leads to faster escape, while lower threat allows mice to weigh reward value. Dominant mice behave more cautiously, showing higher vigilance. The behavioral findings are further supported by a computational model aimed at capturing how different factors shape decisions.

      Strengths:

      (1) The behavioral paradigm is well-designed and ethologically relevant, capturing instinctive responses in a controlled setting.

      (2) The paper addresses an important question: how defensive behaviors are influenced by social and value-based factors.

      (3) The classification of behavioral responses using machine learning is a solid methodological choice that improves reproducibility.

      Weaknesses:

      (1) Key parts of the methods are hard to follow, especially how trials are selected and whether learning across trials is fully controlled for. For example, it is unclear whether animals are in the nest during the looming stimulus presentations. The main text and methods should clarify whether multiple mice are in the nest simultaneously and whether only one mouse is in the arena during looming exposure. From the description, it seems that all mice may be freely exploring during some phases, but only one is allowed in the arena at a time during stimulus presentation. This point is important for understanding the social context and potential interactions, and should be clearly explained in both the main text and methods.

      We agree that these details are essential and have clarified them in the Methods. When the door system operated normally, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.
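
      The door logic described above can be summarised as a small decision rule. The Python sketch below is a hypothetical reconstruction for illustration, not the authors' controller code: the `Doors` type, the `door_states` function, and the occupancy counts passed to it are invented here, and the handling of more than one mouse in the tunnel is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doors:
    nest_tunnel_open: bool
    tunnel_arena_open: bool


def door_states(mice_in_tunnel: int, mice_in_arena: int) -> Doors:
    """Door positions for a given occupancy reading (hypothetical controller)."""
    if mice_in_tunnel == 0 and mice_in_arena == 0:
        # All mice in the nest: the tunnel is accessible, the arena is sealed.
        return Doors(nest_tunnel_open=True, tunnel_arena_open=False)
    if mice_in_tunnel + mice_in_arena == 1:
        # One mouse has left the nest: isolate it and give it arena access.
        return Doors(nest_tunnel_open=False, tunnel_arena_open=True)
    if mice_in_arena == 0:
        # ASSUMPTION, not stated in the text: several mice in the tunnel keep
        # the arena sealed and the nest door open so that extras can go back.
        return Doors(nest_tunnel_open=True, tunnel_arena_open=False)
    raise ValueError("more than one mouse beyond the nest door is not a modelled state")


all_in_nest = door_states(mice_in_tunnel=0, mice_in_arena=0)
one_in_tunnel = door_states(mice_in_tunnel=1, mice_in_arena=0)
```

      Keeping the nest-tunnel door closed for as long as one mouse is in the tunnel or arena is what guarantees that looming exposure is experienced by a single animal at a time.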

      Habituation was conducted over two days. On day 1, five mice were placed together in the nest for 30 minutes with all doors closed. Each mouse was then placed individually in the nest and allowed to freely explore the arena for 10 minutes under normal door operation. Finally, all mice were returned to the nest with all doors open and allowed to explore freely for 2 hours. On day 2, each mouse was placed individually in the nest and given an additional 1 hour of exploration under normal door operation.

      (2) It is often unclear whether the data shown (especially in the main summary figures) come from the first trial or are averages across several exposures. When is the cut-off for trials of each animal? How do we know how many trial presentations were considered, and how learning at different rates between individuals is taken into account when plotting all animals together? This is important because the looming stimulus is learned to be harmless very quickly, so the trial number strongly affects interpretation.

      We observed substantial inter-individual variability in habituation to looming stimuli, with a sharp decline in defensive responses over the first few trials followed by more gradual changes. To account for this, we segmented trials for each animal into two phases: an early rapid-habituation phase and a later stable phase. Analyzing these phases separately revealed that threat intensity dominates behavior in the early phase, whereas both threat and reward significantly influence behavior in the late phase. These results are now presented in revised Figures 2 and 3. Analyses restricted to first trials are included in Figure S5.
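
      The segmentation criterion is not spelled out in this response. As one possible illustration, and not the procedure actually used, a per-animal split into an early and a late phase could be found with a two-segment least-squares change point. The function name `split_early_late` and the per-trial response score are assumptions.

```python
from typing import Optional, Sequence


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_early_late(scores: Sequence[float], min_len: int = 1) -> Optional[int]:
    """Return k such that trials [0, k) are 'early' and [k, n) are 'late'.

    Hypothetical criterion: choose the split that minimizes the summed
    squared error of a piecewise-constant fit with two segments.
    Returns None when there are too few trials for two segments.
    """
    best_k, best_cost = None, float("inf")
    for k in range(min_len, len(scores) - min_len + 1):
        cost = _sse(scores[:k]) + _sse(scores[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

      Running the split separately for each animal lets individuals that habituate at different rates contribute different numbers of early trials.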

      (3) The reward-related effects are difficult to interpret without a clearer separation of learning vs first responses.

      As noted above, we have re-analyzed our data to account for learning effects.

      (4) The model reproduces observed patterns but adds limited explanatory or predictive power. It does not integrate major findings like social hierarchy. Its impact would be greatly improved if the authors used it to predict outcomes under novel or intermediate conditions.

      We have substantially revised the modeling analysis. The model is now fitted to behavioral data from the late phase and used to predict outcomes across additional conditions, including the early phase behavior and rank-dependent behavioral differences. The model successfully captures behavioral patterns across these conditions, supporting its predictive value beyond descriptive fitting.

      (5) Some conclusions (e.g., about vigilance increasing with reward) are counterintuitive and need stronger support or alternative explanations. Regarding the interpretation of social differences in area coverage, it's also possible that the observed behavioral differences reflect access to the nesting space. Dominant mice may control the nest, forcing subordinates to remain in the open arena even during or after looming stimuli. In this case, subordinates may be choosing between the threat of the dominant mouse and the external visual threat. The current data do not distinguish between these possibilities, and the authors do not provide evidence to support one interpretation over the other. Including this alternative explanation or providing data that addresses it would strengthen the conclusions.

      To support the interpretation of increased vigilance with reward under high-threat conditions, we analyzed additional behavioral measures beyond latency to flee. Rewarded mice showed longer foraging intervals and slower foraging speeds, both consistent with elevated vigilance (Figures 3L and 3M).

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure. Although subordinates spent more time in the arena before looming, this difference disappeared during and after looming exposure (Figure 4C). Moreover, dominant and subordinate mice were equally likely to flee to the nest during escape trials. These findings rule out nest access restrictions as an explanation for the observed rank-dependent differences in defensive behaviors.

      (6) While potential neural circuits are mentioned in the discussion, an earlier introduction of candidate brain regions and their relevance to threat and value processing would help ground the study in existing systems neuroscience.

      We have revised the Introduction to incorporate relevant brain regions and neural circuits.

      (7) Some figures are difficult to interpret without clearer trial/mouse labeling, and a few claims in the text are stronger than what the data fully support. Figure 3H is done for low contrast, but the interesting findings will be to do this experiment with high contrast. Figure 4H - I don't understand this part. If the amount of time in the center after the loom changes for subordinate mice, how does this lead to the conclusion that they spend most of their time in the reward zone? Figure 3A - The example shown does not seem representative of the claim that high contrast stimuli are more likely to trigger escape. In particular, the 10% sucrose condition appears to show more arena visits under low contrast than high contrast, which seems to contradict that interpretation. Also, the plot currently uses trials on the Y-axis, but it would be more informative to show one line per animal, using only the first trial for each. This would help separate initial threat responses from learning effects and clarify individual variability.

      We have substantially revised the figures. Results from trial segmentation based on individual habituation are now explicitly presented in Figures 2 and 3, and analyses using only the first trials are provided in Figure S5 to separate initial responses from learning effects.

      Regarding the original Figure 4H, we are not entirely certain about the concern. In this panel, we measured time spent in the reward zone, which is defined as the region within 10 cm of the reward port at the end of the arena, not the center of the arena, during looming exposure. Subordinate mice spent significantly more time in the reward zone than dominant mice. We have further clarified this in the revised manuscript.

      (8) The analysis does not explore individual variability in behavior, which could be an important source of structure in the data. Without this, it is difficult to know whether social hierarchy alone explains behavioral differences or if other stable traits (e.g., anxiety level, prior experiences) also contribute.

      We observed substantial individual variability in both dominant and subordinate mice, even on the first trial (Figure S7). Paired dominant–subordinate comparisons were used to isolate rank-dependent effects.

      (9) The study shows robust looming responses in group-housed animals, which contrasts with other studies that often require single housing to elicit reliable defensive responses. It would be valuable for the authors to discuss why their results differ in this regard and whether housing conditions might interact with social rank or habituation.

      Robust looming-evoked defensive responses have been reported in both group- and single-housed mice (Yilmaz and Meister, 2013; Lenzi et al., 2022), although single-housed mice habituate more rapidly. We have now discussed the potential interactions between housing conditions, social rank, and habituation in defensive behaviors in the revised manuscript.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is to quantify how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process is not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e., latencies to flee) were substantially shorter under high-contrast conditions. However, even under the same high-contrast condition, reaction times were significantly shorter in the reward conditions compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Regarding the measurement of vigilance, in addition to the latency to flee, we analyzed two additional behavioral measures related to vigilance. First, we examined the foraging interval. Our hypothesis was that more vigilant animals would wait longer before re-entering the reward zone following threat exposure. Consistent with this prediction, mice under sucrose and water reward conditions showed significantly longer foraging intervals than those under no-reward conditions (Figure 3L). Second, we analyzed the foraging speed as mice approached the reward. Increased vigilance should lead to more cautious and therefore slower movements. Our results support this, as mice moved more slowly towards the reward under sucrose conditions (Figure 3M). Taken together, these three measures consistently indicate that mice exhibit increased vigilance under sucrose reward in high-threat conditions.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (e.g., Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      Our new analysis, which segments behavior into an early adaptive phase and a late stable phase, reveals a statistically significant difference between water and sucrose rewards in the late phase (Figure 3H), supporting a graded effect of reward value.

      To control for potential confounds related to internal state, mice were not water-deprived in any condition, including the reward conditions. We have clarified this in the revised manuscript.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100cm. Presumably, the 20cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 10 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.
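
      To make two of these definitions concrete, the sketch below computes latency to flee and time in the reward zone from a position trace. It rests on assumptions not stated in this response: a one-dimensional position along the arena sampled at a fixed frame rate, a flight counted as an unbroken run of frames above the speed threshold, and the hypothetical names `latency_to_flee` and `reward_zone_time`. Only the 10 cm, 10 cm/s, and 20 s values come from the definitions above.

```python
from typing import Optional, Sequence


def latency_to_flee(x_cm: Sequence[float], fps: float, onset_frame: int,
                    min_speed: float = 10.0, min_dist: float = 10.0) -> Optional[float]:
    """Seconds from stimulus onset to the start of the first flight.

    A flight is taken to be a continuous run of frames with speed above
    `min_speed` (cm/s) whose path length exceeds `min_dist` (cm).
    Returns None if no flight occurs.
    """
    run_start, run_dist = None, 0.0
    for i in range(onset_frame + 1, len(x_cm)):
        step = abs(x_cm[i] - x_cm[i - 1])
        if step * fps > min_speed:
            if run_start is None:
                run_start, run_dist = i - 1, 0.0
            run_dist += step
            if run_dist > min_dist:
                return (run_start - onset_frame) / fps
        else:
            run_start, run_dist = None, 0.0
    return None


def reward_zone_time(x_cm: Sequence[float], fps: float, onset_frame: int,
                     port_x_cm: float, radius: float = 10.0,
                     window_s: float = 20.0) -> float:
    """Seconds spent within `radius` cm of the reward port during the
    `window_s` seconds following stimulus onset."""
    end = min(len(x_cm), onset_frame + int(window_s * fps))
    frames = sum(1 for i in range(onset_frame, end)
                 if abs(x_cm[i] - port_x_cm) <= radius)
    return frames / fps
```

      Returning `None` for trials without a flight keeps no-reaction and freezing trials out of latency statistics instead of assigning them an arbitrary value.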

      All definitions and additional details on behavioral quantification have been included in the revised Methods section.

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      We have provided a detailed description of the model fitting procedure in the revised Methods section. Specifically, the reward-value parameter (r) was constrained to zero in the no-reward condition. We have plotted how the overall loss varies with different parameters (Figure S9).

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually. Drift-diffusion modeling found that reward-level interacted with threat level such that at low-threat levels, reward contrasted with threat as classically expected (high reward overwhelms low threat, low threat overwhelms low reward), but that reward aligned with threat at higher threat levels.

      Note that they define threat level by the darkness of the looming stimulus. I am not sure that darker stimuli are more threatening to mice. But maybe. Figure 3 shows that mice react more quickly to high contrast looming stimuli, but can the authors distinguish between the ability to detect the visual signal from considering it a more dangerous threat? (The fact that vigilance makes a difference in the high contrast condition, not the low contrast condition, actually supports the authors' hypotheses here.)

      Regarding the interpretation of stimulus contrast as a proxy for threat level, we agree it is crucial to distinguish improved detection from heightened threat perception. To address this, we examined not only latency to flee but also escape distance and peak escape speed, two measures that reflect the intensity of the defensive response. If contrast only influenced detection, we would expect differences in latency but not in escape distance or speed. All three measures differed significantly across contrast conditions, supporting the interpretation that high-contrast stimuli are perceived as more threatening rather than simply more detectable. Furthermore, manual review of "no response" trials confirmed reliable detection in both conditions, with only three potential "missed" trials out of 117 under low contrast (Figure S3B). We have included this discussion in the revised manuscript.

      The drift-diffusion model (DDM) is fine. I note that the authors included a "leakage rate", which is not a standard DDM parameter (although I like including it). I would have liked to see more about the parameters. What were the distributions? What did the parameters correlate with behaviorally? I would have liked to see distributions of the parameters under the different conditions and different animals. Figure 2C shows the progression of learning. How do the fit parameters change over time as mice shift from choice to choice? How do the parameters change over mice? How do the parameters change over distance to the threat/distance to safety (as per Fanselow and Lester 1988)? They did a supplemental experiment where the threat arrived halfway along the corridor - we could get a lot more detail about that experiment - how did it change the modeling?

      Because our model is fit to the variance of latency distributions, it cannot be applied to single-trial data. Instead, we analyzed how decisions and latencies vary as functions of the fitted threat gain and reward value parameters (Figures 5G and 5H). We have also introduced a simplified deterministic model to further elucidate the decision-making process.
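
      For readers unfamiliar with the leakage term the reviewer mentions, a generic leaky drift-diffusion process can be simulated as below. This is a textbook-style sketch, not the fitted model: the function name `simulate_leaky_ddm`, the single-bound setup, and all parameter values are assumptions, and how threat gain and reward value enter the drift is not specified in this response, so the drift is passed in as a single number.

```python
import math
import random
from typing import Optional


def simulate_leaky_ddm(drift: float, leak: float, bound: float, sigma: float,
                       dt: float = 0.001, t_max: float = 5.0,
                       rng: Optional[random.Random] = None) -> Optional[float]:
    """Euler-Maruyama simulation of dx = (drift - leak * x) dt + sigma dW.

    Returns the first time x reaches `bound` (a stand-in for the latency
    to flee), or None if the bound is not reached within `t_max` seconds.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    x = 0.0
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += (drift - leak * x) * dt + noise
        if x >= bound:
            return step * dt
    return None
```

      With zero noise the accumulator saturates at drift / leak, so a sufficiently large leak prevents weak evidence from ever reaching the bound, which is one way a leak term can produce no-response trials.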

      Regarding the influence of distance to the threat, we conducted additional experiments, presenting the looming stimulus at the end of the arena when the mouse was at different distances from it (Figures S2C–G). We found that as the prey-threat distance increased, mice showed less direct escape behavior, with longer latencies to flee and slower escape speeds. This is consistent with the predatory imminence continuum theory (Fanselow and Lester, 1988), which describes graded defensive behaviors tuned to perceived threat level.

      Regarding the influence of distance to safety, our data indicate that it did not significantly affect defensive responses (Figures S2H and S2I). To test this further, we introduced barriers that lengthened the return path to the safe zone. We found that defensive decisions were not correlated with the distance to the safe zone (Figures S2J and S2K), suggesting that once a threat is detected, animals prioritize escape initiation over evaluating the exact path to safety.

      Overall, this is a reasonable study showing mostly unsurprising results. I think the authors could do more to connect the vigilance question to their results (which seems somewhat new to me).

      We have expanded our analysis of vigilance. In addition to escape latency, we examined the foraging interval and foraging speed. We hypothesized that more vigilant animals would wait longer before re-entering the reward zone following a threat and would approach the reward more slowly. Consistent with this prediction, mice in the sucrose- and water-reward conditions exhibited significantly longer foraging intervals and slower foraging speeds compared to those in the no-reward condition (Figures 3L and 3M). Together, these three measures consistently demonstrate that mice display heightened vigilance under high-threat, high-reward conditions.

      Although the data appear generally fine and the modeling reasonable, the authors do not do the necessary work to set themselves within the extensive literature on decision-making in mice retreating from threats.

      First of all, this is not a new paradigm; variants of this paradigm have been used since at least the 1980s. There is an *extensive* literature on this, including extensive theoretical work on the relation of fear and other motivational factors. I recommend starting with the classic Fanselow and Lester 1988 paper (which they cite, but only in passing), and the reviews by Dean Mobbs and Jeansok Kim, and by Denis Paré and Greg Quirk, which have explicit theoretical proposals that the authors can compare their results to. I would also recommend that the authors look into the "active avoidance" literature. Moreover, to talk about a mouse running from a looming stimulus without addressing the other "flee the predator" tasks is to miss a huge space for understanding their results. Again, I would start with the reviews above, but also strongly urge the authors to look at the Robogator task (work by June-Seek Choi and Jeansok Kim, work by Denis Paré, and others).

      Similarly, in their anatomical review, they do not mention the amygdala. Given the extensive literature on the role of the amygdala in retreating from danger, both in terms of active avoidance and in terms of encoding the danger itself, it would surprise me greatly if this behavior does not involve amygdala processing. (If there is evidence that the amygdala does not play a role here, but that the superior colliculus does, then that would be a *very* important result that needs to be folded into our understanding of decision-making systems and neural computational processing.)

      Second, there is an extensive economic literature on non-human animals in general and on rodents in particular. Again, the authors seem unaware of this work, which would provide them with important data and theories to broaden the impact of their results (by placing them within the literature). First, there are explicit economic literatures in terms of positively-valenced conflicts (e.g., neuroeconomics within the primate literature, sequential foraging and delay-discounting tasks within the rodent literature), but also there is a long history within the rodent conditioning world, such as the classic work by Len Green and Peter Shizgal. I would strongly urge the authors to explore the motivational conflict literature by people like Gavin McNally, Greg Quirk, and Mark Andermann. Again, putting their results into this literature will increase the impact of their experiment and modeling.

      We have substantially revised the manuscript to contextualize our findings within the extensive literature on defensive behavior and decision-making. The revised Introduction and Discussion now integrate key theoretical frameworks, such as the predatory imminence continuum, and cite relevant work on active avoidance and other "flee the predator" paradigms (e.g., the Robogator task).

      We have also incorporated perspectives from neuroeconomics and motivational conflict, including literature on sequential foraging, delay-discounting tasks, and relevant rodent studies. Furthermore, we now discuss the potential contributions of specific brain regions, including the superior colliculus and the amygdala, to the economic and social modulation of innate defensive decisions in response to visual threats.

      Recommendations for the authors:

      Reviewing Editor Comments:

      These additional recommendations are generally consistent and overlapping across reviewers, particularly Reviewers #1 and #2, so it is advisable to undertake these changes/additions.

      Reviewer #1 (Recommendations for the authors):

      (1) Experimental methods and trial structure need clarification: It is often unclear how many trials were included per condition, per mouse, and whether the key behavioral effects (especially reward-related changes) were observed early in the session or after repeated stimulus exposure. For example, in several reward-related plots (e.g., Figure 3), it is not specified whether results are driven by early or later trials. Since the authors themselves report rapid learning of the looming stimulus (habituation), it is critical to state how many trials were included in each comparison, and to analyze whether effects hold on the first exposure and not the rest. Otherwise, conclusions about value-based behavior are hard to separate from learning effects, which may also differ between individuals. Specifically, the methods section is vague and hard to follow.

      We have substantially expanded the Methods section with additional details to improve clarity.

      To account for individual variability in habituation to the looming stimulus, we segmented trials for each animal into early and late phases. We demonstrate that threat level is the dominant factor driving behavioral responses in the early phase, while both threat level and reward condition shape behavior in the late phase. We have substantially revised Figures 2 and 3 to reflect these changes.

      (2) Add a summary of experimental design: A table or schematic summarizing the trial structure, experimental groups, reward/threat conditions, and the timeline of exposures would greatly improve clarity.

      We have added a schematic to Figure 2 summarizing the trial structure, experimental groups, reward and threat conditions, and the overall timeline.

      (3) Replot key results using only the first trial per mouse: This would allow readers to assess the first (not learned) responses and help control for habituation/suppression.

      We have replotted behavioral results using only the first trial from each mouse and included these analyses in Figure S5. These results confirm that threat level is the dominant factor driving the initial response to looming stimuli.

      (4) The model needs stronger justification and predictive value: As it stands, the model primarily fits the existing data and does not offer new insights beyond what is already evident from the behavioral results.

      Important findings, such as social hierarchy effects and habituation dynamics, are not captured in the model, reducing its relevance to the full dataset.

      The drift-diffusion framework is widely used, and in this implementation appears to have been adjusted post hoc to fit the observed data rather than generating new conceptual advances. No comparison with simpler models is included. Without testing simpler or alternative models, it is not clear whether the added complexity is necessary or justified.

      Use the model to generate and test predictions: to increase the model's contribution, the authors could simulate new conditions. Suggested experiments include:

      a) Predicting escape probability and latency at intermediate threat intensities to test whether behavior shifts gradually or abruptly.

      b) Using the model's habituation parameters to predict changes in escape behavior over repeated exposures.

      c) Adjusting vigilance or threat gain parameters to simulate dominant versus subordinate animals, and comparing model predictions to actual behavioral differences based on social rank.

      We have substantially revised the modeling section to address these concerns. The updated model is now fitted to behavioral data from the late phase of the reward–threat experiments and used to generate predictions for the early phase and for rank-dependent behavioral differences.

      The model accurately captures behavioral patterns across these conditions, demonstrating predictive power beyond descriptive fitting. Accordingly, we have removed the habituation component. Furthermore, we have introduced a simplified deterministic model in the revised manuscript to further understand the decision-making process.

      (5) Clarify housing and arena access conditions: It is unclear from the text whether all mice are in the nest during looming presentations and whether only one mouse is in the arena during the stimulus. This is important for understanding the social context of each trial and should be explained in the main text and methods.

      We have clarified this point in the Methods section. Under normal door operation, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.

      (6) Alternative interpretation of subordinate behavior: differences in area coverage and time in the reward zone may not reflect reduced vigilance, but rather avoidance of dominant mice. Subordinates may remain in the open arena to avoid conflict. The authors do not provide evidence distinguishing between these interpretations, and this should be addressed.

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure (Figure 4C). Before looming exposure, subordinate mice spent significantly more time in the arena, consistent with the idea that they may perceive a social threat from the dominant mouse in the absence of any external threat. However, this difference disappeared during and after looming exposure. This shift suggests that the presence of an external threat alters the social dynamic, reducing the influence of dominance on nest access.

      To further assess whether dominant mice blocked subordinate access to the nest during threat-driven escapes, we analyzed the fraction of escape trials in which mice returned to the nest (Figure 4D). We found no significant difference between dominant and subordinate mice, indicating that dominant mice did not restrict nest access during these trials. Importantly, rank differences in reward-zone occupancy cannot be explained by nest exclusion, as mice do not need to return to the nest when escaping the threat; they can flee directly to the safe zone. Thus, nest access limitations do not account for the observed rank-dependent patterns.

      We agree with the reviewer that reward-zone occupancy should not be interpreted as reduced vigilance in subordinate mice; instead, it likely reflects higher perceived reward value. The manuscript has been revised accordingly.

      (7) Address why robust looming responses were observed in group-housed mice: previous studies often require single housing to elicit strong defensive responses. The authors should explain why their setup yields robust results in group-housed animals and whether housing conditions may interact with dominance or habituation.

      Looming exposure elicits robust defensive behaviors in both group- and single-housed mice (Yilmaz and Meister, 2013; Lenzi et al., 2022), with single-housed animals habituating more quickly to the stimulus (Lenzi et al., 2022). We have now discussed how housing conditions may interact with social rank and habituation to shape defensive behaviors in the revised manuscript.

      For the social-rank experiments, we intentionally co-housed dominant and subordinate mice to maintain a stable hierarchy. This choice was motivated by two considerations. First, our goal was to investigate how social rank modulates defensive responses under ethologically relevant conditions, where mice naturally live in groups. Single housing would remove this social context. Second, singly housing mice can destabilize or eliminate rank relationships, making it difficult to interpret rank-dependent behavioral differences.

      (8) Add analysis of individual variability: trial-by-trial variability or stable behavioral tendencies in individual animals are not explored. This could explain part of the variation currently attributed to social rank.

      We have analyzed individual variability in both dominant and subordinate mice. We observed substantial variability across all behavioral measurements for each group (Figure S7). To attribute the observed behavioral differences to social hierarchy rather than to other individual traits, we conducted paired comparisons between dominant and subordinate mice (Figure 4).
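
      The statistical test behind the paired comparisons is not named in this response. As a hedged illustration of why pairing helps isolate rank effects, the sketch below runs an exact two-sided sign-flip permutation test on within-pair differences. The function name `paired_sign_flip_p` and the example values are hypothetical.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p(dominant: Sequence[float],
                       subordinate: Sequence[float]) -> float:
    """Exact two-sided p-value for paired data under the null hypothesis
    that rank labels are exchangeable within each pair.

    Enumerates all 2**n sign assignments of the within-pair differences,
    so it is only practical for small numbers of pairs.
    """
    diffs = [d - s for d, s in zip(dominant, subordinate)]
    observed = abs(sum(diffs))
    extreme, total = 0, 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(sign * diff for sign, diff in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total


# Made-up example: five pairs in which the dominant mouse always scores higher.
p_value = paired_sign_flip_p([5, 6, 7, 8, 9], [4, 4, 4, 4, 4])
```

      Because each difference is taken within a pair, stable between-pair factors cancel out, leaving the rank contrast as the quantity under test.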

      (9) Improve figure labeling and readability: some plots are ambiguous in terms of whether rows represent trials or animals. Overlapping points obscure the data in several figures (for example, in Figure 3H, is sucrose n=4?); consider using jittered scatter plots, boxplots, or individual traces to improve clarity. Also, the Y-axis label in the same figure is missing an 'e'.

      We have revised figures to improve clarity and corrected the typos.

      (10) Avoid overinterpretation of causal explanations: Statements such as "reward increases vigilance due to evolutionary pressure" or that "subordinates are less vigilant" go beyond what the current data can demonstrate and should be rephrased more cautiously.

      We have revised the manuscript to tone down the statement.

      Reviewer #2 (Recommendations for the authors):

      (1) Provide much more extensive methodological details on analyses and model fitting

      We have thoroughly revised the Methods section to provide extensive detail on both behavioral analyses and computational modeling, as outlined in our responses to points (3) and (4) of the Public Review.

      (2) Perform experiments or analyses that directly measure vigilance, if vigilance is to remain as a key explanation for the data.

      As detailed in our response to point (1) of the Public Review, we have supplemented the escape latency measure with two direct behavioral analyses of vigilance: foraging interval and foraging speed. This multi-metric approach robustly supports the interpretation of heightened vigilance.

      (3) Provide extra evidence for an effect of reward value, as opposed to the presence or absence of reward. Control for differences arising from the water deprivation state by performing the no reward condition experiments in water-deprived mice.

      All behavioral data in the reward–threat experiment were collected on normal (non-deprived) mice (Figures 2 and 3); this has been clarified in the revised manuscript. We have reanalyzed the data by segmenting trials into early and late phases for each animal. In the late phase, under low-threat conditions, the effect of reward value is reflected in significant differences between water and sucrose in terms of escape distance and time spent in the reward zone (Figures 3I and 3J). Under high-threat conditions, the reward value effect is reflected in significant differences in latency to flee and peak escape speed (Figures 3K and 3N).

      (4) Using drift rate to describe the "r" variable is confusing because the drift rate of the drift diffusion process is also determined by terms alpha, beta, and h-terms.

      We now refer to “r” as the reward value in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I would tone down some of the extreme statements about the problems of previous experiments (such as that most decision-making is on 2AFC). Lots of people do decision-making in serial foraging, fleeing, and other behavioral tasks. The classic Morris water maze and Barnes maze are decision-making tasks that aren't 2AFC. Serial foraging tasks, such as the Restaurant Row task, aren't 2AFC. And, actually, lots of mouse behavior tasks are deciding when to stop on a treadmill for a reward. And, for that matter, your task isn't all that "realistic" - mice aren't evolved to flee looming disks, they are evolved to flee hawks and owls. This doesn't invalidate your task at all. I just recommend making it about your work in a positive way rather than others in a negative way.

      We have revised the manuscript to adopt a more positive framing of our work.

      (2) I also don't think there's much use in bringing in crayfish in a mouse task. Spend your time connecting to the other rodent data (mice and rats) instead.

      We agree and have revised the manuscript accordingly, focusing our discussion on relevant rodent literature to provide a more appropriate context for our findings.

      Minor concerns:

      (1) The authors use the term "cognitive control" without making clear what they mean. In general, the authors seem to have a view on decision-making as either being "reflexes" or "cognitive control". This is a very outdated perspective. Modern perspectives include multiple decision-making systems competing, separating these based on their computational properties, such as planning, procedural, instinctual, and, yes, reflexive. Current views on the kinds of behaviors they are discussing generally see fleeing as a transition from reflexive (tonic immobility, freezing) and instinctual responses (freezing, fleeing) to deliberative (anxiety) and procedural (habit). The authors might take a look at the recent Calvin and Redish (2025) paper for some ideas on this.

      We appreciate the reviewer’s insight regarding the term “cognitive control.” In our study, we used this term to emphasize that defensive responses to looming threats are not purely reflexive. Mice exhibit four distinct types of defensive decisions within a short time window, and these decisions are systematically modulated by reward value and social rank. Notably, reward modulation is bidirectional: high reward suppresses defensive responses under low-threat conditions but enhances them under high-threat conditions, indicating that animals integrate multiple sources of information rather than relying solely on instinctive mechanisms.

      We did not observe mid-trajectory aborts in mice, as reported in rats by Calvin & Redish (2025). This difference may reflect species-specific behavior or the nature of the threat: our looming stimulus is purely visual and non-harmful, whereas the robotic predator in their study presents a physical threat. We have revised the Discussion to clarify our use of “cognitive control” and to incorporate these perspectives.

      (2) Only male mice were used. This limits the conclusions that can be drawn.

      We acknowledge the limitation of using only male mice and have discussed this limitation in the revised manuscript.

      (3) Did the authors observe darting behavior? (Gruene...Shansky 2015).

      We did not observe darting behavior, characterized by rapid movement, as reported during inescapable fear conditioning. In our experiment, the mice consistently escaped towards the nest and, in most trials, ran directly to it without stopping. Occasionally, under low-contrast conditions, mice paused once or twice but never moved towards the reward.

      (4) How was only one mouse allowed into the linear arena at a time?

      When all mice were in the nest, the nest-tunnel door was open while the tunnel-arena door remained closed. When a single mouse entered the tunnel, as detected by the RFID and OpenMV camera system, the nest-tunnel door closed and the tunnel-arena door opened, allowing only that mouse to enter the arena. We have clarified this protocol in the Methods section.

      (5) I would like to see more extensive analyses of the animal's responses as a function of distance to the threat (as per Fanselow and Lester 1988).

      As detailed in our response to the public review, we conducted new experiments analyzing behavior as a function of prey–threat distance. The finding that defensive responsiveness decreases with increasing prey–threat distance is now presented in Figures S2C–G and discussed in the context of the predatory imminence continuum.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to assess the variability in the expression of surface protein multigene families between amastigote and trypomastigote Trypanosoma cruzi, as well as between individuals within each population. The analysis presented shows higher expression of multigene family transcripts in trypomastigotes compared to amastigotes and that there is variation in which copies are expressed between individual parasites. Notably, they find no clear subpopulations expressing previously characterised trans-sialidase groups. The mapping accuracy to these multicopy genes requires demonstration to confirm this, and the analysis could be extended further to probe the features of the top expressed genes and the other multigene families also identified as variable.

      Strengths:

      The authors successfully process methanol-fixed parasites with the 10x Genomics platform. This approach is valuable for other studies where using live parasites for these methods is logistically challenging.

      Weaknesses:

      The authors describe a single experiment, which lacks controls or complementation with other approaches and the investigation is limited to the trans-sialidase transcripts.

      It would be more convincing to show either bioinformatically or by carrying out a controlled experiment, that the sequencing generated has been mapped accurately to different members of multigene families to distinguish their expression. If mapping to the multigene families is inaccurate, this will impact the transcript counts and downstream analysis.

      We thank the reviewer for raising these important points.

      We agree that the analysis of multigene families at the single-cell level is an important question, particularly given the heterogeneity observed across several of them. However, the aim of this short report is not to provide a comprehensive analysis of the entire experiment, but rather to focus on what we consider an important biological phenomenon observed in TcTS genes.

      Regarding the mapping accuracy of the reads, we acknowledge that this can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating that sufficient sequence divergence exists for mapping algorithms to use during read assignment.

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.
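
      The idea of probabilistic read assignment can be illustrated with a minimal sketch. It is a simplified expectation-maximization loop written for illustration only, not kallisto's implementation: it ignores transcript effective lengths and the k-mer based pseudoalignment step, and the transcript names `TcS_a` and `TcS_b` are hypothetical.

```python
from collections import defaultdict


def em_assign(compat_sets, transcripts, n_iter=100):
    """Estimate relative abundances from reads that may match several transcripts.

    compat_sets: one tuple per read, listing the transcripts it is compatible with.
    Simplification (assumption): effective transcript length is ignored.
    """
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(compat_sets)
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for cs in compat_sets:
            total = sum(theta[t] for t in cs)
            for t in cs:
                counts[t] += theta[t] / total
        # M-step: renormalise the expected counts into abundances.
        theta = {t: counts[t] / n_reads for t in transcripts}
    return theta


# Hypothetical toy data: 30 reads unique to TcS_a, 10 unique to TcS_b,
# and 60 reads compatible with both.
reads = [("TcS_a",)] * 30 + [("TcS_b",)] * 10 + [("TcS_a", "TcS_b")] * 60
theta = em_assign(reads, ["TcS_a", "TcS_b"])
print(theta)
```

      In this toy case the ambiguous reads are shared according to the ratio of uniquely mapping reads, so even a modest amount of divergent sequence can anchor the assignment of the remaining reads.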

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in CellRanger, the standard pipeline for quantifying expression levels from 10X Genomics single cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that pseudoalignment is currently the optimal strategy for recovering the relative expression of highly similar gene family members (Author response image 1C).

      Author response image 1

      (A) Distribution of pairwise sequence identity values calculated among the 3′-end regions of all transcripts (defined as the 3′UTR plus 20% of the coding sequence). (B) Distribution of read mapping coordinates over all multigene family transcripts, normalized as a percentage of the gene length. (C) Scatter plots showing the correlation between estimated transcript counts obtained using kallisto (red) and STAR + featureCounts (grey) versus the corresponding simulated ground-truth values.
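
      The benchmark in panel C scores each quantification strategy by the Pearson correlation between its estimated counts and the simulated ground truth. A minimal sketch of that scoring step is given below; the count vectors are invented toy values, not data from this study, and `estimate_a` and `estimate_b` are hypothetical stand-ins for two quantification methods.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy values (assumption): simulated true counts for five transcripts and
# the counts recovered by two hypothetical quantification methods.
truth = [10, 50, 200, 5, 80]
estimate_a = [11, 48, 205, 4, 79]    # close to the truth
estimate_b = [40, 40, 120, 60, 85]   # counts blurred across similar transcripts

r_a = pearson(truth, estimate_a)
r_b = pearson(truth, estimate_b)
print(f"method A: r = {r_a:.3f}; method B: r = {r_b:.3f}")
```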

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a valuable single-cell RNA-seq study on Trypanosoma cruzi, an important human parasite. It investigates the expression heterogeneity of surface proteins, particularly those from the trans-sialidase-like (TcS) superfamily, within amastigote and trypomastigote populations. The findings suggest a previously underappreciated level of diversity in TcS expression, which could have implications for understanding parasite-host interactions and immune evasion strategies. The use of single-cell approaches to delve into population heterogeneity is strong. However, the study does have some limitations that need to be addressed.

      The focus on single-cell transcriptional heterogeneity in surface proteins, especially the TcS family, in T. cruzi is novel. Given the important role of these proteins in parasite biology and host interaction, the findings have potential significance.

      Strengths:

      The key finding of heterogeneous TcS expression in trypomastigotes is well-supported. The analysis comparing multigene families, single-copy genes, and ribosomal proteins highlights the unusual nature of the variation in surface protein-coding genes.

      Weaknesses:

      While the manuscript identifies TcS heterogeneity, the functional implications of the different expression profiles remain speculative. The authors state it may reflect differences in infectivity, but no direct experimental evidence supports this.

      The manuscript lacks any functional validation of the single-cell findings. For instance, do the trypomastigote subpopulations identified based on TcS expression exhibit differences in infectivity, host cell tropism, or immune evasion? Such experiments would greatly strengthen the study.

      We thank the reviewer for their careful reading of the manuscript. We agree that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this study is presented as a short communication centered on a specific and biologically relevant observation within a single multigene family. The aim of the manuscript is to highlight what we consider an important biological phenomenon that raises hypotheses to be tested in future work.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures has been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems, ranging from bacteria to tumors (Seco-Hidalgo 2015, doi: 10.1098/rsob.150190 and Luzak 2021, doi: 10.1146/annurev-micro-040821-012953, for a comprehensive review on this topic). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcTS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex. This is particularly challenging in T. cruzi, where the study of multigene families is limited by the restricted set of available molecular biology tools (such as RNAi). Therefore, further experimental validation of these observations falls outside the scope of this short report.

      In this revised version, we have included additional validation and clarification of the results, as well as a more explicit discussion of their limitations. In addition, we present a preliminary analysis exploring potential mechanisms that could coordinate the observed expression patterns of the TcTS family.

      The authors identify a subpopulation of TcS genes that are highly expressed in many cells. However, it is unclear if these correspond to previously characterized TcS members with specific functions.

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript.

      The authors hypothesize that observed heterogeneity may relate to chromatin regulation. However, the study does not directly address these mechanisms. There are interesting connections to be made with what they identify as the colocalization of genes within chromatin folding domains, but the authors do not fully explore this. It would be insightful to address these mechanisms in future work.

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in the revised manuscript.

      The merging of technical replicates needs further justification and explanation as they were not processed through separate experimental conditions. While barcodes were retained, it would be informative to know how well each technical replicate corresponds with the other. If both datasets were sequenced on the same lane, the inclusion of technical replicates adds noise to the analysis.

      Regarding technical details, we now include the total number of mapped reads and the average number of reads mapped per cell (new paragraph in the Methods section).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).

      Author response image 2.

      Correlation analysis of number of reads assigned to cells between technical replicate 1 and technical replicate 2.

      While the number of cells sequenced (3192) seems reasonable, it's not clear how much the conclusions are affected by the depth of sequencing. A more detailed description of the sequencing depth and its impact on gene detection would be valuable.

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.
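
      The two percentages above follow directly from the reported counts, as the short check below shows. The variable names are illustrative. The final quantity is only the per-cell undetected fraction, which combines technical dropout with genes that are genuinely not expressed in a given cell.

```python
# Counts as reported in the response.
genes_annotated = 15319          # annotated protein-coding genes
mean_genes_per_cell = 1088       # mean genes detected per cell
genes_detected_any_cell = 14321  # genes detected in at least one cell

per_cell_fraction = mean_genes_per_cell / genes_annotated
dataset_fraction = genes_detected_any_cell / genes_annotated
per_cell_undetected = 1 - per_cell_fraction

print(f"detected per cell:      {per_cell_fraction:.1%}")    # 7.1%
print(f"detected across cells:  {dataset_fraction:.1%}")     # 93.5%
print(f"undetected per cell:    {per_cell_undetected:.1%}")  # 92.9%
```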

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods.

      While most of the methods are clear, the way in which the subsampled gene lists were generated could be more thoroughly described, as some details are not clear for the subsampling of single-copy genes.

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section.

      Some of the figures are difficult to interpret. For example, the color scaling in the heatmap of Supplementary Figure 3B is not self-explanatory and it is hard to extract meaningful conclusions from the graph.

      We agree with the reviewer in this assessment. We have now modified the figures to be more self-explanatory and better reflect the conclusions.

      Reviewer #3 (Public review):

      The study aimed to address a fundamental question in T. cruzi and Chagas disease biology - how much variation is there in gene expression between individual parasites? This is particularly important with respect to the surface protein-encoding genes, which are mainly from massive repetitive gene families with 100s to 1000s of variant sequences in the genome. There is very little direct evidence for how the expression of these genes is controlled. The authors conducted a single-cell RNAseq experiment of in vitro cultured parasites with a mixture of amastigotes and trypomastigotes. Most of the analysis focused on the heterogeneity of gene expression patterns amongst trypomastigotes. They show that heterogeneity was very high for all gene classes, but surface-protein encoding genes were the most variable. In the case of the trans-sialidase gene family, many sequence variants were only detected in a small minority of parasites. The biology of the parasite (e.g. extensive post-transcriptional regulation) and potential technical caveats (e.g. high dropout rates across the genome) make it difficult to infer what this might mean for actual protein expression on the parasite surface.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes and particularly in T. cruzi, where there is a predominant role of post-transcriptional regulation, mRNA levels are not always directly correlated with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      It's also interesting to note that recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the discussion acknowledging this limitation and discussed the results from Cruz-Saavedra, et al. in the revised manuscript.

      (1) Limit of detection and gene dropouts

      An average of ~1100 genes are detected per parasite which indicates a dropout rate of over 90%. It appears that RNA for the "average" single copy 'core' gene is only detected in around 3% of the parasites sampled (Figure 2c: ~100 / 3192). This may be comparable with some other trypanosome scRNAseq studies, but this still seems to be a major caveat to the interpretation that high cell-to-cell variability in gene expression is explained by biological rather than technical factors. The argument would be more convincing if the dropout rates and expression heterogeneity were minimal for well-known highly expressed genes e.g. tubulin, GAPDH, and ribosomal RNAs. Admittedly, in their Final Remarks, the authors are very cautious in their interpretation, but it would be good to see a more thorough discussion of technical factors that might explain the low detection rates and how these could be tested or overcome in future work.

      (2) Heterogeneity across the board

      The authors focus on the relative heterogeneity in RNA abundance for surface proteins from the multicopy gene families vs core genes. While multicopy gene sequences do show more cell-to-cell variability, the differences (Figure 2D) are roughly average Gini values of 0.99 vs 0.97 (single copy) or 0.95 (ribosomal). Other studies that have applied similar approaches in other systems describe Gini values of < 0.2-0.25 for evenly expressed "housekeeping" genes (PMIDs 29428416, 31784565). Values observed here of >0.9 indicate that the distribution for all gene classes is extremely skewed and so the biological relevance of the comparison is uncertain.
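
      For readers unfamiliar with the metric, the sketch below computes a Gini coefficient for one gene's counts across cells. It uses the standard sorted-rank formula and assumes, for illustration, that the index is calculated per gene across cells; the count vectors are toy values. It shows why sparse detection alone pushes the value towards 1.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data (assumption): one gene across 100 cells.
even = [5] * 100             # same count in every cell
sparse = [0] * 99 + [500]    # detected in a single cell only

print(gini(even))    # 0.0
print(gini(sparse))  # 0.99
```

      A gene detected in only 1 of 100 cells scores 0.99 whatever its true expression pattern, which is why a high dropout rate compresses all gene classes towards the top of the scale.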

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are not an exception in this regard, and the general metrics of the single-cell RNA-seq data in other reports are equivalent to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively working on generating single-cell RNA-seq data using alternative methodologies that improve gene dropout rates. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts at the bulk RNA-seq level (Figure 4 - Figure Supplement 1), consistent with these TcS genes being frequent in the population even though they are not especially highly expressed within each cell. Taken together, these findings argue against a purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Accordingly, since the initial submission, we have been careful to frame our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns. In the revised version, we have strengthened this point by including a specific statement in the final remarks. Our interpretation is presented as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the discussion, and explicitly included this limitation in the final remarks of the revised manuscript.

      Nevertheless, this study does provide some tantalising evidence that the expression of surface genes may vary substantially between individual parasites in a single clonal population. The study is also amongst the very first to apply scRNAseq to T. cruzi, so the broader data set will be an important resource for researchers in the field.

      We thank the reviewer for highlighting the relevance of our study and for their positive assessment of the potential significance of these observations. We also agree that the dataset generated here may represent a useful resource for the community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figures 1c and 1d, it would be useful to include the genes as the plot titles.

      We agree with the reviewer that including gene names in the plot makes the panels more self-explanatory. We have added gene names to the updated version of Figure 1.

      (2) Can you include the read lengths of the sequencing and whether this is sufficient to map accurately to very similar genes of the same multigene family? As stated in the public summary, this would make the data far more convincing as standard 10x chromium cannot distinguish similar gene copies unless a longer read 2 is used. Given that only the 3' end is targeted, is this enough to distinguish the TcS and other multigene family transcripts?

      We thank the reviewer for raising this important point. We agree that short 3′ biased reads can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating that sufficient sequence divergence exists for mapping algorithms to use during read assignment.

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in CellRanger, the standard pipeline for quantifying expression levels from 10X Genomics single cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that pseudoalignment is currently the optimal strategy for recovering the relative expression of highly similar gene family members (Author response image 1C).

      The length of the R2 read (91bp) was included in Methods (line 411).

      (3) It is stated that "single copy" genes also include "low copy number genes". What does this include exactly? Is it more accurate to say non-surface protein genes?

      The distinction we aim to make is between multigene families and the rest of the genome. Most multigene families encode surface proteins, but not all surface protein genes belong to multigene families. To clarify this point we included a sentence in methods to reflect that when we describe “surface proteins” we are referring to surface proteins coded by multigene families (line 453). In addition, long-read genomic DNA sequencing and assembly have revealed that many genes previously believed to be single-copy are actually duplicated at low copy numbers (doi.org/10.1099/mgen.0.000177). For this reason, we extend the concept of “single-copy” genes to include those that have only a few duplicates.

      (4) It is stated in line 127 that TcS have particularly high heterogeneity - it does not look that way by eye compared to the other multigene families. Can statistics be used to prove this, or simply state the decision was made to focus on the TcS?

      As noted by the reviewer, all multigene families show significantly higher heterogeneity than single-copy genes, as stated in the text and shown in the legends of Figure 2 and Supplementary Figure 1 and in the new Supplementary Table 2.

      That said, it was not the statistical results that guided our decision to focus on TcS, but rather their well-established biological relevance in T. cruzi. As suggested, we have now emphasized this rationale more clearly in the revised text (lines 160-167).

      Besides, recent work has shown that TcS genes exhibit a bimodal distribution of expression levels using bulk RNA-seq data, in contrast to core genes and other multigene families (doi.org/10.1038/s41467-025-64900-2, doi.org/10.1038/s41564-023-01483-y). This distinct regulatory behavior further justifies our decision to examine TcS separately.

      (5) Expression of different TcS has been investigated between the different life cycle stages for a few individual genes previously (Freitas et al). Can the authors not extend this investigation to all the genes detect by scRNA-seq here to demonstrate those with higher/lower expression in amastigotes vs trypomastigotes building on Figure 2A? Are particular groups linked to either stage?

      We performed this analysis and did not observe any correlation between TcS groups and life cycle stage. In all cases TcS were more frequently detected in trypomastigotes. This difference was statistically significant for all groups except group VII, likely due to the low number of genes analyzed in this group (Author response image 3).

      Author response image 3.

      Per-gene number of expressing cells by TcS group and life-stage. Boxplots show, for each TcS group (I–VIII), the distribution across genes of the number of cells in which the gene is detected. Each point represents a single TcS; Amastigote cells: green points/boxes, Trypomastigote cells: salmon points/boxes. The y-axis is on log10 scale. Asterisks indicate statistically significant differences from the comparison between Amastigote and Trypomastigote within each TcS group, assessed using a paired two-sided Wilcoxon signed-rank test: * p < 0.05, ** p < 0.01, *** p < 0.001.

      (6) What exactly is the Z-score shown in Figure 2B?

      In this analysis, num_multigene represents the number of multigene family genes detected in each individual cell. For every cell, we counted how many genes from our predefined multigene family gene list have detectable expression (more than zero UMI counts); in the UMAP plot, this value is reflected by the size of each point. On the other hand, z_multigene captures the relative expression level of multigene family genes within each cell. This metric is calculated by summing the UMI counts of all multigene family genes per cell and then standardizing this value across the dataset using a z-score transformation, such that positive values reflect above-average multigene family expression and negative values reflect below-average levels. In the UMAP plot, this metric determines the color scale of each point. Taken together, num_multigene and z_multigene allow us to distinguish cells that express multigene family genes broadly (high gene counts), strongly (high relative expression), both, or neither, and to relate these patterns to identified cell populations.

      We included a short description in the legend of the new version of Figure 2 (lines 176-180).
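
      The two per-cell metrics defined in this response can be sketched as follows. This is an illustrative reconstruction from the verbal description only: the function name, the dictionary-based count layout, the toy gene names, and the use of the population standard deviation are assumptions, not details taken from the analysis code.

```python
import statistics

def multigene_metrics(counts_per_cell, multigene_genes):
    """Compute num_multigene and z_multigene for each cell.

    counts_per_cell: list of dicts mapping gene name -> UMI count (one dict per cell).
    multigene_genes: predefined list of multigene family gene names.
    """
    num_multigene = []  # number of family genes with more than zero UMI counts
    totals = []         # summed UMI counts of all family genes in the cell
    for cell in counts_per_cell:
        family_counts = [cell.get(gene, 0) for gene in multigene_genes]
        num_multigene.append(sum(1 for c in family_counts if c > 0))
        totals.append(sum(family_counts))

    # z-score of the per-cell totals across the dataset.
    # Assumption: population standard deviation; a sample SD would differ slightly.
    mean_total = statistics.mean(totals)
    sd_total = statistics.pstdev(totals)
    if sd_total == 0:
        z_multigene = [0.0 for _ in totals]
    else:
        z_multigene = [(t - mean_total) / sd_total for t in totals]
    return num_multigene, z_multigene

# Hypothetical toy dataset: three cells, two family genes, one core gene.
cells = [
    {"TcS_a": 2, "TcS_b": 0, "core_1": 5},
    {"TcS_a": 0, "TcS_b": 0, "core_1": 7},
    {"TcS_a": 3, "TcS_b": 1, "core_1": 1},
]
num_mg, z_mg = multigene_metrics(cells, ["TcS_a", "TcS_b"])
print(num_mg, [round(z, 2) for z in z_mg])
```

      In a plot, num_mg would set the point size and z_mg the color scale, so a cell can score high on one metric and low on the other.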

      (7) For the reclustering of trypomastigotes based on TcS genes alone, please show the UMAP and discuss why the resolution giving two clusters is chosen? I assume increasing the resolution does not reveal clusters of cells express one of the 8 groups of TcS for example?

      We appreciate the reviewer’s suggestion. In this analysis, our goal was to test whether the phenotypic heterogeneity previously reported in trypomastigotes could be recapitulated using TcS genes alone, as prior studies described two major transcriptomic phenotypes within this stage.

      Increasing the clustering resolution did not reveal subclusters corresponding to the eight TcS sequence groups. This might reflect the fact that these groups are defined based on sequence similarity rather than on expression patterns, as noted by Freitas et al. (doi:10.1371/journal.pone.0025914).

      (8) In Figure 4B, there may be an upward trend in the level of expression and the number of cells a transcript is detected in? It would be worth showing this is or is not the case with statistics if possible.

      The number of genes detected in a high proportion of cells is low, which limits the statistical power of this analysis. Also, substantial dispersion is observed within the 0-5% interval. Nevertheless, this figure is presented primarily to highlight that a considerable number of highly expressed genes are detected in only a small fraction of cells. If expression level were the main determinant of detection frequency across cells, one would expect very few highly expressed genes to fall within the 0-5% interval. Contrary to this expectation, among the 50 most highly expressed TcS genes, 62% are detected in fewer than 5% of cells, and even among the top 10 most highly expressed TcS genes, 40% fall within this lowest detection group. To facilitate this interpretation, we modified the figure (new Figure 4b) to explicitly highlight the top 50 most expressed TcS genes and incorporated this discussion into the main text of the revised manuscript (lines 244-251), making the conclusion clearer to the reader.
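
      The summary statistic used in this argument (the share of the top-N most expressed genes that are detected in fewer than 5% of cells) can be computed as in the sketch below. The function name, the tuple layout, and the toy values are hypothetical and chosen only to illustrate the calculation.

```python
def fraction_rarely_detected(genes, top_n, max_fraction=0.05):
    """Share of the top_n most expressed genes detected in < max_fraction of cells.

    genes: list of (name, expression_level, fraction_of_cells_detected) tuples.
    """
    ranked = sorted(genes, key=lambda g: g[1], reverse=True)[:top_n]
    if not ranked:
        raise ValueError("no genes to rank")
    rare = sum(1 for _, _, detected in ranked if detected < max_fraction)
    return rare / len(ranked)

# Hypothetical toy table: (gene, expression level, fraction of cells detected).
toy_genes = [
    ("g1", 100.0, 0.02),
    ("g2", 90.0, 0.40),
    ("g3", 80.0, 0.01),
    ("g4", 70.0, 0.03),
    ("g5", 5.0, 0.60),
]
top4 = fraction_rarely_detected(toy_genes, top_n=4)
top2 = fraction_rarely_detected(toy_genes, top_n=2)
print(f"top 4: {top4:.0%} rarely detected; top 2: {top2:.0%}")
```

      If expression level alone determined detection frequency, this fraction would be expected to approach zero as top_n shrinks.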

      (9) Do the cells group instead by expression of any of the other multigene families not investigated in detail?

      It is possible that additional transcriptional substructure among trypomastigotes is driven by the expression of other multigene families beyond TcS. In this short report (with limited number of figures, words, etc.), we focused specifically on the trans-sialidase family as discussed earlier. A more comprehensive analysis including other large surface gene families (MASPs, mucins, GP63) is planned as part of ongoing work and will be presented in future reports.

      Reviewer #2 (Recommendations for the authors):

      This reviewer suggests the conduction of functional experiments in follow-up studies to establish links between TcS expression profiles and parasite behavior and into potential regulatory mechanisms responsible for the observed TcS heterogeneity, particularly focusing on epigenetic modifications. It would be interesting to correlate the highly expressed TcS members identified here with previously characterized TcS isoforms and provide more description regarding which particular groups and TcS members are driving the findings. It would benefit from further clarification regarding sequencing depth, technical replication merging, subsampling, and specific parameters for alignment methods and more information regarding the specific statistical tests and their applicability to the data.

      This is a promising single-cell study with potentially high significance. The manuscript is well-written, and the analyses are reasonably well-executed. However, the current manuscript is limited by a lack of functional validation and mechanistic insights. The addition of further analyses and experiments, as suggested, will strengthen the conclusions and increase the impact of the work.

      We thank the reviewer for their careful reading of the manuscript. As suggested, we have performed additional validation and clarification of the results, as well as a more explicit discussion of their limitations. In addition, we have included a preliminary analysis exploring potential mechanisms that could be coordinating the observed expression patterns of the TcS family (see below). Although we consider it relevant and interesting to validate these results experimentally, given the inherent difficulties in studying multigene families in T. cruzi, an organism with a very limited set of molecular biology tools (such as RNAi), further experimental validation of these observations is outside the scope of this short report.

      Regarding the reviewer’s question, we examined whether any TcS subgroup could be driving our observations. However, we did not find any correlations indicating that a particular group was associated with any of our findings. We now include TcS group information in Supplementary Table 3.

      Regarding technical details, we have now included the total number of mapped reads (line 422) and the average number of reads mapped per cell (new paragraph in the Methods section, lines 432-436).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table, as stated in line 424. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section (line 456).

      The specific kallisto parameters used are stated in Methods (lines 418-419). We have now added that default options were used unless otherwise specified (lines 419-420).

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in lines 301-313 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors consistently refer to gene "expression" but somewhere they should acknowledge that in trypanosomes RNA abundance is less predictive of protein than in most other organisms.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes and particularly in T. cruzi, where there is a predominant role of post-transcriptional regulation, mRNA levels are not always directly correlated with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      It's also interesting to note that recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the discussion acknowledging this limitation and discussing the results from Cruz-Saavedra et al. in lines 266-271 of the revised manuscript.

      (2) Line 29, in the abstract there is a strong statement that T. cruzi "does not employ antigenic variation". I don't think there is much evidence either way if we are thinking about antigenic variation in the broad sense rather than the extreme model of T. brucei VSG switching. Later in the abstract they state that "no recurrent combinations of TcS genes were observed between individual cells in the population", which sounds very much like a form of antigenic variation.

      We agree with the reviewer. Indeed, we meant to state that T. cruzi does not employ an antigenic variation mechanism such as the one described in T. brucei. We have changed this statement as suggested (lines 28-32).

      (3) Line 29, "relies on a diverse array of cell-surface-associated proteins encoded by large multi-copy gene families (multigene families) essential for infectivity and immune evasion" and lines 55-58 "T. cruzi infection relies on a heterogeneous set of membrane proteins, encoded mainly by large multigene families ... most of which are involved in infection, tropism, and immune evasion". It would be worth adding a bit more detail on the nature and strength of the evidence that Tc "relies on" these various genes or that they are "essential" for infectivity, tropism, and immune evasion.

      Because the journal’s short format imposes word limits, we strengthened the original statement by adding specific references that document genomic, transcriptomic and functional evidence linking the major multigene families to infectivity, tropism and immune evasion (doi.org/10.1371/journal.pone.0025914; doi.org/10.1038/nrmicro1351; doi.org/10.1128/iai.05329-11; doi.org/10.1093/nar/gkp172, doi.org/10.1371/journal.ppat.1006767), in line 77.

      (4) Line 89, 1088 genes detected per cell - what is this as a % of genes in the genome?

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.
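
      The two percentages quoted in this response follow directly from the reported counts. A short arithmetic check, with variable names chosen here for illustration:

```python
# Counts as reported in the response above.
annotated_genes = 15319          # annotated protein-coding genes in the reference
mean_genes_per_cell = 1088       # mean number of genes detected per cell
genes_detected_any_cell = 14321  # genes detected in at least one cell

per_cell_pct = 100 * mean_genes_per_cell / annotated_genes
dataset_pct = 100 * genes_detected_any_cell / annotated_genes

print(f"per cell: {per_cell_pct:.1f}% of annotated genes")
print(f"whole dataset: {dataset_pct:.1f}% of annotated genes")
```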

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods (line 435).

      (5) Line 93-94, how many cells were assigned to clusters 0 and 1?

      Cluster 0 had 2201 cells and cluster 1 had 824 cells assigned. We have now included these specific numbers in the new version of the manuscript (line 114).

      (6) Line 96, cluster 2 ama-trypo transitioning parasites - were these observable by microscopy?

      We did not perform microscopy specifically to observe or quantify the putative ama/trypo transitioning subpopulation: microscopy was only used as a pre-experiment quality check to verify cell morphology and viability. The inference that cluster 2 reflects ama/trypo transitioning parasites is drawn from the transcriptomic profile (particularly from the pattern of stage-associated marker expression observed in that cluster) and should be considered a hypothesis generated by the data, that merits further analysis, as stated in the manuscript.

      (7) Line 106-107, "As expected, single-copy gene expression is high in both amastigotes and trypomastigotes and similar on average between both cell types".

      (8) Why as expected? For a broad journal it would be useful to explain this. Amastigotes are replicative and trypomastigotes are not, so would we not expect to see some differences that reflect this?

      (9) What do you mean by the expression being "high"? High compared to what?

      (10) "Similar on average between both cell types". This does not seem concordant with Figure 1a showing a highly significant difference between ama and trypo.

      We thank the reviewer for this helpful request for clarification for broader readers and the observations regarding global expression of single copy and multigene family genes.

      Figure 2a is intended as an experimental control where we show that our 10X Genomics data shows the previously reported upregulation of surface protein genes in trypomastigotes. We have now modified the text in order to highlight this (line 129). In turn, Supplementary Figure 1a is shown as a control that this upregulation is not a general feature of trypomastigote cells.

      Regarding comment 9, what we meant is that single-copy genes display relatively high expression in both amastigotes and trypomastigotes compared with surface protein-coding genes (see expression values in Figures 2a and Supplementary Figure 1a).

      Finally, differential expression between amastigotes and trypomastigotes at the transcriptomic level has been previously studied and has shown that most single copy genes do not show variation, explaining the overall pattern of Supplementary Figure 1a where average expression is similar between stages (mean fold change = 1.1). This is likely due to the fact that these genes are related to basic cellular functions. Genes related to stage specific functions such as replication in amastigotes or normalization effects may be causing the slight, but statistically significant increase observed in overall expression in amastigotes. This contrasts with the pattern observed for multigene families where there is a clear overexpression in trypomastigotes (mean fold change = 1.5).

      As the observations raised in comments 9 and 10 have been described in previous studies and are neither novel nor key points of our results, we decided not to focus on them and modified the text accordingly in lines 129-135.

      (11) Line 110, "with high variation". What does "high variation" mean here? Compared to what? For the two metrics (n cells +ve for each gene and total expression level) can they give an average and the SD? It would be useful to know how many parasites the "average" surface (and core) gene is expressed in, or more precisely for which the RNA is above the limit of detection.

      We refer to the comparison with the expression profile observed for single-copy genes. This point has now been clarified in the text, and we have included the mean and standard deviation for both TcS multigene family genes and single-copy genes in trypomastigotes for both metrics in the Figure 2 legend. The average and distribution of the number of cells in which each gene is detected are shown in Figure 2c and Supplementary Figure 1a. We also added a reference to this panel at the point in the text where the phenomenon is first described.

      (12) Line 134, Figure 2b legend needs more detail - what are num_multigene and z_multigene?

      Please see our response to Reviewer 1, Question 6. We have now added a clarification to the legends of Figure 1 and Supplementary Figure 1.

      (13) Figure 2c, correct the y-axis legend because it implies your values are log10 transformed. Also, it would be useful to have more markers on the y axis so the reader can better estimate the data ranges.

      We thank the reviewer for this observation. We have now corrected the y-axis label and markers.

      (14) If the y-axis of Figure 2D started at 0 instead of 0.8 and if Lorenz curves were provided then the reader would probably get a fuller sense of the expression heterogeneity in the dataset. The legend states the differences are statistically significant but the actual p-values are not shown.

      (15) Line 142-3, more precision is needed on the p-values.

      We thank the reviewer for this helpful suggestion. We agree that Lorenz curves provide a clearer representation of expression heterogeneity than the previous plot. Accordingly, we have replaced the original panel (Figure 2d) with Lorenz curves for the groups under comparison, and have made the same change in Supplementary Figure 1d. In addition, we have included Gini index values and p-values for all comparisons in Supplementary Table 2.
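
      For readers unfamiliar with these measures, the sketch below shows how a Lorenz curve and the corresponding Gini index can be derived from a vector of non-negative expression values. It is a generic illustration, not the code behind the revised figure: the function names are hypothetical, and which values were summarized (per gene, per cell, or per group) is not specified here, so the inputs are toy numbers.

```python
def lorenz_curve(values):
    """Return (cumulative share of items, cumulative share of total), both starting at 0."""
    ordered = sorted(values)
    if not ordered or ordered[0] < 0:
        raise ValueError("values must be a non-empty list of non-negative numbers")
    total = sum(ordered)
    if total == 0:
        raise ValueError("total must be positive")
    n = len(ordered)
    item_share, value_share, running = [0.0], [0.0], 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        item_share.append(i / n)
        value_share.append(running / total)
    return item_share, value_share

def gini_index(values):
    """Gini index as 1 minus twice the area under the Lorenz curve (trapezoid rule)."""
    x, y = lorenz_curve(values)
    area = sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))
    return 1 - 2 * area

# Toy examples: perfectly even expression versus expression concentrated in one item.
even = gini_index([5, 5, 5, 5])
concentrated = gini_index([0, 0, 0, 10])
print(f"even: {even:.2f}; concentrated: {concentrated:.2f}")
```

      A curve that hugs the diagonal (Gini index near 0) indicates evenly distributed expression, while a curve that bows toward the lower right corner (Gini index near 1) indicates that a few items account for most of the signal.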

      (16) Figure 3, as in Figure 1a it would be useful to add another UMAP plot to show the two trypo subpopulations.

      We thank the reviewer for this suggestion. We have now updated Figure 3 to include a UMAP plot showing the two trypomastigote subpopulations.

      (17) What is the observed proportion of broad vs slender trypomastigote morphologies for Dm28c? To be consistent with the speculation at line 162 then wouldn't it need to be approximately 50-50?

      The proportions of each trypomastigote subpopulation in the DM28c strain are currently unknown. The only available relevant data come from Brener, 1965 (doi.org/10.1080/00034983.1965.11686277), in which this strain was not included. In the strains analyzed in that study, the relative proportions of broad and slender trypomastigote morphologies were highly variable: across seven strains, broad forms ranged from 18.0% to 77.3%, while slender forms ranged from 2.3% to 71.6%. Given this wide variability and the lack of DM28c-specific data, we cannot assume any expected proportion for this strain.

      (18) Line 170, please state how many genes are in the TcS subgroup mentioned here. This is an interesting finding - does this include mostly catalytically active trans-sialidase genes or is it a mixture from across all the subfamilies?

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript (lines 227 - 228).

      (19) Line 175-176, "Gene dropouts might favor random patterns of gene family's detection in scRNA-seq experiments, particularly affecting genes with low expression" - I'm not sure if the authors mean the detection of a gene (or not) in an individual parasite is truly random (pure luck) or whether the term stochastic would be more appropriate because they seem to be referring to randomness around a certain threshold of RNA abundance/stability? They go on to rule this out, at least for TcS genes, essentially arguing that they have something resembling an ON or OFF pattern rather than a spectrum of expression levels. This is potentially very important and could advance the field in a major way, but the fact that so many core and ribosomal genes, which 'should' be always ON, cannot be detected in most cells is a concern. A version of Figure 4B for core and ribosomal genes could be informative - do they show a different pattern to TcS?

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts at the bulk RNA-seq level (Supplementary Figure 2), consistent with the fact that these TcS are frequent in the population even when they are not especially highly expressed within each cell. Taken together, these findings argue against a purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Accordingly, since the initial submission, we have been careful to frame our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns. In the revised version, we have strengthened this point by including a specific statement in the final remarks. Our interpretation is presented as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the discussion, and explicitly included this limitation in the final remarks of the revised manuscript.

      (20) Line 238-9, Add details of removing extracellular epimastigotes after cell infections.

      Only cellular trypomastigotes collected from the supernatant on day 6 were used for the secondary infection, at a 10:1 parasite-to-cell ratio. After 24 hours, the cultures were washed twice with PBS to remove any remaining extracellular parasites. Under these conditions, i.e. using exclusively trypomastigotes, at this infection ratio, and maintaining the cultures in mammalian medium, we do not expect the presence or survival of extracellular epimastigotes. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript, line 382.

      (21) Line 260, was methanol used to directly resuspend the parasite pellet, or was it resuspended first e.g. in a small volume of PBS?

      As described in lines 250-257 of the original manuscript, parasites were washed and resuspended in DPBS before methanol fixation. Methanol fixation was then carried out according to the 10X Genomics Methanol Fixation Protocol. We have now emphasized this more clearly in the revised text in line 400.

      (22) What was the doublet rate?

      We identified and removed 41 doublets, all belonging to cluster 2, and retained 3,151 singlets for downstream analysis (total cells before removal = 3,192). The resulting doublet rate was 1.28%. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript (lines 439-440).
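
      The reported doublet rate can be reproduced from the cell counts given in this response. A brief check, with illustrative variable names:

```python
# Counts as reported in the response above.
total_cells = 3192   # cells before doublet removal
doublets = 41        # cells flagged and removed as doublets

singlets = total_cells - doublets
doublet_rate_pct = 100 * doublets / total_cells

print(f"{singlets} singlets retained; doublet rate = {doublet_rate_pct:.2f}%")
```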

      (23) What was the frequency of rRNA and kDNA-derived reads?

      Approximately 4.02% of the reads were derived from kDNA sequences, while 1.10% corresponded to rRNA-derived reads (Author response image 4).

      Author response image 4.

      Percentage of mitochondrial (kDNA)-derived and ribosomal RNA (rRNA)-derived reads.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the Reviewers for their comments on our manuscript “Structural insights into mitotic-centrosome assembly”. As described below, we have substantially revised the manuscript in response to their comments and hope that you will consider the revised manuscript “Phosphorylation relieves autoinhibition to drive Cnn centrosome scaffold assembly” at The EMBO Journal. Our specific responses (black text) to the Reviewers’ comments (blue text) are detailed below.

      Reviewer #1

      Main Points:

      1) From previous studies, it seems to me that for the residues potentially relevant for the hairpin regulation there is direct evidence of phosphorylation only for S567 (mass spec, phospho-antibody). Have the authors tested single site mutants (S567A and E)? Also, have they tested D mutations? If so, this should be commented on and shown. If not, it should be tested, in particular since the 2E phospho-mimetic is not functioning properly in vivo. If S571 is indeed crucial, it should be demonstrated that it is also phosphorylated. Otherwise it is possible that the mutation of this residue simply impairs important interactions (e.g. PReM-CM2, others), independent of phosphorylation.

      As requested, we have now tested individual S567A and S571A mutations and found that they both perturb Cnn scaffold assembly, but to a lesser extent than the 2A double mutant (New Fig.S3A). We also now confirm by MS that recombinant Polo can phosphorylate both S567 and S571 in vitro, and we have examined the behaviour of a 2D mutant and find that it behaves very similarly to the 2E mutant (New Fig.S3B).

      2) It is unclear why in vitro only A mutations have been tested and not phospho-mimetics. This should be tested for the interaction between PReM and CM2. This would allow to probe the model that phosphorylation opens the hairpin to allow interaction. Currently, such proof is missing in the study. Alternatively, the authors could phosphorylate the recombinant protein in vitro. The in vivo data is harder to interpret due to the complexity of the model and the authors should take advantage of the in vitro system.

      As requested, we now show in New Fig.S5 that whereas in vitro WT Cnn490-608 and Cnn-2A490-608 behave as dimers, Cnn-2E490-608 elutes in two major fractions—a tetramer species and a much larger species that elutes in the void volume (meaning that 2E can form very large species even in the absence of CM2) (Figure S5A). In the presence of CM2, Cnn-2E490-608 forms a tetramer (that eluted slightly later than the Cnn-2E490-608 tetramer) and larger complexes that contained CM2 and eluted in the void volume with a profile similar to Cnn-2E490-608 on its own (Figure S5B). These results are consistent with the possibility that the 2E substitutions open the helical hairpin to allow self-interactions that drive homo-tetramer and larger complex assembly in vitro.

      3) Regarding the worm PReM and CM2 domains, the authors mention that they have tested in vitro phosphorylation by PLK-1, but I could not find any data showing this. They should demonstrate successful phosphorylation or test candidate site by phospho-mimetic mutation. It is possible that the worm proteins depend more strongly on phosphorylation to relieve autoinhibition compared to the fly proteins.

      This is a good point, and we apologise for this omission. We now state that we confirmed by MS analysis that the recombinant worm PLK-1 we used in these in vitro experiments phosphorylates the putative SPD-5 PReM domain on the three sites (S627, S653 and S658) known to be important for promoting SPD-5 scaffold assembly in vivo (Figure Legend, Figure 6). Thus, the lack of detectable binding between these proteins is not due to the lack of phosphorylation.

      Minor Point:

      4) Fig. 6C, D: the labeling of the chimeric constructs using "+" symbols is confusing, since it suggests that separate proteins were expressed. If I understand this correctly, with the current labeling, deltaCM2+DmCM2 means WT? The authors should write the full name of the wildtype or chimeric construct in each case and use a more standard/less confusing nomenclature. Also, I suggest to start the panels and graphs with the WT sample.

      We thank the Reviewer for this suggestion and have re-labelled this Figure to clarify this point. We understand the point about putting the WT panels first in Figure 6C,D (now Figure 5C,D) but think that this is not the correct comparison to emphasise. We are testing the ability of the various CM2 domains to “rescue” the lack of a CM2 domain, so we feel Drosophila Cnn lacking CM2 is the correct baseline for this comparison.

      Reviewer #2

      Main Comments:

      1. The title is too vague. Any number of existing papers could be said to provide "structural insights into mitotic centrosome assembly". The authors need to narrow down to a defined conclusion and state this as the title.
      2. I think the strongest and most novel aspects of this study relate to the mechanism of Cnn assembly via relief of the auto-inhibited PReM. The efforts to elucidate assembly mechanisms of SPD-5 and CDK5RAP2 are comparatively light and there are no accompanying experiments in worms or human cells. Without the in vivo experiments, it's hard to know if the in vitro experiments are valid. It's speculative for the authors to say they found the true PReM for CDK5RAP2; they do not demonstrate that PLK-1 phosphorylation potentiates assembly in Figure 8. Thus, I suggest re-writing the paper to focus on Cnn. Experiments in Figure 6 are still valid if reframed. For example, substituting Cnn's CM2 with the CM2 from CDK5RAP2 vs. the C-term of SPD-5 illustrates that a simple coiled-coil with open ends (H.s.CM2) is sufficient to interact with PReM whereas a coiled-coil with a closed end (SPD-5 C-term, predicted by Figure 6A) cannot.

      We thank the Reviewer for these helpful comments and have re-written and re-organised the manuscript in accord with these suggestions, most importantly providing a more specific title and re-ordering the data to better focus the paper on the relief of Cnn autoinhibition.

      The purpose of Figure 1 is unclear. None of the other figures examine SPD-5 and CNN in the condensate form, which required using 4% PEG in this paper. The other assays look at the network form, which could behave differently and have different dependence on specific domains. I think they should perform the condensate assay for all other figures, otherwise leave it out. Furthermore, CDK5RAP2 is mentioned, yet not examined in Figure 1. It must be noted that CDK5RAP2 will also condense into droplets under crowding conditions or with a synthetic nucleator (Rios et al., 2025 J Cell Sci). Thus, it seems that condensation potential is a universal feature of known PCM scaffold proteins.

      The original Figure 1 has been moved to the end of the paper (now Figure 8) and we now more thoroughly explain the logic of these experiments. Briefly, given that the PReM and CM2 domains in flies and worms seem to function in different ways in vivo, we sought here to test whether this was also the case in vitro, where the behaviour of full-length SPD-5 and of these domains of Cnn has been extensively studied, but never directly compared. We believe such a direct comparison will be of some interest to the field (the Woodruff et al., 2017 paper describing these in vitro SPD-5 condensates has been cited >700 times). We now also cite the Rios et al., 2025 paper but note that, despite extensive efforts, we were unable to purify enough well-behaved CDK5RAP2 for our experiments and so could not include it in this analysis. We think Rios et al. used an MBP-fusion of CDK5RAP2 in their experiments, which may explain this difference.

      The study uses different species without doing the same types of experiments on each. Sometimes human CDK5RAP2 is thrown in, sometimes not. They solve crystal structures of PReM from Cnn but not from the other proteins. This gets confusing, especially since the authors state that they seek to test if fly Cnn and worm SPD-5 assemble through different mechanisms (see last sentence of the intro). Also, if the focus is on worm vs. fly PCM assembly mechanisms, why include the human protein, especially Figure 8?

      On re-reading our original manuscript we appreciate this confusion. We hope that in re-writing the manuscript along the lines suggested by the Reviewer the logical flow of our experiments will be clearer.

      The conclusion that SPD-5's narrow PReM and "CM2" domains don't interact is consistent with the cross-linking mass spectrometry data from Rios et al. 2024. They showed only one X-link with low occurrence (1 out of 6 samples) between these two regions, even in the phosphorylated state (Fig. 1G). However, Nakajo et al (2022) claimed the opposite, showing that a larger PReM-containing construct (a.a. 272-732) interacts with a C-terminal construct (a.a. 1061-1198) after PLK-1 phosphorylation. Can the authors comment on this? Perhaps there is another site in SPD-5, outside of a.a. 541-677, that acts like the Cnn PReM?

      These are good points and we now mention this last possibility in the Discussion. We also now mention the supporting cross-linking Mass Spec data from Rios et al., 2024.

      I have serious doubts that the C-terminus of SPD-5 has a CM2 domain. To me, there is no real sequence homology with the traditional CM2's from humans and flies, and the AF3 predictions support this. Ohta et al. (2021) called this region "CM2-like" based on very poor homology, which is a questionable practice. Any coiled-coil region will appear somewhat homologous due to the heptad repeat pattern that defines them (e.g., leucines line up quite nicely). Thus, is it fair to say that SPD-5 doesn't assemble through a PReM-CM2 interaction? There may be a different region in SPD-5 that looks more like the canonical CM2. I think the authors have compelling evidence to give the C-terminal coiled-coil region in SPD-5 its own name rather than calling it CM2.

      This is a fair point, although the literature is already quite confusing on the nomenclature for the C-terminal region of SPD-5 (e.g., Ohta et al., JCB, 2021; Nakajo et al., JCS, 2022), so we are reluctant to add another name to the mix. Given that we draw comparisons with the fly and human CM2 domains (that are clearly related by sequence), we think it is easiest for readers if we use the “CM2” nomenclature throughout, although making clear our conclusion that SPD-5 “CM2” does not appear to function in the same way as fly/human CM2.

      Figure 3E. Would measuring scaffold mass be more appropriate? The PReM(deltaH1,NTH2) leads to more compact scaffolds, but maybe they assemble just as well as the deltaH1 mutant. As it stands, there is a discrepancy between panel E and F in terms of what is measured (area vs. intensity) and the outcome.

      In several previous papers we use fluorescence intensity to measure the “amount” of protein at centrosomes in vivo but, in our original paper (Feng et al., Cell, 2017), we quantified PReM::CM2 scaffold assembly in vitro by measuring the area of scaffold assembly. Thus, we prefer to present the current data in this way for consistency across publications, and we believe either measure is valid. We could measure the area and intensity of the PReM∆H1 and PReM∆H1∆NTH2 scaffolds to compare scaffold density, but we think this would unnecessarily complicate this data. The main point is not how much or how dense each scaffold is, but rather that the PReM∆H1∆NTH2 protein doesn’t really make a scaffold at all—but rather makes smaller “blobs” that tend to bunch together (further characterised in Fig.S2).

      Minor Comments:

      1. In one version of the PDF there are images missing in Fig 1F, 4C, 4D. I opened another version (source version) and the images were there. Just FYI.
      2. Figure 4A. The blue coloration makes it difficult to read the black letters.
      3. Figure 4A. Why is part of the protein colored in green? This coloration isn't defined, nor does it show up again in panel B.
      4. The layout of Figure 4 is confusing. It took me a few minutes to realize that the big red box inset belonged to panel B and not panel A.
      5. Figure 4C,D. The sample size is not mentioned in the legend.
      6. The title for Figure 4 seems too speculative. How can the authors say that phosphorylation relieves the autoinhibition without structural data?
      7. Figure 5B. The sample size is not mentioned in the legend.
      8. Figure 6B,D. The sample size is not mentioned in the legend.
      9. The text in Figure 7B is hard to read because it is too small. Please make this bigger.
      10. Figure 8C. What is colored in magenta? Is there an additional labeled protein besides mNG-CM2?
      11. Figure 8C. What is the sample size? How many images were taken? Also, why are there data points off to the right of the last column?
      12. The wording of these sections needs improving. I found them complicated and difficult to understand.

      We thank the Reviewer for taking the time to make these helpful comments. We have addressed all these points in the revised manuscript. On point 10, the magenta objects were fiducial beads that were inadvertently included on this panel (and are no longer shown).

      Reviewer #3

      Major Comments: 1. The title, "Structural Insights into Mitotic-Centrosome Assembly," is overly broad. The study primarily focuses on CM2-PReM intramolecular interactions in D. melanogaster Cnn and does not comprehensively address mitotic centrosome assembly across species. A more specific title reflecting the fly-centric and structural focus would better align with the manuscript's scope and conclusions.

      As described at the start of our response to Reviewer #2, the title and focus of the manuscript have been extensively revised along these lines.

      The authors analyze condensate formation by Cnn and SPD-5 but overlook condensate formation by CDK5RAP2, which was recently reported by Rios et al. (2025, PMID: 40454523). Including CDK5RAP2 would enable a more balanced and informative comparison across fly, worm, and human homologs.

      As described in point 3 of our response to Reviewer #2, we now cite Rios et al., 2025 but note that, despite extensive efforts, we were unable to purify enough well-behaved CDK5RAP2 for our experiments and so could not include it in this analysis. We believe Rios et al. used a full-length MBP-fusion of CDK5RAP2 in their experiments, which may explain this difference, as MBP is very good at keeping proteins soluble (but would not be appropriate in our experiments where we compare full-length untagged proteins).

      In Figure 3, reconstitution of Cnn scaffolds using purified CM2 and PReM fragments yields "macromolecular scaffolds," but their physical properties are not defined. It remains unclear whether these assemblies are ordered or amorphous, and whether they exhibit solid- or gel-like behavior. Moreover, the heterogeneous, scattering particles observed by negative-stain EM (Figure S3B), likely corresponding to the Cnn490-608-CM2 complex, raise the possibility of nonspecific aggregation rather than organized scaffold formation. Appropriate controls lacking CM2 are needed to exclude spontaneous aggregation of PReM fragments. In addition, testing shorter truncations of the PReM H2 helix could help define the minimal requirements for scaffold assembly. Finally, the rationale for including the CnnΔExPReM construct only in vivo (Figure 3F), but not in the in vitro assays (Figure 3A-E), should be clarified.

      We apologise, as our presentation of this data has clearly led to some confusion on these points.

      First, as we now clarify, the amorphous solid-like physical properties of the PReM::CM2 scaffolds were described in our previous paper where we also showed that these scaffolds are not simply non-specific aggregates—as several single point mutations that disrupt the LZ::CM2 tetramer also prevent PReM::CM2 scaffold assembly in vitro as well as Cnn scaffold assembly in vivo (see Fig.5, Feng et al., Cell, 2017). Also, in all in vitro scaffolding experiments we always perform a negative control (-CM2) to confirm that none of the scaffolds are aggregates of the PReM domain being tested. We don’t usually show this control now as there would be lots of empty black boxes on the Figures. We do, however, show this control for the human putative PReM domain (Figure 7C), as we are testing this here for the first time.

      Second, the request to test shorter truncations of the PReM H2 helix to define the minimal requirements for scaffold assembly is unnecessary as PReM∆H1∆NTH2 already cuts H2 at the start of the LZ, and we previously showed the LZ is required for PReM::CM2 scaffold assembly in vitro (Feng et al., Cell, 2017). Thus, any further truncation of H2 will start to remove the LZ, which we already know is essential. We have now made this point more clearly.

      Finally, the Cnn∆ExPReM construct the Reviewer mentions was tested in both the in vitro (now Figure 2B) and in vivo (now Figure 2F) assays, but the labelling was confusing so this was not clear. We have now clarified this point.

      The coarse-grained (CG) simulation methodology is insufficiently described. Given that CG approaches sacrifice atomic detail and may oversimplify interactions, readers require more information to evaluate the model's reliability and limitations. A comparison with the framework used by Ramirez et al. (2024, PMID: 38356260) would be informative. It is also unclear why available crystal structures of WT and 2A Cnn (Figure 2C; Figure S4) were not used as simulation inputs, or why the structure of Cnn490-579 2E was not determined to complete the structural comparison. Furthermore, mutation of Ser567 and Ser571 to alanine markedly stabilizes the PReM domain (Figure 5C, D), implying that these residues maintain domain flexibility. Back-mapping CG models to atomic resolution could reveal the interactions altered by these mutations. The exclusive focus on double mutants (2A and 2E) is also limiting; analysis of single-point mutants at S567 or S571 would clarify whether both residues contribute equally or play distinct roles.

      We performed coarse-grained simulations because, although they simplify atomic interactions, they capture overall conformational dynamics, which is what we are trying to assess here (Fig.4C,D). We now clarify this point and provide more detail of our simulation methodology in the main text and Materials and Methods. We used the full helical hairpin (i.e., H2+H3+H4) prediction in these simulations, rather than the crystal structure of the partial helical hairpin (i.e., H2+most of H3), as we reasoned that the presence of the full H3 and H4 might influence breathing, and the full helical hairpin (see Video S1) seems likely to be the relevant biological fold. As we now show (new Figure S5), and as discussed above, the 2E mutants do not behave well in vitro so we were unable to solve their structure. We agree that we could perform atomic resolution simulations to better understand how the 2A/E and single A/E mutations might suppress/enhance breathing, but we believe such an analysis is beyond the scope of the current manuscript and would distract from our main conclusions.

      The discussion lacks sufficient integration with prior studies and often presents conclusions without adequate citation. For example, the claim that flies and humans rely on related PReM-CM2 interactions whereas worms use distinct phosphorylation-regulated mechanisms is not supported by appropriate references. In addition, limited cross-referencing to the manuscript's own data weakens the connection between results and conclusions. Expanding and better grounding the discussion in existing literature would significantly enhance its depth and clarity.

      We thank the Reviewer for this general point and have tried to better integrate our results with prior studies, particularly in the Discussion section.

      Minor Comments: 1. In Figure 1B, the molecular weight units for the protein marker are missing and should be included. Fixed.

      In Figures 1E and 1F, readability would be improved by including x-axis labels on all graphs, rather than only on the bottom panels. Fixed.

      The protein structures shown in Figures 2C and 2D should be explicitly labeled as dimers to avoid confusion. Fixed.

      In Figures 3A-D, using fluorescently labeled CM2 would help validate both the interaction with the PReM domain and its localization within the scaffold.

      We have previously tried fluorescently tagging the CM2 domain, but scaffold formation is much less robust. We do not think this invalidates this assay, as the evidence supporting the PReM::CM2 interaction is very strong, including assessing the physiological influence of multiple point mutations in both domains in residues at the heart of the interaction interface identified by crystallography (e.g., see Fig.4, Feng et al., Cell, 2017).

      In Figure 3E, no statistical comparisons are presented between the original PReM construct and other samples. In addition, information regarding sample size and the number of experimental replicates is missing from the figure legend. Fixed.

      In Figure 3F, the absence of a pixel intensity scale bar makes the data difficult to interpret, as color values corresponding to high and low signal intensities are unclear. Moreover, no additional centrosome marker is included, nor is there evidence that PReM fragment expression levels are comparable across samples. These concerns also apply to Figures 4C and 4D.

      We now include pixel intensity scales in all relevant Figures. We think we do not need to show additional centrosome markers in our images as centrosomes exhibit a very reproducible behaviour in these embryos so we can be very confident that the objects we show here are genuine centrosomes. Considering expression levels, the images in Fig.4C,D (now 3C,D) are derived from stable transgenic lines so we can measure protein expression levels and show that the 2A and 2E mutants are expressed at similar levels to WT (new Figure S6). The images in 2F are from mRNA injections, so cannot be quantified in this way. However, we have vast experience with this assay (used in >15 publications since 2014) and can tell when, very occasionally, an injected mRNA is not expressed well (as this leads to a lack of general fluorescence in the cytoplasm). In addition, we know that deletions in Cnn do not generally destabilise the protein as we have analysed many such transgenic lines (see, for example, Reviewer Figure 1). Thus, the differences in centrosomal levels observed and quantified in 2F are almost certainly not caused by differences in the stability of the proteins being generated from the injected mRNAs.

      In Figure 4A, the interacting residues of PReM and CM2 shown in the red inset would be clearer if residue annotations for each domain were displayed in distinct colors. Additionally, the legends for Figures 4C and 4D do not specify the scale bar length. Fixed.

      The authors state that interactions between CM2 and PReM-2A462-608 could not be detected in vitro based on SEC chromatograms (Figure 5A), yet the figure does not clearly show this result. The accompanying SDS-PAGE images are too small and lack lane labels, making interpretation difficult (a similar issue applies to Figure 7B). Furthermore, the SEC chromatogram x-axis lacks volume annotations, hindering correlation between chromatographic peaks and SDS-PAGE results (in contrast to Figure 7B, which provides an appropriate example).

      We thank the reviewer for these points, all of which have now been fixed/adjusted.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Mohamed et al. set out to compare the assembly mechanisms of pericentriolar material (PCM) in flies and nematodes. They reveal that the main PCM scaffold protein in each species (Cnn in flies, SPD-5 in nematodes) is sufficient to form supramolecular droplets (with a crowding agent) or networks (without a crowding agent). However, they diverge in one key aspect: Cnn scaffold assembly relies on the interaction between a C-terminal CM2 domain and a central phospho-regulated domain (PReM), whereas SPD-5 does not. The authors solve the crystal structure of a region within Cnn's PReM. With the help of modeling, they speculate that this region is auto-inhibited through backfolding of alpha helices, thus preventing its interaction with the CM2 domain. This auto-inhibition would be relieved by phosphorylation, which modeling suggests would increase "breathing" of the backfolded structure. The authors end by presenting evidence to suggest that the human PCM scaffold protein CDK5RAP2 may assemble through a PReM-CM2 interaction.

      Major Comments:

      1. The title is too vague. Any number of existing papers could be said to provide "structural insights into mitotic centrosome assembly". The authors need to narrow down to a defined conclusion and state this as the title.
      2. I think the strongest and most novel aspects of this study relate to the mechanism of Cnn assembly via relief of the auto-inhibited PReM. The efforts to elucidate assembly mechanisms of SPD-5 and CDK5RAP2 are comparatively light and there are no accompanying experiments in worms or human cells. Without the in vivo experiments, it's hard to know if the in vitro experiments are valid. It's speculative for the authors to say they found the true PReM for CDK5RAP2; they do not demonstrate that PLK-1 phosphorylation potentiates assembly in Figure 8. Thus, I suggest re-writing the paper to focus on Cnn. Experiments in Figure 6 are still valid if reframed. For example, substituting Cnn's CM2 with the CM2 from CDK5RAP2 vs. the C-term of SPD-5 illustrates that a simple coiled-coil with open ends (H.s.CM2) is sufficient to interact with PReM whereas a coiled-coil with a closed end (SPD-5 C-term, predicted by Figure 6A) cannot.
      3. The purpose of Figure 1 is unclear. None of the other figures examine SPD-5 and CNN in the condensate form, which required using 4% PEG in this paper. The other assays look at the network form, which could behave differently and have different dependence on specific domains. I think they should perform the condensate assay for all other figures, otherwise leave it out. Furthermore, CDK5RAP2 is mentioned, yet not examined in Figure 1. It must be noted that CDK5RAP2 will also condense into droplets under crowding conditions or with a synthetic nucleator (Rios et al., 2025 J Cell Sci). Thus, it seems that condensation potential is a universal feature of known PCM scaffold proteins.
      4. The study uses different species without doing the same types of experiments on each. Sometimes human CDK5RAP2 is thrown in, sometimes not. They solve crystal structures of PReM from Cnn but not from the other proteins. This gets confusing, especially since the authors state that they seek to test if fly Cnn and worm SPD-5 assemble through different mechanisms (see last sentence of the intro). Also, if the focus is on worm vs. fly PCM assembly mechanisms, why include the human protein, especially Figure 8?
      5. The conclusion that SPD-5's narrow PReM and "CM2" domains don't interact is consistent with the cross-linking mass spectrometry data from Rios et al. 2024. They showed only one X-link with low occurrence (1 out of 6 samples) between these two regions, even in the phosphorylated state (Fig. 1G). However, Nakajo et al (2022) claimed the opposite, showing that a larger PReM-containing construct (a.a. 272-732) interacts with a C-terminal construct (a.a. 1061-1198) after PLK-1 phosphorylation. Can the authors comment on this? Perhaps there is another site in SPD-5, outside of a.a. 541-677, that acts like the Cnn PReM?
      6. I have serious doubts that the C-terminus of SPD-5 has a CM2 domain. To me, there is no real sequence homology with the traditional CM2's from humans and flies, and the AF3 predictions support this. Ohta et al. (2021) called this region "CM2-like" based on very poor homology, which is a questionable practice. Any coiled-coil region will appear somewhat homologous due to the heptad repeat pattern that defines them (e.g., leucines line up quite nicely). Thus, is it fair to say that SPD-5 doesn't assemble through a PReM-CM2 interaction? There may be a different region in SPD-5 that looks more like the canonical CM2. I think the authors have compelling evidence to give the C-terminal coiled-coil region in SPD-5 its own name rather than calling it CM2.
      7. Figure 3E. Would measuring scaffold mass be more appropriate? The PReM(deltaH1,NTH2) leads to more compact scaffolds, but maybe they assemble just as well as the deltaH1 mutant. As it stands, there is a discrepancy between panel E and F in terms of what is measured (area vs. intensity) and the outcome.

      Minor Comments

      1. In one version of the PDF there are images missing in Fig 1F, 4C, 4D. I opened another version (source version) and the images were there. Just FYI.
      2. Figure 4A. The blue coloration makes it difficult to read the black letters.
      3. Figure 4A. Why is part of the protein colored in green? This coloration isn't defined, nor does it show up again in panel B.
      4. The layout of Figure 4 is confusing. It took me a few minutes to realize that the big red box inset belonged to panel B and not panel A.
      5. Figure 4C,D. The sample size is not mentioned in the legend.
      6. The title for Figure 4 seems too speculative. How can the authors say that phosphorylation relieves the autoinhibition without structural data?
      7. Figure 5B. The sample size is not mentioned in the legend.
      8. Figure 6B,D. The sample size is not mentioned in the legend.
      9. The text in Figure 7B is hard to read because it is too small. Please make this bigger.
      10. Figure 8C. What is colored in magenta? Is there an additional labeled protein besides mNG-CM2?
      11. Figure 8C. What is the sample size? How many images were taken? Also, why are there data points off to the right of the last column?
      12. The wording of these sections needs improving. I found them complicated and difficult to understand.

      "Fly and worm Spd-2/SPD-2 and Polo/PLK-1 are clear homologues, but Cnn and SPD-5 share little sequence homology-although they are both predicted to be large coiled-coil-rich proteins. Thus, it remains unclear whether these two, largely unrelated, molecules form mitotic-PCM scaffolds that assemble and function in a similar manner"

      "We first focused on Drosophila Cnn as, although the full structure of the original PReM domain (Cnn403-608) is unknown, this domain contains an internal leucine-zipper (LZ) dimer (Cnn490-544) whose crystal structure, in a tetrameric complex with a CM2 dimer, had been solved (Figure 2A) (Feng et al., 2017)."

      "When the full PReM and CM2 domains are mixed in vitro, they form large micron-scale assemblies and point mutations that perturb the LZ::CM2 tetramer perturb PReM::CM2 scaffold assembly in vitro and Cnn scaffold assembly in vivo."

      Significance

      Overall Assessment:

      While I find the premise of this study to be interesting, its execution and presentation are not fully convincing. The study is a collection of experiments connected by a thread that can be difficult to follow. One concern is the lack of focus and a clearly stated conclusion, which is ultimately embodied by the vague title. For example, the research question at the beginning doesn't match with the outcome in the end. At the end of the introduction, the authors state they wish to compare assembly mechanisms of Cnn and SPD-5. However, at the end of the results, they present data on CDK5RAP2 and speculate on its assembly. Why introduce the human protein here? Another concern is the lack of symmetry in the experiments. There is much more in vitro characterization of Cnn than SPD-5 or CDK5RAP2, and all in vivo work is performed in flies. Finally, this study does not address if the best-established model for SPD-5 assembly (multimerization via specific, multivalent coiled-coil interactions) applies to fly Cnn. Thus, to me, this study is a deeper dive into the mechanism of Cnn assembly, not necessarily a fair cross-species comparison. I do not have major issues with the results, but I recommend that this paper undergo significant re-writing before being re-reviewed. There are also issues with data display and reporting of experimental details (e.g., sample sizes) that should be easily fixed.

      Advance: this study provides new insight into how two specific domains interact within PCM scaffold proteins to promote scaffold assembly. It provides some new structural insight into the mechanism of Cnn auto-inhibition. However, there is limited conceptual advance, as the bigger ideas (e.g., auto-inhibition as a regulatory control, PCM scaffold assembly through condensation of coiled-coil proteins) were already established.

      Audience: this study will be of interest to cell biologists studying centrosome assembly, mitosis, and evolution.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide extensive immunoreactivity and expression data to map monoaminergic neurotransmitter production sites in Pristionchus pacificus. This nematode is relatively distantly related to the popular model nematode Caenorhabditis elegans, for which such information is already available. They find that dopamine, tyramine, and octopamine are present in the same neurons in both species, but differences are observed for serotonin. This forms the basis for a comparison of serotonergic neurons across 22 nematode species. In addition, they evaluate monoaminergic effects on egg-laying, head movement during reversals, and nictation behavior, to find that monoaminergic control over the latter differs between C. elegans and P. pacificus. This shows that some anatomical flexibility supports similar outcomes, whereas in other cases it is the basis of evolved regulatory differences.

      Strengths:

      The comparative efforts are laudable and valuable, including a thorough revisiting of old data and corrections of what is judged as a historic misannotation. The expected continued value of this work is also appreciated: because nematodes have similar anatomies and behaviors, cellular-resolution data from different species permit the study of functional evolution of neurotransmitter usage in homologous neurons.

      Despite the strong experimental approach, there are some points that require addressing:

      (1) Not all the concepts of the introduction ('feeding behaviors', to a lesser extent also 'evolution of neurotransmitter usage in homologous neurons') are followed up upon in the results or discussion sections.

      We will address the relative treatment of particular topics in the introduction and discussion in a revised version of the article.

      (2) The choice of nematodes ('only' 13 species) may affect what is perceived as ancestral.

      See above regarding ‘13 species’ (actually 22). Most species and genera were specifically selected previously (Loer and Rivard, 2007; Rivard et al., 2010) for broad phylogenetic coverage, representing different species and genera in 4 major clades within ‘clade V’ (Kiontke et al., 2007; Sudhaus, 2011): Anarhabditis (Caenorhabditis, including both the Elegans and Drosophilae species groups), Synrhabditis (Oscheius, Metarhabditis, Reiterina and Rhabditella), Pleiorhabditis (Teratorhabditis, Mesorhabditis, Rhomborhabditis and Pelodera), and Diplogastrids represented by P. pacificus. Among the outgroups to clade V, there are 3 distinct clades represented, each with at least two species and/or genera represented. Therefore, we believe that the determination of an ancestral condition is well-founded. We plan to add this rationale to the revised version to make this clearer.

      (2, continued) Also, identifying their cells based on comparisons with Ce or Ppa identifications only is understandable but mildly risky: there are many cells in the head, and mistakes would go unnoticed until detailed analysis in each species can provide conclusive evidence.

      We agree that there is a mild risk of incorrect identification but believe that appropriate caveats are noted in the text. Furthermore, the recent head EM reconstruction and complete embryonic cell lineage of P. pacificus (Cook et al., 2025) show a nearly 1-1 homology correspondence between head neurons (e.g., only a single head neuron is missing in the Ppa head relative to Cel due to altered apoptosis), and a quite high level of conservation of neurite morphology and soma position between Cel and Ppa suggests that identifications are likely correct when examining related nematodes. In cases for which a serotonin-immunoreactive cell is found in the predicted location (and often having apparent associated neurites), its homology to the matching Cel and Ppa cell is the most parsimonious interpretation: otherwise, one cell would have to lose expression and another nearby cell gain it.

      (3) It is not reported whether the nictation-defective mutants have general locomotion defects; therefore, whether the reported problem is specific to this host-finding behavior or not.

      None of the mutants we tested for nictation behavior, including those that show severe defects in nictation (Ppa-cat-1, Ppa-tph-1, Ppa-tdc-1, Ppa-tbh-1), exhibited noticeable general locomotion defects either as dauers or non-dauers. Further clarification will be provided in a revised version of the article.

      (4) The section on RIP neurons makes sense for Ppa, but not for Ce (dauers in fact have weakened IL2-to-RIP connections) and should be revised. The nictation data also do not support the breadth of the conclusions, which should either be toned down or rephrased as hypothetical.

      We plan to address these concerns in a revised version of the article.

      (5) The discussion mostly reiterates the results, leaving little room for the authors' interpretations and opinions. I would suggest reworking in favor of conceptual discussion.

      As noted above, we agree to address the relative treatment of matters in discussion in a revised version of the article.

      Reviewer #2 (Public review):

      Summary:

      This paper makes important contributions to our understanding of how nervous systems evolve, with a particular focus on whether changes in neurotransmitter usage within homologous neurons represent a mechanism for evolutionary adaptation without large-scale changes to circuitry. Comparing the predatory nematode P. pacificus with C. elegans, this study systematically examines monoamine-producing neurons, assesses how their neurotransmitter identities differ between homologous neural types, and determines how these differences relate to behavior.

      Strengths:

      The major strength of this work is its breadth, rigor, and data quality. It combines multiple, independent lines of evidence to assign neurotransmitter identity for neurons with homology grounded in lineage, morphology, and connectomics, which is essential for meaningful cross-species comparisons. Additionally, by extending the analysis beyond P. pacificus and C. elegans to other nematodes, the authors convincingly argue that features observed in P. pacificus likely reflect an ancestral state. This depth greatly enhances the significance of the conclusions.

      This work is likely to have a significant impact on the fields of comparative neurobiology and nervous system evolution. It demonstrates a powerful system and approach for linking molecular identity, cell-type homology, circuit context, and behavior across species. The data generated here will be a valuable resource for the community and provide a strong foundation for future mechanistic studies.

      More broadly, the study reinforces the idea that evolutionary change in nervous systems can occur through modulation of chemical signaling within conserved circuits, rather than through complete rewiring. This conceptual framework is likely to influence how researchers think about neural evolution in other systems.

      Weaknesses:

      Given the availability of detailed connectivity information for both species, a more explicit comparison of the local circuit context of key neurons would further strengthen the link between molecular identity and circuit function.

      We plan to address these concerns in a revised version of the article.

      Reviewer #3 (Public review):

      Summary:

      The study by Hong, Loer, Hobert, and colleagues is a comprehensive description of monoaminergic neurons in the nematode Pristionchus pacificus. The work used multiple, complementary approaches, including immunostaining and expression of genes involved in neurotransmitter synthesis or transport, to identify neurons that express a monoamine neurotransmitter. Moreover, this study characterized the phenotypes of various mutants to study their organismal function. Extensive comparisons are made to C. elegans, the nematode model that, in a way, anchors the model studied here, and new outgroup species were examined for some features so that the polarity of their evolution could be inferred. Although there is no simple or groundbreaking punchline to distill from the manuscript (i.e., other than some things are the same as in C. elegans, and some things are different), and while the study is basically descriptive in nature, the scope of the project warrants broad attention.

      Strengths:

      This manuscript offers a tremendous resource for those who use this species as a model, which, based on the author list alone, includes many labs. This study sets the bar for what can be done in a "satellite" model system.

      Given the complementarity of approaches used, such as the position of cell bodies, the connectivity and morphology of dendrites, and a previously published atlas of the connectome for this species, the identification of specific neurons (which, as the authors point out, can be easily mistaken) is convincing throughout. Likewise, appropriate caution is observed where neuron identities are ambiguous, e.g., unlabeled cells in Figure 5, or ambiguous identities in other species, as shown in Figure 10. There was a lot of data to unpack in this manuscript, but I could not find any obvious flaws in neuron identification.

      Also, the phenotypic assays were straightforward and informative.

      Weaknesses:

      No serious weaknesses were noted. One minor comment is that in general, I think the Methods could use some additional text to describe what the goal of any given technique was. For example, although there is a description of the HCR protocol in the methods, nowhere does it say what genes this method would be used for. In addition to what is shown in Figure 4, this information should be given in the Methods.

      More detailed methods will be provided in a revised version of the article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I found the bigger picture analysis to be lacking. Let us take stock: in other work, during active cognition, including at least one study from the Authors, TDLM shows significant sequenceness. But the evidence provided here suggests that even very strong localizer patterns injected into the data cannot be detected as replay except at implausible speeds. How can both of these things be true? Assuming these analyses are cogent, do these findings not imply something more destructive about all studies that found positive results with TDLM?

      Our focus here is on advancing methodology. Given the diversity of tasks and cognitive states in the TDLM literature, replay could exceed detection thresholds under specific conditions—especially when true event durations align with short analysis windows. While a comprehensive re-analysis of prior datasets is beyond our scope, we agree a concise synthesis can strengthen the paper.

      The previous TDLM literature uses a diverse set of tasks and addresses a broad spectrum of cognitive constructs/processes. As we acknowledge, it is perfectly possible that replay bursts in short time windows are well detectable by TDLM. However, we acknowledge that some commentary on this is warranted and have added the following paragraph to the discussion that addresses “improving TDLMs sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. But sensitivity depends critically on high fidelity between the analysis window and the density of replay events. A systematic evaluation of these conditions as they apply to prior studies remains beyond the scope of the current paper. Instead, our focus is on delineating boundary conditions that we hope will motivate conduct of power analyses in future work as well as inclusion of simulations that approximate realistic experimental conditions.”

      (2) All things considered, TDLM seems like a fairly 'vanilla' and low-assumption algorithm for finding event sequences. It is hard to see intuitively what the breaking factor might be; why do the authors think ground truth patterns cannot be detected by this GLM-based framework at reasonable densities?

      We agree with the overall sentiment of the referee. Our intuition is that one of the principal shortcomings of the method relates to spurious sequenceness induced by unknown factors at baseline, and poor transfer of the decoder to other modalities. Although we have a rough understanding of how these confounders occur, we are currently not in a position to identify their nature. Note that we believe that these confounders are not exclusive to TDLM but are potentially threatening to all kinds of sequenceness analysis of longer time series that rely on decoders. Indeed, we suspect that classifier training is another bottleneck, as we don’t know the exact nature of the representations that are replayed, including the degree of overlap with a commonly used visual localizer. That said, this is not of relevance for the simulation insofar as we insert patterns that exceed the pattern strength in the localizer.

      Finally, a potential major drawback is the permutation test for significance testing. As the original authors of TDLM have noted, the current test which permutes states is overly conservative. It measures fixed effects and as it only considers the group level mean it is accordingly easily biased by individual outliers. This we have tried to account for by z-scoring sequenceness scores. We have also conferred on this with some of the authors of TDLM and discussed a yet unpublished method that aims to address this exact issue. The proposed new method uses a sign-flip permutation test at a group level and therefore implements a random-effects model of the data. This significance test has markedly increased power while still controlling for FWER. However, while we show in our power analysis that the new method is indeed more sensitive, it does not materially change the interpretation of the data. We have included this novel method in the paper and added it into the main analysis and most of the simulations.
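
      For illustration, a group-level sign-flip permutation test of the kind described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper or in TDLM: the function name `sign_flip_test`, the use of a one-sample t-statistic, and the max-statistic correction across time lags (one common way to control FWER) are all assumptions introduced here.

```python
import numpy as np

def sign_flip_test(scores, n_perm=5000, alpha=0.05, seed=0):
    """Group-level sign-flip permutation test (illustrative sketch).

    scores: array of shape (n_participants, n_lags) holding one
    sequenceness value per participant and time lag. The null hypothesis
    is that scores are symmetric around zero, so each participant's
    sign can be flipped at random.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_sub = scores.shape[0]

    def t_stat(x):
        # One-sample t-statistic per lag; uses between-participant variance.
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

    observed = t_stat(scores)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Flip the sign of each participant's whole lag profile at once.
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        # Keep only the largest |t| over lags (assumed max-statistic correction).
        max_null[i] = np.abs(t_stat(scores * signs)).max()

    threshold = np.quantile(max_null, 1.0 - alpha)
    return observed, threshold, np.abs(observed) > threshold
```

      Because the statistic is normalized by the between-participant standard deviation, a single outlier inflates the variance rather than the mean alone, which is the property that distinguishes this random-effects test from a threshold based only on the group mean.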

      (3) Can the authors sketch any directions for alternative methods? It seems we need an algorithm that outperforms TDLM, but not many clues or speculations are given as to what that might look like. Relatedly, no technical or "internal" critique is provided. What is it about TDLM that causes it to be so weak?

      We believe there are several shortcomings and bottlenecks within TDLM that need to be evaluated and improved. While we highlight these issues in the discussion section titled “Improving TDLMs sensitivity,” we agree that we should provide a clearer outline of its current shortcomings. We have now added to the discussion to expand on what we think needs improvement (‘fixed time lag’) and also added a summary statement at the end of the relevant paragraph to recap the main issues that an improved successor method would need to address. The new paragraphs read:

      “Lastly, there are certain assumptions that TDLM makes that might not hold (see Methods Study II): Current implementations look for a fixed time lag that is the same across all participants and between all reactivation events. If time lags differ across participants, TDLM will fail to find them. Similarly, TDLM assumes a fixed sequence order and is not robust against slight within-sequence permutations or missing in-sequence reactivation events. However, from other data sources, such as hippocampal place cell recordings, it is known that such permutations can occur where some states are skipped or fail to decode during replay. Similarly, it is assumed that each reactivation event lasts between 10-30 milliseconds, but the true temporal evolution of reactivation measured by TDLM is currently unknown. Future method development might focus on improving invariance to these assumptions.

      […]

      In summary, there are several areas where TDLM might be improved, including a restriction in its search space, improvement in classifiers, a validation of localizer representation transfer to other domains (e.g. memory representations), and the extension of TDLM to render it more robust against violations of its core assumptions.”
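
      To make the fixed-lag assumption concrete, the following is a deliberately simplified sketch of a lagged-regression sequenceness measure. It is not the TDLM reference implementation: the function name `lagged_sequenceness`, the single-stage averaging over hypothesized transitions, and the omission of a backward term and of any control regressors are assumptions made for brevity.

```python
import numpy as np

def lagged_sequenceness(probs, transitions, max_lag):
    """Simplified forward sequenceness per time lag (illustrative only).

    probs: (n_time, n_states) decoded reactivation time series.
    transitions: (n_states, n_states) matrix with 1 where state i is
    hypothesized to precede state j, and 0 elsewhere.
    Returns an array of length max_lag + 1 (index 0 is left at zero).
    """
    probs = np.asarray(probs, dtype=float)
    transitions = np.asarray(transitions, dtype=float)
    n_time, n_states = probs.shape
    forward = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        # Predict every state at time t + lag from all states at time t.
        X = np.column_stack([probs[:-lag], np.ones(n_time - lag)])
        Y = probs[lag:]
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        beta = beta[:n_states]  # drop intercept; beta[i, j] is state i -> state j
        # Average the coefficients that match the hypothesized transitions.
        forward[lag] = (beta * transitions).sum() / transitions.sum()
    return forward
```

      In this sketch each lag is evaluated separately and the same lag is applied to every transition, so a sequence whose state-to-state interval varies within or across participants spreads its evidence over several lags instead of producing one clear peak.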

      Reviewer #2 (Public review):

      Weaknesses:

      The sample size is small (n=21, after exclusions), even for TDLM studies (which typically have somewhere between 25-40 participants). The authors address this somewhat through a power analysis of the relationship between replay and behavioural performance in their simulations, but this is very dependent on the assumptions of the simulation. Further, according to their own power analysis, the replay-behaviour correlations are seriously underpowered (~10% power according to Figure 7C), and so if this is to be taken at face value, their own null findings on this point (Figure 3C) could therefore just reflect under sampling as opposed to methodological failure. I think this point needs to be made more clearly earlier in the manuscript.

      We agree with the referee that our sample is smaller than previous studies due to participant exclusion criteria. However, the take-away message from our behavioural simulation and bootstrapping is that even with larger sample sizes, it is difficult to overcome baseline fluctuations of sequenceness, even if very strong replay patterns were detectable and sample sizes were of similar size to that of previous studies. Therefore, we are not convinced that our null findings are fully explained by the smaller sample size compared to that of previous studies. Additionally, we show that even within the range of other studies, similar power would have been expected (Supplement Figure 11). However, it is true that in general null findings can be explained by under-sampling, under the assumption that an effect is present. To amplify this point, we have added the following to the description of Figure 3C:

      “[…]. NB, however, as our simulation shows, correlations of sequenceness with behavioural markers are likely to be underpowered and occur only with very high replay rates or much higher sample size. See our simulation discussion for a more detailed explanation on how correlations may be inherently biased, where fluctuations in baseline sequenceness overshadow individual scaling with behavioural markers.”

      Furthermore, we have added the following paragraph to the discussion to highlight this point and refer to a power analysis we have now added to the supplement (see next answer):

      “Sample sizes in previous TDLM literature usually range between 20 to 40 participants. A bootstrap power analysis shows that even at those sample sizes, power would remain low unless unrealistically high replay rates are assumed (Supplement Figure 11). Our bootstrap simulation shows that a correlation analysis between sequenceness and behaviour would in these cases be drastically underpowered, even under an assumption of high replay densities.”

      Finally, we have added a remark about the sample size to the limitations section, as naturally, an increase in sample size would yield higher power:

      “Finally, while initially planning for thirty participants, due to exclusion criteria, our study featured fewer participants than most previous studies using TDLM (i.e. usually 25-40, but 21 in our study). While we are confident that our simulation results hold under these sample sizes, as sample sizes of other studies show comparable power to ours (Error! Reference source not found.), we cannot fully rule out a possibility that our null-findings are explained by a lack of power alone.”

      Relatedly, it would be very useful if one of the recommendations that come out of the simulations in this paper was a power analysis for detecting sequenceness in general, as I suspect that the small sample size impacts this as well, given that sequenceness effects reported in other work are often small with larger sample sizes. Further, I believe that the authors' simulations of basic sequenceness effects would themselves still suffer from having a small number of subjects, thereby impacting statistical power. Perhaps the authors can perform a similar sort of bootstrapping analysis as they perform for the correlation between replay and performance, but over sequenceness itself?

      We agree with the referee that this, in principle, is a great idea. However, the way that significance thresholds are calculated poses a conceptual problem for such an analysis: the significance threshold is defined as the maximum sequenceness value across all participants, all time lags and all permutations. This threshold is compared against the mean of all participants, disregarding the standard deviation. The maximum threshold would not change if we bootstrapped some of our samples. Additionally, the 95th-percentile threshold would also not change significantly. To illustrate this point, we have added this analysis to the supplement, as Supplement Figure 10. However, the new sign-flip permutation test we now include allows for such a comparison, as it takes variance between participants into account as well! We have included all three variants of the power analysis and the figure description now reads:

      “Supplement Figure 11 Power analysis of sequenceness significance for bootstrapped sample sizes. A) Powermap for state-permutation thresholds. However, here the bootstrap approach suffers from a conceptual problem: significance thresholds are defined by the permutation maximum and/or 95th percentile of the maximums across all sequence-permutations across participants. If we resample bootstrap-participants from our existing pool, the maximum thresholds computed will remain relatively stable across resampled participants, as it only compares against the mean and disregards the standard deviation. B) The newly presented statistical approach is significantly more sensitive at higher sample sizes. Note that even then, 80% power is only reached with replay density of higher than 50 min⁻¹ at a sample size of 60 participants. Additionally, the sign-flip permutation test assumes that the mean is at zero. As we observed a non-zero mean due to spurious oscillations, we subtracted the mean sequenceness of the baseline condition from each participant before permuting to achieve a null distribution with mean zero, as otherwise, we would have found significant replay effects in the baseline condition at increasing sample size. Nevertheless, due to the higher sensitivity, the new sign-flip test is recommended over the previous sequence-permutation-based test. Colours indicate the power from 0 to 1 for different bootstrapped sample sizes and densities. 80% power thresholds are outlined in black.”
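
      The bootstrap logic behind such a power map can be sketched for a single time lag as below. This is a hedged illustration rather than the analysis code behind the figure: the function names `sign_flip_pvalue` and `bootstrap_power`, the use of the mean as test statistic, and the default numbers of resamples are assumptions, and the input scores are assumed to be already baseline-corrected as described in the figure text.

```python
import numpy as np

def sign_flip_pvalue(x, n_perm, rng):
    """Two-sided sign-flip p-value for the mean of x (single lag)."""
    x = np.asarray(x, dtype=float)
    observed = abs(x.mean())
    # Each row is one random sign assignment across participants.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, x.size))
    null = np.abs((signs * x).mean(axis=1))
    # Add-one correction so the p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

def bootstrap_power(scores, sample_size, n_boot=200, n_perm=500,
                    alpha=0.05, seed=0):
    """Fraction of bootstrap samples of a given size that reach significance.

    scores: 1-D array with one baseline-corrected sequenceness value per
    participant (for example, after replay was inserted at one density).
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    hits = 0
    for _ in range(n_boot):
        # Resample participants with replacement to the target sample size.
        sample = rng.choice(scores, size=sample_size, replace=True)
        hits += int(sign_flip_pvalue(sample, n_perm, rng) < alpha)
    return hits / n_boot
```

      Repeating `bootstrap_power` over a grid of sample sizes and simulated replay densities would yield one power value per cell of a map like the one described above.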

      The task paradigm may introduce issues in detecting replay that are separate from TDLM. First, the localizer task involves a match/mismatch judgment and a button press during the stimulus presentation, which could add noise to classifier training separate from the semantic/visual processing of the stimulus. This localizer is similar to others that have been used in TDLM studies, but notably in other studies (e.g., Liu, Mattar et al., 2021), the stimulus is presented prior to the match/mismatch judgment. A discussion of variations in different localizers and what seems to work best for decoding would be useful to include in the recommendations section of the discussion.

      We agree and thank the referee for raising this issue. Note, we acknowledge we forgot to mention that these trials were excluded from classifier training. Our rationale for presenting the oddball during stimulus presentation, and not thereafter, was an assumption that by first presenting the audio and then the visual cue we would create more generalized representations that would be less modality-dependent. However, importantly, we excluded all trials that were oddballs from localizer training. Therefore, we assume that this particular design choice will not greatly affect the decoder training. If some motor-preparation activity is present during the stimulus presentation, then it should be present equally across all trials and hence be ignored by the classifier as we balanced the transitions between images. We now added this information to the main text:

      “In each trial, a word describing the stimulus was played auditorily, after which the corresponding stimulus was shown. In ~11% of cases, there was a mismatch between word and image (oddball trials), and these trials were excluded from the localizer training.” Additionally in the methods section: “These oddball-trials were excluded from all further analysis and decoder training.”

      Nevertheless, we agree that the extant variety in localizer designs is underdiscussed where many assumptions of classifier training are not, as yet, fully validated. We have added a sentence highlighting different oddball paradigms to the section on the discussion of localizers and also add a summary statement with recommendations. The passage now reads:

      “Additionally, a wide variety of oddballs has been used (e.g. upside-down, scrambled, or mismatched images, cues presented visually, as words, auditorily, etc), and at this time it is unclear if these affect the representations that the classifier learns [...] In summary, we would expect a multimodal categorical localizer, and a classifier that isn’t trained on a specific timepoint, to generalize best.”

      Second, and more seriously, I believe that the task design for training participants about the expected sequences may complicate sequence decoding. Specifically, this is because two images (a "tuple") are shown together and used for prediction, which may encourage participants to develop a single bound representation of the tuple that then predicts a third image (AB -> C rather than A -> B, B -> C). This would obviously make it difficult to i) use a classifier trained on individual images to detect sequences and ii) find evidence for the intended transition matrix using TDLM. Can the authors rule out this possibility?

      We thank the reviewer for raising a possibility we had not considered! While there is some evidence that a single bound representation would have overlap with its constituents (especially before long-term consolidation) and therefore be detectable by the classifiers, we acknowledge the possibility that individual classifiers would fail to be sensitive to such a compound representation. In fact, we find in the retrieval data some evidence for a combined replay of representations (where representations are replayed seemingly at the same time, see Kern 2024). We have added such a possibility to the interim discussion of Study 1 as a qualification. However, this does not change the results or interpretation of our simulation, which we consider a key message of the paper.

      The relevant segment in the discussion section now reads:

      “Additionally, given that the stimuli were presented in combined triplets, participants may have formed a singular representation of associated items and subsequently replayed these (e.g., AB→C), instead of replaying item-by-item transitions (A→B→C). Under such a scenario, a classifier trained on individual items may fail to detect these newly formed bound representations, particularly if they diverge strongly from the single-item patterns. In our previous study where we address retrieval (Kern et al., 2024) we found that states were to varying extent co-reactivated, yet classifiers trained on single items retained sensitivity to detect these combined reactivation events. Consistent with this, prior work suggests that unified representations retain overlap with their constituent item representations (Dennis et al., 2024; Liang et al., 2020); however, there is also evidence that different brain regions are involved if representational unitization occurs (Staresina & Davachi, 2010), potentially confusing classifiers. Therefore, we cannot exclude that rest-related consolidation replays engendered unitized representations that were insufficiently captured by our single-item classifiers.”

      Participants only modestly improved (from 76-82% accuracy) following the rest period (which the authors refer to as a consolidation period). If the authors assume that replay leads to improved performance, then this suggests there is little reason to see much task-related replay during rest in the first place. This limitation is touched on (lines 228-229), but I think it makes the lack of replay finding here less surprising. However, note that in the supplement, it is shown that the amount of forward sequenceness is marginally related to the performance difference between the last block of training and retrieval, and this is the effect I would probably predict would be most likely to appear. Obviously, my sample size concerns still hold, and this is not a significant effect based on the null hypothesis testing framework the authors employ, but I think this set of results should at least be reported in the main text.

      We disagree that an absence or presence of replay might be inferred from an absolute memory enhancement. While consolidation can lead to absolute improvement of performance in, for example, motor memory domains, one formulation is that in declarative learning tasks replay stabilizes latent memory traces, and in such a scenario would not necessarily lead to boosted performance. While many declarative consolidation studies report an increase of performance compared to a control condition (i.e. without a consolidation window), this does not necessarily entail an absolute performance increase, as replay might just act to protect against loss of memory traces. Therefore, the modest increase we observe does not permit inference as to the presence or absence of replay absent a proper control condition.

      We did expect to find a correlation between replay and individual behaviour. Indeed, a weak correlation between performance and sequenceness can be detected. However, as we also show, any such correlation is overshadowed by baseline fluctuations in sequenceness such that its overall validity is questionable, even under very high replay rates. We would therefore be circumspect about this correlation even if it were significant. Therefore, in the discussion, we chose to refrain from putting much focus on this correlation. Nevertheless, we do add a short statement to the corresponding figure label, discussing this precise issue. The segment now reads:

      “While we found a non-significant relation between a memory performance enhancement and post-learning forward sequenceness we are cautious not to overinterpret these results. As in the section “Correlation with behaviour only present at high replay speeds” the noted correlational measure oscillates heavily with baseline sequenceness fluctuations, and any true replay effect is likely to be overshadowed by such fluctuations.”

      I was also wondering whether the authors could clarify how the criterion over six blocks was 80% but then the performance baseline they use from the last block is 76%? Is it just that participants must reach 80% within the six blocks *at some point* during training, but that they could dip below that again later?

      We thank the reviewer for highlighting this point: The first block wherein participants reached >80% ended the learning blocks. After a maximum of six blocks the learning session was ended regardless of performance. Therefore, some participants’ learning blocks were ended after six blocks without them reaching a performance of 80%. While we described this in the Methods section, it was missing from the Results Study I section, which now contains:

      “[...] Participants then learned triplets of associated items according to a graph structure. Within the learning session, participants performed a maximum of six learning blocks, but the session was stopped if participants reached 80% memory performance (criterion learning; see Methods for details)”

      The Figure 2 description now contains:

      “[...] Participants completed up to six blocks of learning trials. After reaching 80% in any block, no more learning blocks were performed (criterion learning) [...]”

      Lastly, there was a mistake in the Behavioural results section, which stated “All thirty participants, except one, [..] to criterion of 80%.” This is an error. In our preregistration, we specified that we would only include participants who learned above chance. Here, we meant that only one participant failed to reach the criterion that we defined as “successful learning”. We fixed it and it now reads:

      “with an accuracy above 50% (which we preregistered beforehand as an exclusion criterion for “successful learning above chance”).”

      Additionally, we have noted this for clarity in the methods section and apologize for this mistake:

      “Additionally, as successful above-chance learning was necessary for the paradigm, we ensured all remaining participants had a retrieval performance of at least 50% (one participant had to be excluded, but was already excluded due to low decoding performance).”

      Because most of the conclusions come from the simulation study, there are a few decisions about the simulations that I would like the authors to expand upon before I can fully support their interpretations. First, the authors use a state-to-state lag of 80ms and do not appear to vary this throughout the simulations - can the authors provide context for this choice? Does varying this lag matter at all for the results (i.e., does the noise structure of the data interact with this lag in any way?)

      This was a deliberate choice but we acknowledge the reasoning behind this was not detailed in our initial submission. We chose a lag of 80 milliseconds for three reasons: first, it is distant from the 9-11 Hz alpha oscillations we observed in our participants and does not share a harmonic with the alpha rhythm; second, we wanted to get a clear picture of the effect of simulated replay that is as isolated as possible from spurious sequenceness confounders present in the baseline condition. Thus, we chose a lag in which the sequenceness score was close to zero in the baseline condition; third, in this revision, we subtracted the mean sequenceness value of the baseline such that any simulation effects would start, on average, at zero sequenceness. In this way, we could attribute any increase in sequenceness to the experimentally inserted replay, independent of spurious oscillations. Finally (but less importantly), as we observed that the correlation of sequenceness with behaviour fluctuated strongly, for the reason detailed above, we chose a lag at which this correlation was as close as possible to zero. If we had not chosen a lag that adhered to these conditions, we would have risked measuring simulated replay plus spurious sequenceness confounders.

      We have added a sentence to the main text detailing this justification:

      “We chose this timepoint (80 msec state to state lag) as its sequenceness value was close to zero in the baseline condition as well as being distant to the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean sequenceness value of the baseline at 80 milliseconds lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we now add a more detailed explanation to the methods section.

      “This time lag (80 msec) was chosen in order to isolate precisely an effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean sequenceness value (at 80 milliseconds) at baseline from the specific lag recorded for each participant, such that simulation effects would be initialized at zero sequenceness on average enabling any effects to be attributed purely to inserted replay. Additionally, we excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz) or lags which would have a harmonic with the rhythm.”
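The lag-selection criterion relating to the alpha rhythm can be sketched as a simple range check. This is a minimal illustration rather than the authors' procedure: it assumes that "sharing a harmonic" means the lag coincides with the half period or an integer multiple of the alpha period, and the function names and the `max_multiple` cut-off are hypothetical.

```python
def alpha_period_ranges_ms(f_lo_hz=9.0, f_hi_hz=11.0, max_multiple=3):
    """Lag ranges (ms) covered by the half period and integer multiples
    of the alpha period, for an alpha band of f_lo_hz to f_hi_hz."""
    shortest = 1000.0 / f_hi_hz  # ~90.9 ms period at 11 Hz
    longest = 1000.0 / f_lo_hz   # ~111.1 ms period at 9 Hz
    ranges = [(0.5 * shortest, 0.5 * longest)]  # half period (assumption)
    for k in range(1, max_multiple + 1):
        ranges.append((k * shortest, k * longest))
    return ranges


def lag_conflicts_with_alpha(lag_ms, **kwargs):
    """True if the lag falls inside any alpha-related range."""
    return any(lo <= lag_ms <= hi for lo, hi in alpha_period_ranges_ms(**kwargs))


candidate_lags_ms = [50, 80, 100, 200]
conflicts = {lag: lag_conflicts_with_alpha(lag) for lag in candidate_lags_ms}
print(conflicts)
```

Under these assumptions the 80 ms lag falls outside every alpha-related range, whereas 50, 100 and 200 ms do not.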

      Second, it seems that the approach to scaling simulated replays with performance is rather coarse. I think a more sensitive measure would be to scale sequence replays based on the participants' responses to *that* specific sequence rather than altering the frequency of all replays by overall memory performance. I think this would help to deliver on the authors' goal of simulating an "increase of replay for less stable memories" (line 246).

      The referee makes an excellent point and our simulations could be rendered more realistic by inserting the actual tuples that participants answered correctly. If we understand the point correctly, there are two different ways replay might be impacted by performance: First, we can conjecture that there is greater replay if memory performance is not saturated. Second, replay only occurs for content that has actually been encoded!

      We chose to simulate the entire sequence being replayed for each participant for the following reasons. First, TDLM is implemented such that the amount of replay alone is relevant, and the actual transitions do not affect the results beyond noise. Under the assumption that class-specific classifiers perform equally well, simulating A->B, B->C or simulating A->B, A->B yields equivalent results. However, results can differ if this assumption is violated. By drawing from the entire space of classes we insert, we minimize the risk of some classifiers being worse than others for some participants. For example, if we simulated only A->B for some participant instead of the whole sequence, and by chance classifier A performed suboptimally, we would introduce additional unwanted variance into our results.

      Secondly, from our reading of the literature we infer that replay is increased generally (i.e. density of learning-specific replay is increased) for less stable memories. However, we do not have indicators of memory strength, but only a binary “remembered or not”. As TDLM is invariant to the actual transitions being replayed and only indexes the number of transitions, we chose to ignore which transitions we insert and only scaled the amount of replay.

      We have added an analysis to the Appendix that discusses this specific aspect of our study where we show that results are equivalent if we simulate replay of “A->B B->C C->D” or only “A->B A->B A->B A->B”. As we do not know how replay density interacts with memory trace stability, we opted to leave the current simulation as is. The corresponding paragraph and figure description now read:

      “From the literature we know that replay is increased after learning and that less stable memories are replayed more often. We simulated this effect by scaling our replay density inversely with performance. However, for simplicity, in our simulation, we inserted transitions sampled from all valid transitions given by the graph structure. This meant that some participants would have transitions inserted that they didn’t actually remember. To show that this would not change results, we simulated two scenarios: In the full sequence scenario, all valid graph transitions are inserted (i.e. all participants’ replay is sampled from 'A->B, B->C, C->D, D->E, E->F, F->G, G->E, E->H, H->I, I->B, B->J, J->A'). In the second scenario (memorized transitions) we only replayed transitions that the participant actually retrieved correctly during the post-resting state testing sessions (i.e. a participant’s replay would have been sampled from ‘A->B, B->C, G->E, E->H, H->I’, if those were the ones they remembered). In both scenarios, the number of events is kept constant. The results are equivalent as can be seen in Appendix A Figure 3. NB this only holds under the assumption that classifiers are equally good at decoding each class.”

      […]

      “TDLM is insensitive towards which transitions are replayed and only sensitive to how many transitions are detected in total. Here we simulate transitions either sampled from the full graph (light orange/green) or participant-specific transitions of trials that participants correctly remembered (dark orange/green). Shaded areas denote the standard error across participants.”
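The two sampling scenarios can be sketched as follows. This is only an illustration of how the transitions to insert might be drawn while keeping the event count fixed; the helper `sample_replay`, the seed and the event count are hypothetical, and the insertion of patterns into the MEG data is not shown.

```python
import random

# All valid graph transitions, as listed in the quoted paragraph.
FULL_GRAPH = ["A->B", "B->C", "C->D", "D->E", "E->F", "F->G",
              "G->E", "E->H", "H->I", "I->B", "B->J", "J->A"]

# Example subset a participant might have retrieved correctly.
MEMORIZED = ["A->B", "B->C", "G->E", "E->H", "H->I"]


def sample_replay(transitions, n_events, seed=0):
    """Draw n_events transitions (with replacement) from the allowed set."""
    rng = random.Random(seed)
    return [rng.choice(transitions) for _ in range(n_events)]


n_events = 200  # hypothetical; identical in both scenarios
full_scenario = sample_replay(FULL_GRAPH, n_events)
memorized_scenario = sample_replay(MEMORIZED, n_events)
print(len(full_scenario), len(memorized_scenario))
```

Only the pool of allowed transitions differs between the scenarios; the number of inserted events is the same by construction.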

      On the other hand, I was also wondering whether it is actually necessary to use the real memory performance for each participant in these simulations - couldn't similar goals (with a better/more full sampling of the space of performance) be achieved with simulated memory performance as well, taking only the MEG data from the participant?

      The decision to use real memory performance is indeed arbitrary. We could have also used randomly sampled values. However, as we wanted to understand our null results better, we opted to use real performance to adhere as closely as possible to the findings we previously reported. Using uniformly sampled memory performance would be less explanatory w.r.t. our actual results of the resting state data that are reported in the first study of the manuscript (Study I).

      Nevertheless, our current implementation already presents an approach that samples the entire performance range for the sub-analysis focusing on the correlation with behaviour. Here, in the section on “best-case”-scenario, we implement this such that it spans factors from 1 to 0 (i.e., a participant with 100% performance gets a replay scale factor of 0 and hence no replay simulated, and the worst performing participant with 50% performance has a replay rate multiplied by 1). We scale the amount of replay with this factor. As a correlation is invariant to linear scaling, statistically this is equivalent to stretching the performance distribution from 0 to 100%. We have added a sentence to the methods to provide further focus on this point:

      “To assess how performance might affect replay in our specific dataset, we chose to use the original participants’ performance values instead of uniformly sampling the performance space (which ranged from 50 to 100%). However, for the correlation analysis, we additionally added a “best-case” scenario, in which we scale replay from 0 to 1, an approach that is statistically equivalent to scaling values to the full space of possible performance (0 to 100%) (see Results Study II: Simulation).”
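The equivalence claim rests on the Pearson correlation being unchanged by positive linear rescaling (and merely changing sign under a negative slope). A minimal sketch with made-up numbers; the performance and sequenceness values, and the helper names, are hypothetical:

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def replay_scale(perf, worst=0.5, best=1.0):
    """Best-case factor: 1 at 50% performance, 0 at 100% performance."""
    return (best - perf) / (best - worst)


performance = [0.5, 0.6, 0.75, 0.9, 1.0]    # hypothetical participants
sequenceness = [0.9, 0.7, 0.65, 0.2, 0.1]   # hypothetical values

stretched = [(p - 0.5) / 0.5 for p in performance]  # 50-100% mapped to 0-100%
r_raw = pearson(performance, sequenceness)
r_stretched = pearson(stretched, sequenceness)
r_scale = pearson([replay_scale(p) for p in performance], sequenceness)
print(r_raw, r_stretched, r_scale)
```

Stretching the performance range leaves the coefficient unchanged, while the inverted replay scale factor yields the same magnitude with the opposite sign.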

      Finally, Figure 7D shows that 70ms was used on the y-axis. Why was this the case, or is this a typo?

      Thanks, this is indeed a typo, we fixed it.

      Because this is a re-analysis of a previous dataset combined with a new simulation study on that data aimed at making recommendations about how to best employ TDLM, I think the usefulness of the paper to the field could be improved in a few places. Specifically, in the discussion/recommendation section, the authors state that "yet unknown confounders" (line 295) lead to non-random fluctuations in the simulated correlations between replay detection and performance at different time lags. Because it is a particularly strong claim that there is the potential to detect sequenceness in the baseline condition where there are no ground-truth sequences, the manuscript could benefit from a more thorough exploration of the cause(s) of this bias in addition to the speculation provided in the current version.

      We are currently working on a theoretical basis to explain these spurious sequenceness confounders in the baseline condition. Indeed, in our preliminary work, in certain contexts we can induce significant sequenceness in the absence of any replay signal during baseline. However, this work is at an early stage and we still have some conceptual problems to solve before we are confident enough in these data. We believe at present it would be premature to add these data to the current manuscript. Nevertheless, we now mention these spurious sequenceness confounders to raise awareness in the field and also add greater context to the discussion, highlighting one of the issues that we think is of importance:

      “[…] For example, if two classifiers’ probabilities oscillate at 10 Hz but at a different phase, a spurious time lag can be found reflecting this phase shift. We speculate that more complex interactions between classifiers oscillating at different phases are also conceivable.”
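The phase-shift example can be illustrated with two toy signals. The sketch below is not TDLM itself: it uses a plain lagged product instead of the GLM, treats the classifier outputs as mean-centred sinusoids, and assumes a 100 Hz sampling rate and a 30 ms phase shift, all of which are illustrative choices.

```python
import math

SFREQ_HZ = 100      # assumed sampling rate: one sample every 10 ms
FREQ_HZ = 10        # both toy classifier outputs oscillate at 10 Hz
SHIFT_SAMPLES = 3   # hypothetical 30 ms phase shift between them
N_SAMPLES = 1000    # 10 s of signal
MAX_LAG = 10        # test lags 0..90 ms, i.e. one 10 Hz cycle

prob_a = [math.sin(2 * math.pi * FREQ_HZ * t / SFREQ_HZ)
          for t in range(N_SAMPLES)]
prob_b = [math.sin(2 * math.pi * FREQ_HZ * (t - SHIFT_SAMPLES) / SFREQ_HZ)
          for t in range(N_SAMPLES)]


def lagged_product(a, b, lag, n_terms):
    """Sum of a[t] * b[t + lag] over a fixed number of samples."""
    return sum(a[t] * b[t + lag] for t in range(n_terms))


n_terms = N_SAMPLES - MAX_LAG  # same number of terms for every lag
scores = [lagged_product(prob_a, prob_b, lag, n_terms) for lag in range(MAX_LAG)]
best_lag = max(range(MAX_LAG), key=lambda lag: scores[lag])
spurious_lag_ms = best_lag * 1000 // SFREQ_HZ
print(spurious_lag_ms)
```

No sequential event was inserted, yet the lagged measure peaks at the lag corresponding to the phase shift.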

      In addition, to really demonstrate that a realistic simulation is necessary (one of the primary conclusions of the paper), it would be useful to provide a comparison to a fully synthetic simulation performed on this exact task and transition structure (in addition to the recreation of the original simulation code from the TDLM methods paper).

      Thank you for this suggestion! We have now added a synthetic simulation, trying to keep as close as possible to the original simulation code in Liu et al. (2021), while also incorporating our current means of simulating the data (i.e. scaling by performance). We think this synthetic simulation greatly improves the paper and gives weight to our suggestion about the superiority of a hybrid approach. Additionally, it prompted us to look closer at patterns that are inserted in the synthetic simulation and perform a comparative analysis. We have now added the simulation to the main text, together with a methodological explanation of how we simulated the data in the methods section. We also added a discussion on the results and why we think a hybrid approach is currently superior to synthetic approach. The whole new section is too long to paste here – it is found after the main simulation section in the manuscript. We have also added another sentence to the abstract referring to this new inclusion.

      Finally, I think the authors could do further work to determine whether some of their recommendations for improving the sensitivity of TDLM pan out in the current data - for example, they could report focusing not just on the peak decoding timepoint but incorporating other moments into classifier training.

      While we do understand the desire to test further refinements to TDLM on the data directly, we intentionally do not include such analyses in the current paper. Our experience also informs us that there is an enormous branching factor of parameters when applying TDLM, with implications for the significance of results in one or the other direction. However, there are currently only limited ways to know whether parameter changes actually improve the sensitivity to replay or instead exacerbate potential underlying confounders that induce spurious sequenceness (e.g., we can get significant replay in the control condition with some parameter changes). To exclude such false-positive findings, we opt for a relatively strict adherence to previously published approaches. Thus, in the current paper, we limit ourselves to assessing the reliability and robustness of previous approaches.

      Furthermore, while training on a later timepoint might increase sensitivity for a classifier when transferring between different modalities (e.g. visual to memory representation), this approach does not transfer well in our simulations, as the inserted patterns are from the same modality. We consider that other, more bespoke studies are better suited to improving classifier training. NB also see our recently started Kaggle challenge to tackle this problem: https://www.kaggle.com/competitions/the-imagine-decoding-challenge

      However, we have added a note about this dilemma to the improvement section. The section now includes:

      “Nevertheless, as the considerable branching factor poses a threat of increased false-positive findings we opt to focus the current simulations on previously published pipelines and parameters. Future studies should systematically evaluate parameter choices on TDLM under different conditions, something that is beyond the remit of the current study.”

      Lastly, I would like the authors to address a point that was raised in a separate public forum by an author of the TDLM method, which is that when replays "happen during rest, they are not uniform or close." Because the simulations in this work assume regularly occurring replay events, I agree that this is an important limitation that should be incorporated into alternative simulations to ensure the lack of findings is not because of this assumption.

      The temporal distribution of replay throughout the resting state should not matter, as TDLM is invariant w.r.t. how replay events are distributed within the analysis window. Specifically, it does not matter if replay events occur in bursts or are uniformly distributed. Only the number of transitions is relevant; where they occur, or whether they are close to each other, is not relevant to the numerical results (as long as the refractory window is kept, since too short distances will lead to interactions between events and reduce sensitivity). To emphasize this point, we have added another simulation which is shown in Appendix A.1 and Appendix A Figure 1. We have referenced it in the text and added the following paragraph in the Methods section:

      “Additionally, the timepoints of inserting replay within the resting state are sampled from a uniform distribution. Even though TDLM tracks reactivation events over time, at a macro-scale the algorithm is invariant to the temporal distribution. At each time step, the GLM regresses onto a future time step up to the maximum time lag of interest, yielding a predictor per lag. However, these predictors within the GLM are independently assessed, and hence, TDLM is, outside of the time lag window, relatively invariant to the temporal distribution of replay. To demonstrate our claim, we simulated uniform replay vs “bursty” replay that only occurs in some parts of the resting state; both yield equivalent sequenceness results (see Appendix A.1).”
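The invariance argument can be illustrated with a deliberately simplified stand-in for TDLM. The sketch below replaces the GLM with a lagged co-occurrence count on binary state traces; the sampling rate, refractory period, event count and all helper names are assumptions made for illustration only.

```python
import random

SFREQ_HZ = 100                  # assumed sampling rate (10 ms per sample)
N_SAMPLES = 4 * 60 * SFREQ_HZ   # four-minute resting state
LAG = 8                         # 80 ms state-to-state lag
REFRACTORY = 20                 # hypothetical minimum spacing between events
N_EVENTS = 100                  # hypothetical number of inserted A->B events


def sample_onsets(start, stop, n_events, refractory, rng):
    """Pick event onsets on a grid so that events never overlap."""
    n_slots = (stop - start) // refractory
    slots = rng.sample(range(n_slots), n_events)
    return sorted(start + s * refractory for s in slots)


def insert_events(onsets, lag, n_samples):
    """Binary traces: state A at each onset, state B one lag later."""
    a, b = [0] * n_samples, [0] * n_samples
    for t in onsets:
        a[t] = 1
        b[t + lag] = 1
    return a, b


def lagged_count(a, b, lag):
    """Toy sequenceness: how often A at t is followed by B at t + lag."""
    return sum(a[t] * b[t + lag] for t in range(len(a) - lag))


rng = random.Random(0)
uniform_onsets = sample_onsets(0, N_SAMPLES - LAG, N_EVENTS, REFRACTORY, rng)
bursty_onsets = sample_onsets(0, N_SAMPLES // 5, N_EVENTS, REFRACTORY, rng)

uniform_score = lagged_count(*insert_events(uniform_onsets, LAG, N_SAMPLES), LAG)
bursty_score = lagged_count(*insert_events(bursty_onsets, LAG, N_SAMPLES), LAG)
print(uniform_score, bursty_score)
```

In this toy setting the score at the inserted lag depends only on the number of events, whether they are spread over the whole recording or confined to its first fifth.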

      Reviewer #3 (Public review):

      (1) I am still left wondering why other studies were able to detect replay using this method. My takeaway from this paper is that large time windows lead to high significance thresholds/required replay density, making it extremely challenging to detect replay at physiological levels during resting periods. While it is true that some previous studies applying TDLM used smaller time windows (e.g., Kern's previous paper detected replay in 1500ms windows), others, including Liu et al. (2019), successfully detected replay during a 5-minute resting period. Why do the authors believe others have nevertheless been able to detect replay during multi-minute time windows?

      (Due to similarity, we combined our responses with the first question of Reviewer 1)

      We are reluctant to make sweeping judgments in relation to previous literature as we wanted to prioritize advancing methodology instead. The previous TDLM literature uses a diverse set of tasks and cognitive processes. As we state ourselves, it is possible that replay bursts in short time windows are well detectable by TDLM. We were intentionally cautious about directly critiquing previous studies without detailed re-analysis of their work and wanted to leave such a conclusion up to the reader. However, we realize that such a “thought-starter” might be warranted and improve the paper. Therefore, we have added the following paragraph to the discussion about “improving TDLM's sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. But sensitivity depends critically on high fidelity between the analysis window and the amount of replay events. A systematic evaluation of these conditions across prior studies is beyond the scope of this paper, so we do not want to adjudicate earlier findings and leave this assessment up to the reader. Instead, we delineate the boundary conditions and urge future work to conduct power analyses where possible and include simulations that approximate realistic experimental conditions.”

      For example, some studies using TDLM report evidence of sequenceness as a contrast between evidence of forwards (f) versus backwards (b) sequenceness; sequenceness was defined as ZfΔt - ZbΔt (where Z refers to the sequence alignment coefficient for a transition matrix at a specific time lag). This use case is not discussed in the present paper, despite its prevalence in the literature. If the same logic were applied to the data in this study, would significant sequenceness have been uncovered? Whether it would or not, I believe this point is important for understanding methodological differences between this paper and others.

      This approach was first introduced as part of a TDLM-predecessor that utilized cross-correlations (Kurth-Nelson et al., 2016), where this step is a necessity to extract any sequenceness signal at all by subtracting signals that are present in both directions (akin to an EEG reference). However, its validity is less clear when fwd and bkw are estimated separately, as is the case in the GLM. The rationale behind subtracting here is the same as for autocorrelations: there are oscillatory confounds present in the data that introduce spurious sequenceness in both directions alike, i.e. at the same time lag, that can simply be removed by subtracting. However, this assumption only holds if the sole confounder is auto-correlations caused by a global signal that oscillates at all sensors at the same phase. In our own experience, and as mentioned in the discussion, we do not think this assumption holds. Arguably, there are more complex interactions at play that cannot be removed by such a subtraction, and subtraction can even increase false positives if confounders are in opposite directions at a specific time lag. This assumption-violation can be seen in our baseline condition, where spurious sequenceness diverges in opposite directions for some time lags (e.g. at ~90 ms where forward sequenceness is negative and backward sequenceness is positive). We reasoned that oscillatory confounds are more stable when comparing pre vs post for the same direction than when comparing forward minus backward within a session.
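A two-line numerical illustration of this argument, using an invented confound magnitude (the value 0.02 and the helper name are hypothetical):

```python
def fwd_minus_bkw(forward, backward):
    """Forward minus backward sequenceness, lag by lag."""
    return [f - b for f, b in zip(forward, backward)]


confound = 0.02  # hypothetical spurious sequenceness at one time lag

# In-phase confound: identical in both directions, so it cancels.
in_phase = fwd_minus_bkw([confound], [confound])

# Opposite-sign confound: the subtraction doubles it instead.
opposite = fwd_minus_bkw([-confound], [confound])
print(in_phase, opposite)
```

The subtraction only removes a confound that has the same sign and size in both directions; when the directions diverge, the difference score is larger in magnitude than either direction alone.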

      Finally, we note issues introduced by the various ways that sequenceness has been analysed in previous papers: normalization of sequenceness (z-scoring across time lags or across participants or not at all), normalization of probabilities (taking raw decision scores, z-scoring, soft-max, dividing by mean, subtracting mean), taking a windowed approach and summing sequenceness scores, not to mention the various classifier choices that can be made, and all of this can be applied before or after subtracting conditions from each other. In our experience there is insufficient regard for controlling for multiple comparisons when running all these analyses, risking selectivity in reporting.

      Nevertheless, subtracting forward from backward replay is probably as valid as post minus pre. Therefore, we have added fwd-bkw plots to the supplement and explained some of the reasoning for not reporting them in the main text in the figure label. The figure label and reference now read:

      “Finally, we report forward minus backward sequenceness and our motivation for using an across-session post-pre comparison instead of within-session forward-backward in Supplement Figure 10.”

      […]

      “Forward minus backward sequenceness within each resting state session. Previous papers often report subtraction of backward from forward sequenceness (fwd-bkw) as a means to remove oscillatory confounds that impact both sequenceness directions in synchrony. While required in early cross-correlation approaches (Kurth-Nelson et al., 2016), its validity in GLM-based frameworks depends on an assumption that confounds are global and in-phase across sensors. We observed this assumption is violated in our baseline data, where spurious sequenceness occasionally diverges in opposite directions at specific time lags (e.g., ~90 ms). In such instances, subtraction would increase the false-positive rate rather than suppress noise. In Figure 3B, we prioritized the comparison of pre-task versus post-task sequenceness within the same direction, as oscillatory confounds appeared more stable across time within a single direction, as opposed to across directions within a single session. However, we consider both approaches valid. We now provide the fwd-bkw plots for completeness and comparison with previous literature. A) Forward minus backward sequenceness for Control (left) and Post-Learning resting-state (right). B) T-value distribution of the sign-flip permutation test for Control (left) and Post-Learning resting-state (right).”

      (2) Relatedly, while the authors note that smaller time windows are necessary for TDLM to succeed, a more precise description of the appropriate window size would greatly improve the utility of this paper. As it stands, the discussion feels incomplete without this information, as providing explicit guidance on optimal window sizes would help future researchers apply TDLM effectively. Under what window size range can physiological levels of replay actually be detected using TDLM? Or, is there some scaling factor that should be considered, in terms of window size and significance threshold/replay density? If the authors are unable to provide a concrete recommendation, they could add information about time windows used in previous studies (perhaps, is 1500ms as used in their previous paper a good recommendation?).

      We currently do not have an empirical estimate of which window sizes are appropriate. While we used 1500ms in our previous paper, this was solely given by the experiment design which had a 1.5s wait period before the next stimulus. Our recommendation for best guidance on this matter would be to investigate related intracranial literature for SWR rate increases under similar experimental conditions. We have added the following paragraph to the discussion:

      “At this stage we cannot offer a general recommendation for window sizes as they are likely to depend on details of the research paradigm. However, intracranial recordings can be used as proxy to estimate the duration of replay bursts, for example as reported in (Norman et al., 2019) where increased SWRs were seen up to 1500 ms after retrieval cue onset”

      (3) In their simulation, the authors define a replay event as a single transition from one item to another (example: A to B). However, in rodents, replay often traverses more than a single transition (example: A to B to C, even to D and E). Observing multistep sequences increases confidence that true replay is present. How does sequence length impact the authors' conclusions? Similarly, can the authors comment on how the length of the inserted events impacts TDLM sensitivity, if at all?

      Good point! So far, most papers do not seem to include multi-step TDLM, and in our experience rightfully so: it is conceptually difficult to define clear significance thresholds given that shorter sub-sequences are contained within a longer sequence (e.g. ABC contains both AB and BC and a longer dependency of AC), which makes it difficult to define the correct way to create a null distribution for the permutation test. Therefore, we tried to stay as close as possible to previous approaches and only looked for single-step transitions. Nevertheless, we have added an analysis to the supplement comparing how TDLM behaves if we simulate A->B->C or A->B and separate B->C. It shows that TDLM is only sensitive to the number of transitions present in the data, and it does not matter if they are chained or chunked. The segment reads:

      “We intentionally designed our study to encourage replay of triplets. However, this begs the question as to whether it matters if triplets or individual chunks of a sequence are replayed at different time points? Here, we simulated two scenarios. In one, we inserted replay of single transitions alone with a refractory period, e.g. A->B and separate B->C transitions. In a second scenario, we simulate replay of chained triplets, e.g. A->B->C, with a distance of 80 milliseconds each. Importantly, we kept the number of transitions constant (i.e., A->B, … B->C and A->B->C would both have 2 transitions). This creates a context wherein a four-minute resting state would have ~100 events of A->B->C inserted and ~200 events of A->B or B->C, such that in both cases this results in the same number of single step transitions. We found both are equivalent, with TDLM agnostic to the length of sequence trains, i.e., it does not matter if replay is chunked or chained under the assumption that the number of transitions remains fixed, as can be seen in Appendix A Figure 2.”

      And the reference Figure description reads:

      “TDLM is invariant to the length of sequence replay trains under an assumption that the number of target transitions (e.g. single steps) is fixed. We simulated replay either as two temporally separate A->B, B->C events (light orange/green) or as a single A->B->C event (dark orange/green), both yielding equivalent sequenceness. Shaded areas denote the standard error across participants.”
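The bookkeeping behind the chained versus chunked comparison can be made explicit with a small count. The event lists below mirror the approximate numbers quoted above (~100 triplets versus ~200 single transitions); the helper name is hypothetical.

```python
def count_transitions(events):
    """Number of single-step transitions in a list of replay events,
    where each event is a sequence of reactivated states."""
    return sum(len(states) - 1 for states in events)


# Chained: 100 events of A->B->C, each containing two transitions.
chained = [["A", "B", "C"]] * 100

# Chunked: 200 separate events, alternating A->B and B->C.
chunked = [["A", "B"], ["B", "C"]] * 100

print(count_transitions(chained), count_transitions(chunked))
```

Both scenarios contain the same number of single-step transitions, which is the quantity the simulations hold fixed, even though the number of replay events differs by a factor of two.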

      For example, regarding sequence length, is it possible that TDLM would detect multiple parts of a longer sequence independently, meaning that the high density needed to detect replay is actually not quite so dense? (example: if 20 four-step sequences (A to B to C to D to E) were sampled by TDLM such that it recorded each transition separately, that would lead to a density of 80 events/min).

      Indeed, this is an interesting proposal. We intentionally kept our simulation close to the way previous simulations were set up (i.e. Liu & Dolan et al 2021, Liu & Mattar 2021) by simulating one-step transitions and simulated them such that there is no overlap between separate events (e.g. by defining a refractory period). If the duration of replay is increased then we would also need to increase the length of the refractory period, resulting in a reduced upper limit of how much replay can occur in a 1-minute time window. This in turn would approximate roughly the same number of transitions that can be inserted into the resting state and, as detailed above, would yield the same results. Nevertheless, as we chose to use replay density and not transition density as a marker, the density would be reduced, even if the number of transitions stays the same. We have added an analysis using multi-step replay to the supplement and discuss its implications and caveats. In the main discussion we have added the following segment:

      “Similarly, in our simulation, for simplicity and to keep consistency with previous simulations, we restricted replay events to span two reactivation events. While the characteristics of replay as measured by TDLM are unknown, it is conceivable that several steps can be replayed within one replay event. We show that the vanilla version of TDLM is fundamentally sensitive to the number of single-step transitions alone, and disregards if these are replayed chained or chunked (Appendix A.2 and Appendix A Figure 2). Nevertheless, if the number of reactivation events chained within a replay event increases, TDLM's sensitivity is increased relative to the replay density and thresholds are reached earlier (see Appendix A Figure 4). See Appendix A.4 for a simulation of multi-step replay events and our discussion of the caveats.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please label the various significance thresholds in the legend of Figure 3.

      We have labelled all the thresholds in the figure legends.

      Reviewer #2 (Recommendations for the authors):

      I think that some of the clarity is hampered because there is a bit too much reliance on explanations from the previous paper using this task, which hampers clarity in the paper. For example, Figure 1 is not particularly useful for understanding the study in its current form; I found myself relying almost exclusively on Supplementary Figure 1 (which is from the previous paper). I'd recommend presenting some version of SF1 in the main text instead. Another example of this overreliance on the previous paper is that, as far as I can tell, the present paper never explicitly states which transitions are being tested in TDLM. In the prior work, it states "all allowable graph transitions", and so I assumed this was the same here, but the paper should standalone without having to go back to the other study. I'd recommend that the authors revise the paper in these and other places where the previous paper is mentioned.

      Thanks for raising this point! We were uncertain ourselves how to deal with the overlap in content and did not want to bloat the paper or plagiarize ourselves too much. On the advice of the referee, we have implemented the following to improve the manuscript and reduce reliance on the previous paper:

      Supplement Figure 1 is indeed crucial to understanding the experiment. We have moved it to the methods section under Methods: Procedure

      Added more stimulus description to the Methods: Localizer section

      Included more details about the localizer and graph learning that were missing before

      We have added the note about which transitions we were looking for in the Methods section. Additionally, we have added this information to the Results section of Study 1.

      There are also a few typos I noticed:

      (1) Line 73: "during in the context of."

      (2) Line 287: " to exploring the."

      We fixed the typos.

      Reviewer #3 (Recommendations for the authors):

      (1) Why did the authors choose an 80ms state-to-state time lag for their simulation? I believe they should make the reason for this decision clear in the main text.

      Indeed, this point was also raised by the other reviewer. We have added a sentence to the main text about the rationale behind this decision:

      “We chose this timepoint (80 millisecond state-to-state lag) as its sequenceness value was close to zero in the baseline condition as well as being distant to the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean sequenceness value of the baseline at 80 millisecond lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we have added some further explanation to the Methods section.

      “This time lag (80 msec) was chosen in order to isolate precisely an effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean sequenceness value (at 80 milliseconds) at baseline from the specific lag recorded for each participant, such that simulation effects would be initialized at zero sequenceness on average enabling any effects to be attributed purely to inserted replay. Additionally, we excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz) or lags which would have a harmonic with the rhythm.”

      (2) Line 168: Can the authors define what these conservative and liberal criteria are in the text?

      We have added definitions of the criteria in the text. The text now reads:

      “[..] significance thresholds (conservative, i.e. the maximum sequenceness across all permutations and timepoints, or liberal criteria, i.e. the 95th percentile of aforementioned sequenceness).”
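
      One possible reading of these two criteria is sketched below in Python. It assumes that the permuted sequenceness values are pooled across permutations and time lags before the maximum (conservative) and the 95th percentile (liberal) are taken; the function name, the nearest-rank percentile rule, and the example values are illustrative assumptions rather than the implementation used in the manuscript.

```python
import math

def sequenceness_thresholds(perm_values, percentile=95.0):
    """Return (conservative, liberal) thresholds from permuted sequenceness.

    perm_values: list of permutations, each a list of sequenceness values,
    one per time lag.
    """
    # Pool every permuted value across permutations and time lags.
    pooled = sorted(v for perm in perm_values for v in perm)
    if not pooled:
        raise ValueError("need at least one permuted sequenceness value")
    # Conservative criterion: the single largest permuted value.
    conservative = pooled[-1]
    # Liberal criterion: nearest-rank percentile of the same pooled values.
    rank = max(1, math.ceil(percentile / 100.0 * len(pooled)))
    liberal = pooled[rank - 1]
    return conservative, liberal

# Hypothetical data: 4 permutations x 5 time lags.
perms = [
    [0.01, 0.03, -0.02, 0.00, 0.02],
    [0.04, -0.01, 0.02, 0.01, -0.03],
    [0.00, 0.02, 0.05, -0.02, 0.01],
    [-0.01, 0.01, 0.03, 0.02, 0.00],
]
conservative, liberal = sequenceness_thresholds(perms)
```

      By construction the liberal threshold can never exceed the conservative one, so an effect that passes the conservative criterion also passes the liberal one.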

      (3) Line 478: "calculate" instead of "calculated".

      (4) Figure 7 D: y-axis is labeled "70 ms" I believe it should be labeled 80 ms.

      Thanks, we fixed the two typos.

      (5) With replay defined as sequential reactivation at a compressed temporal timescale, many of the iEEG citations (lines 54-55) do not demonstrate replay (they show stimulus reinstatement or ripple activity, but not sequential replay). Replay studies in humans using intracranial methods have been mostly limited to those measuring single-unit activity, a good example being Vaz et al., 2020 (https://www.science.org/doi/10.1126/science.aba0672).

      We agree that, under a strict definition articulated by Genzel et al. that defines replay as sequential reactivation, many prior human iEEG studies are better described as stimulus reinstatement or ripple-related activity rather than true sequence replay. We have revised the text accordingly and now highlight the few intracranial microelectrode studies that demonstrate replay of firing sequences at the cellular/ensemble level in humans (Eichenlaub et al., 2020; Vaz et al., 2020), distinguishing these from macro-scale iEEG work providing indirect evidence alone.

      The revised paragraph now reads:

      “Replay has been shown using cellular recordings across a variety of mammalian model organisms (Hoffman & McNaughton, 2002; Lee & Wilson, 2002; Pavlides & Winson, 1989). Replay studies in humans using intracranial recordings are few, but include work demonstrating compressed replay of firing-pattern sequences in motor cortex during rest (Eichenlaub et al., 2020) as well as single-unit replay of trial-specific cortical spiking sequences during episodic retrieval (Vaz et al., 2020). By contrast, most iEEG studies report stimulus-specific reinstatement or ripple-locked activity changes without explicit demonstration of temporally compressed sequential replay (Axmacher et al., 2008; Staresina et al., 2015). As these methods are only applied under restricted clinical circumstances, such as during pre-operative neurosurgical assessments, this limits opportunities to investigate human replay. Therefore, this gives urgency to efforts aimed at developing novel methods to investigate human replay non-invasively.”

      (6) The expectations about replay frequency are grounded in literature on hippocampal replay sequences. However, MEG captures signals from across the entire brain, and the hippocampal contribution is likely relatively weak compared to all other signals. This raises an important question: is TDLM genuinely unable to detect replay at physiological (i.e., hippocampal) levels, or is it instead detecting a different form of sequential reactivation - possibly involving cortex or other regions - that may occur more frequently? More broadly, when we have evidence of replay from TDLM, do we believe it is the same thing as replay of CA1 place cell spiking sequences, as detected in rodents? Commenting on this distinction would help further develop theories of replay and what TDLM is measuring.

      This is indeed an important point that has garnered relatively little attention. While there is some evidence of a relation to hippocampal replay in the form of a high-frequency power increase in the hippocampus, ultimately it is not possible to know without intracranial recordings, as signal strength from those regions is rather poor in MEG.

      We have added the following segment to the manuscript that discusses these issues:

      “However, while we are using indices of SWRs as a proxy for replay density estimation, the relationship between hippocampal replay and replay detected by TDLM remains uncertain. While current decoding approaches measure replay-like phenomena on cortical sites, previous papers have reported a power increase in hippocampal areas coinciding with replay episodes as detected by TDLM. Nevertheless, it is conceivable that cortical replay found by TDLM could occur independently of hippocampal replay and SWRs and be generated by different mechanisms. Some TDLM-studies find a replay state-to-state time lag of above 100 ms, much slower than e.g. previously reported place cell replay. Future studies should employ simultaneous intracranial and cortical surface recordings to establish the relationship between hippocampal replay and replay found by TDLM.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that happen within 6 hours of inhibiting LDH that causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We thank the reviewer for the careful reading of our manuscript, the accurate summary of the prevailing model, and the positive assessment of the rigor of our measurements. We agree that much prior literature reports increased oxygen consumption following LDH inhibition, and we recognize that our finding (coordinated suppression of glycolysis, the TCA cycle, and OXPHOS) differs from this prevailing interpretation. We address below the reviewer’s main concern regarding the 6-hour time point and clarify the conceptual scope of our study.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer raises an important point that many metabolic perturbations can trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. We agree that very early time points would be required if the primary goal were to isolate the most immediate, proximal consequence of LDH inhibition before downstream propagation. However, the objective of our study is different: we aim to characterize the metabolic steady state re-established after sustained inhibition of LDH activity, because this adapted steady state is more relevant for understanding long-term metabolic consequences and therapeutic outcomes of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, we observed that when LDH activity was reduced from 100% to approximately 9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged, indicating maintenance of a steady-state glycolytic flux despite substantial LDH inhibition. Only when LDH activity was further reduced below this threshold did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure (Figure 8A and B).

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [<sup>13</sup>C<sub>6</sub>]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle (Figure 8C). Similarly, [<sup>13</sup>C<sub>5</sub>]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues (Figure 8D). These graded, quantitative transitions—rather than an abrupt global failure—support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.

      (4) Reconciling discrepancies with prior studies

      We agree that multiple prior studies have reported increased oxygen consumption or enhanced oxidative metabolism following LDH inhibition in cancer cells. However, we note that this prevailing notion often persists because LDH inhibition is frequently discussed by analogy to the classical Pasteur and Crabtree effects, in which cells toggle between fermentation and respiration depending on oxygen and glucose availability. We believe this analogy can be misleading.

      In the Pasteur effect, the metabolic shift is primarily driven by oxygen limitation, i.e., restriction of the terminal electron acceptor for the mitochondrial electron transport chain, which enforces reliance on fermentation. In the Crabtree effect, high glucose availability suppresses respiration through regulatory mechanisms while glycolysis is strongly activated. Both phenomena are fundamentally controlled by oxygen availability and respiratory capacity, rather than by inhibition of a specific cytosolic enzyme.

      By contrast, LDH inhibition is mechanistically distinct: it directly perturbs cytosolic redox recycling by limiting NADH-to-NAD<sup>+</sup> regeneration and can therefore constrain upstream glycolytic flux (particularly at GAPDH) and reshape pathway thermodynamics. Under conditions where LDH inhibition sufficiently limits effective NAD<sup>+</sup> availability and reduces glycolytic flux into pyruvate, the downstream consequence is reduced carbon input into the TCA cycle and suppressed OXPHOS—consistent with our experimental measurements. We therefore suggest that divergent outcomes reported across studies likely reflect differences in residual LDH activity, cell-type–specific metabolic wiring, and the extent to which glycolytic flux remains sustained versus becoming redox-limited upstream, rather than a universal Pasteur/Crabtree-like “switch” from fermentation to respiration. Accordingly, interpreting LDH inhibition as a Pasteur/Crabtree-like toggle may oversimplify the biochemical consequences of disrupting cytosolic NAD<sup>+</sup> regeneration.

      We have revised the Discussion to clarify this conceptual distinction and to avoid relying on comparisons that are not mechanistically equivalent to LDH inhibition.

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake, and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and at normoxia and hypoxia. From this, they conclude that the inhibition of LDH inhibits glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction sets a significant thermodynamic barrier. Even with treatment using 30 μM GNE-140, the ∆G<sub>PFK1</sub> (Gibbs free energy of the PFK1-catalyzed reaction) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium, thereby preventing the accumulation of F6P and G6P.

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how the inhibition of LDH affects pyruvate incorporation into the TCA cycle. As this mechanism was not investigated, we have titled the study:

      "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      This issue also concerned me during the study. However, given the high reproducibility of the data, we consider the result to be real, although it requires explanation. The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic, providing the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium and primarily serve to equilibrate glycolytic intermediates rather than control the overall direction of glycolysis, as previously described by us (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.
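
      For orientation, the Gibbs free energies discussed above follow the standard relation ΔG = ΔG°' + RT ln Q, where Q is the mass-action ratio computed from intracellular metabolite concentrations. The Python sketch below shows this calculation under stated assumptions: the function name, the temperature of 310.15 K, and all numerical inputs are illustrative placeholders, not the values or code used in the manuscript.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)

def gibbs_free_energy(delta_g0_prime, products, substrates, temp_k=310.15):
    """Return delta G (kJ/mol) from delta G0' and concentrations in mol/L.

    products / substrates: lists of concentrations for each side of the
    reaction, used to form the mass-action ratio Q.
    """
    # Q = product of product concentrations / product of substrate concentrations.
    q = math.prod(products) / math.prod(substrates)
    return delta_g0_prime + R_KJ_PER_MOL_K * temp_k * math.log(q)

# Placeholder example for a one-substrate, one-product reaction with an
# assumed delta G0' of +4.4 kJ/mol and hypothetical concentrations.
dg_low_ratio = gibbs_free_energy(4.4, products=[1.0e-5], substrates=[1.0e-4])
dg_high_ratio = gibbs_free_energy(4.4, products=[2.0e-5], substrates=[1.0e-4])
```

      With these placeholder inputs, the sign of ΔG depends only on whether the product-to-substrate ratio lies below or above the equilibrium ratio implied by ΔG°', which is why accurate concentration measurements matter for reactions operating close to equilibrium.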

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      GNE-140 treatment increased the labeling of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose but decreased the OXPHOS rate. We consider these conflicting results an 'anomaly' that warrants further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests that LDH inhibition leads to a reduced flux of glucose-derived acetyl-CoA into the TCA cycle, coupled with a decreased flux of glutamine-derived α-KG, and a reduction in the efflux of intermediates from the cycle. These results align with theoretical predictions. Under any condition, the reactions that distribute TCA cycle intermediates to other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced, implying that glucose-derived carbon (as acetyl-CoA) entering the TCA cycle must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 led to an increased labeling percentage of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into the TCA cycle; rather, it indicates a reduction in both the flux of glucose carbon into the TCA cycle and the flux of intermediates leaving the TCA cycle. When interpreting the data, multiple factors must be considered, including the carbon-13 labeling pattern of the intermediates (m1, m2, m3, ...) (Figure 5G-K), replenishment of intermediates by glutamine (Figure 5M-V), and mitochondrial oxygen consumption rate (Figure 5W). All these factors should be taken into account to derive a proper interpretation of the data.
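
      To make the distinction between labeling percentage and labeling pattern concrete, the following Python sketch summarises a mass isotopologue distribution. It is a minimal illustration, assuming raw isotopologue intensities ordered m0, m1, ..., mN with no natural-abundance correction; the function name and both example distributions are hypothetical and do not reproduce data from the manuscript.

```python
def labeling_summary(intensities):
    """Summarise a mass isotopologue distribution given as [m0, m1, ..., mN]."""
    total = sum(intensities)
    if total <= 0 or len(intensities) < 2:
        raise ValueError("need positive intensities for at least m0 and m1")
    # Fraction of the pool in each isotopologue.
    fractions = [x / total for x in intensities]
    n_carbons = len(intensities) - 1
    # Labeled fraction: everything that is not unlabeled (m0).
    labeled_fraction = 1.0 - fractions[0]
    # Mean enrichment: average fraction of carbon positions carrying 13C.
    mean_enrichment = sum(i * f for i, f in enumerate(fractions)) / n_carbons
    return fractions, labeled_fraction, mean_enrichment

# Hypothetical six-carbon metabolite (m0..m6) under two conditions.
control = [60, 2, 30, 3, 3, 1, 1]
treated = [50, 2, 20, 8, 10, 5, 5]
control_fractions, control_labeled, control_enrichment = labeling_summary(control)
treated_fractions, treated_labeled, treated_enrichment = labeling_summary(treated)
```

      In this made-up example the labeled fraction is higher in the treated condition even though the m2 share is lower, which illustrates why the total labeling percentage alone cannot be read as a flux.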

      Reviewer #3 (Public Review):

      Hu et al in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDKs inhibitors) leads to upregulation of TCA cycle activity, and OXPHOS, activation of glutaminolysis, etc (in this work authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups (Thompson, Chouchani labs)). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum - but under these conditions, pyruvate is certainly present in the medium - this can easily complicate/invalidate some findings presented in this manuscript. In LDH enzymatic assays as described with cell homogenates controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially with still traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as a part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays with GC-MS/LC-MS experiments (using calibration curves for respective metabolites, of course). Correct measurements of these metabolites are crucial especially when thermodynamic parameters for respective reactions are calculated. Concentrations of multiple graphs (Figure 1g etc.) are in "mM", I do not think that this is correct.

      We thank the reviewer for these comments. Below we clarify the conceptual framework, the quantitative methodology, and the experimental basis supporting our conclusions.

      (1) “It is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle… leads to upregulation of TCA/OXPHOS… (authors claim lowered glycolysis leads to lower TCA/OXPHOS)”

      This framing is not accurate in the context of our study. PDK inhibition and LDH inhibition are fundamentally different perturbations. PDK inhibition directly promotes mitochondrial pyruvate oxidation by enabling PDH flux, whereas LDH inhibition primarily perturbs cytosolic redox balance (free NADH/NAD<sup>+</sup>) and thereby constrains upstream glycolytic reactions, particularly the GAPDH step. Therefore, the metabolic outcomes of these interventions are not expected to be identical and should not be treated as interchangeable.

      Importantly, we do not “ignore” prior studies proposing increased OXPHOS after LDH inhibition; we explicitly cite and summarize this prevailing interpretation in the Introduction. Our study was motivated precisely because this interpretation does not resolve key quantitative inconsistencies, including (i) the large mismatch between glycolytic flux and mitochondrial oxidative capacity, and (ii) the exceptionally high catalytic capacity of LDH relative to upstream rate-limiting glycolytic enzymes. These constraints raise a mechanistic question: how does LDH inhibition actually suppress glycolytic flux in intact cancer cells, and what are the consequences for TCA cycle and OXPHOS?

      Our central contribution is the identification of a biochemical mechanism supported by integrated measurements of fluxes, metabolite concentrations, redox state, and reaction thermodynamics: LDH inhibition increases free NADH/NAD<sup>+</sup>, decreases free NAD<sup>+</sup> availability, inhibits GAPDH, drives accumulation/depletion patterns in glycolytic intermediates, shifts Gibbs free energies of near-equilibrium reactions (PFK1–PGAM segment), suppresses pyruvate production, and consequently reduces carbon input into TCA cycle and OXPHOS. These analyses are not provided by most prior work and directly address the mechanistic gap.

      (2) Lactate signaling (Thompson/Chouchani) and metabolic modeling (Titov/Rabinowitz)

      These research directions are valuable, but they address questions that are different from the one investigated here. Our manuscript focuses on steady-state biochemical control of metabolic flux by LDH inhibition through redox-linked kinetics and pathway thermodynamics.

      (3) Pyruvate in RPMI

      Pyruvate in standard medium does not invalidate our conclusions. All experimental comparisons were performed under identical conditions across groups, and the major conclusions rely on orthogonal measurements including glycolytic flux (glucose consumption/lactate production), OCR profiling, and isotope tracing with [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine, which directly quantify carbon entry into lactate and TCA cycle intermediates. These tracer-based results are not confounded by unlabeled extracellular pyruvate in a way that would reverse the mechanistic conclusions.

      (4) LDH activity assay in homogenates and “many enzymes can react with NADH”

      This concern is overstated. In the LDH assay, substrates are pyruvate + NADH, and the measured signal reflects NADH oxidation coupled to pyruvate reduction. In cell lysates, LDH is uniquely abundant and catalytically efficient for this reaction pair, and the inhibitor-response behavior matches the known LDHA/LDHB selectivity of GNE-140 and the cellular phenotypes. Thus, the assay is mechanistically specific in this context.

      (5) Enzyme-coupled metabolite assays and request for LC–MS validation

      The reviewer’s implication that enzyme-coupled assays are intrinsically unreliable is incorrect. Enzymatic cycling assays are a widely used quantitative approach when performed with proper specificity and calibration, and they are particularly useful for labile glycolytic intermediates that are challenging to quantify reproducibly by MS without specialized quenching, derivatization, and isotope dilution standards.

      We agree that MS-based quantification is valuable, and we have developed LC–MS methods for selected metabolites. However, absolute quantification of these intermediates remains technically difficult due to the inherent limitation of this method and, in our hands, did not provide uniformly robust performance for all intermediates required for thermodynamic analysis.

      (6) Units (“mM”)

      The metabolite concentration units are correct.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the goal is to investigate the direct impact of LDH inhibition, then in my opinion, most of these experiments need to be repeated at a very early time point immediately after or a few minutes after LDH inhibition. I understand that this is a tremendous amount of work that the authors might not want to pursue. I do want to highlight that the quality of the experiments performed in this work is impressive. I hope the authors continue investigating this subject and look forward to reading their future manuscripts on this topic.

      We thank the reviewer for this thoughtful and constructive comment and for the positive assessment of the experimental quality of our work.

      We fully agree that measurements at very early time points after LDH inhibition would be required if the goal were to isolate an immediate, proximal molecular event occurring before downstream propagation. However, the primary objective of our study is not to dissect a single instantaneous biochemical consequence of LDH inhibition, but rather to characterize the metabolic steady state that is re-established after sustained suppression of LDH activity, which we believe is more relevant for understanding the long-term metabolic and therapeutic consequences of LDH inhibition in cancer cells.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer raises an important point that many metabolic perturbations can trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. We agree that very early time points would be required if the primary goal were to isolate the most immediate, proximal consequence of LDH inhibition before downstream propagation. However, the objective of our study is different: we aim to characterize the metabolic steady state re-established after sustained inhibition of LDH activity, because this adapted steady state is more relevant for understanding long-term metabolic consequences and therapeutic outcomes of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, we observed that when LDH activity was reduced from 100% to approximately 9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged, indicating maintenance of a steady-state glycolytic flux despite substantial LDH inhibition. Only when LDH activity was further reduced below this threshold did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure.

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [<sup>13</sup>C<sub>6</sub>]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle. Similarly, [<sup>13</sup>C<sub>5</sub>]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues. These graded, quantitative transitions—rather than an abrupt global failure—support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.

      Reviewer #2 (Recommendations For The Authors):

      All in all, the authors would benefit from collaboration with a group more well-versed in quantitative aspects of metabolism (such as Metabolic Control Analysis) and modelling methods (such as flux analysis) to boost the interpretation and impact of their really nice data set.

      We sincerely thank the reviewer for this insightful and constructive suggestion. We fully agree that collaboration with groups specializing in quantitative metabolic analysis, such as Metabolic Control Analysis and flux modeling, would further expand the interpretative depth and broader impact of this work.

      The primary objective of the present work, however, was not to construct a global mathematical model, but to experimentally dissect the biochemical mechanism by which LDH inhibition coordinately suppresses glycolysis, the TCA cycle, and OXPHOS, integrating enzyme kinetics with thermodynamic constraints at steady state. Within this scope, we focused on experimentally demonstrable relationships between LDH activity, redox balance, GAPDH perturbation, thermodynamic shifts in near-equilibrium reactions, and emergent flux suppression.

      We fully recognize the power of MCA and related modeling approaches in formalizing control coefficients and system-level sensitivities, and we view our dataset as particularly well suited to support such future analyses. We therefore see this work as providing a robust experimental platform upon which more comprehensive quantitative modeling can be built, either in future studies or through collaboration with specialists in metabolic modeling.

      Reviewer #3 (Recommendations For The Authors):

      We sincerely thank the reviewer for the important suggestions.

      (1) I strongly disagree that "regulation of glycolytic flux".. "remained largely unexplored.”

      Our original wording was meant to emphasize not the absence of prior work on glycolytic flux regulation, but rather that the specific biochemical mechanism by which LDH regulates glycolytic flux—particularly through the integrated effects of enzyme kinetics, redox balance, and thermodynamic constraints within the pathway—has not been fully elucidated.

      To avoid any ambiguity or overstatement, we have revised the relevant text to more precisely reflect this intent. The revised wording now reads:

      “This study elucidates a biochemical mechanism by which lactate dehydrogenase influences glycolytic flux in cancer cells, revealing a kinetic–thermodynamic interplay that contributes to metabolic regulation.”

      We believe this revised phrasing more accurately acknowledges prior work while clearly defining the specific mechanistic contribution of the present study.

      (2) Very confusing in the Introduction section: "If LDH is inhibited at the LDH step..”

      We sincerely thank the reviewer for pointing out the potential confusion caused by the phrase “If LDH is inhibited at the LDH step” in the Introduction.

      Our intention was to contrast two conceptual models of LDH inhibition. The first is the conventional view, in which the effect of LDH inhibition is assumed to be confined to the LDH-catalyzed reaction itself, leading primarily to local accumulation of pyruvate and its redirection toward mitochondrial metabolism. The second, which is supported by our data, is that LDH inhibition initiates a system-wide biochemical response, perturbing redox balance, upstream enzyme kinetics, and the thermodynamic state of the glycolytic pathway, ultimately resulting in coordinated suppression of glycolysis, the TCA cycle, and OXPHOS.

      We agree that the original phrasing was ambiguous and potentially misleading. To improve clarity, we have revised the text as follows:

      “If the effect of LDH inhibition were confined solely to its catalytic step…”

      (3) The entire introduction part when the authors attempt to explain how decreased glycolysis will lead to decreased mitochondrial respiration is confusing.

      We would like to clarify that the Introduction does not attempt to explain how decreased glycolysis leads to decreased mitochondrial respiration. Rather, the final paragraph of the Introduction is intended to highlight an unresolved conceptual inconsistency in the existing literature and to motivate the central question addressed in this study.

      Specifically, we summarize the prevailing view that LDH inhibition redirects pyruvate toward mitochondrial metabolism and enhances oxidative phosphorylation, and then point out that this interpretation is difficult to reconcile with quantitative considerations, such as the large disparity between glycolytic and mitochondrial flux capacities and the excess catalytic activity of LDH relative to upstream glycolytic enzymes. These observations are presented to emphasize that the biochemical mechanism linking LDH inhibition to changes in glycolysis and mitochondrial respiration has not been fully resolved.

      Importantly, the Introduction does not propose a mechanistic explanation for the observed suppression of mitochondrial respiration; rather, it poses this as an open question, which is then systematically addressed through experimental analysis in the Results section.

      (4) Line 144: "which is 81(HeLa-LDHAKO) -297(HeLa-Ctrl) times"- here and in many other places wording is confusing to the reader.

      Our intention was to emphasize the significant redundancy of LDH activity relative to hexokinase (HK), the first rate-limiting enzyme in the glycolysis pathway, in cancer cells.

      Specifically, we wanted to express that in HeLa-Ctrl cells, the total LDH activity is 297 times the HK activity, while in HeLa-LDHAKO cells, although the total LDH activity decreased, it was still 81 times the HK activity. These data come from supplementary Table 1 in the paper and provide quantitative evidence for "why knocking out LDHA or LDHB alone is insufficient to significantly affect glycolysis flux": the remaining LDH activity is still far higher than the HK activity at the pathway entrance and is sufficient to maintain flux.

      Following this suggestion, we have rewritten it in the revised manuscript with a more specific statement: "...the total activity of LDH in HeLa cells is very high, which is 297-fold higher than the first rate-limiting enzyme HK activity in HeLa-Ctrl cells and 81-fold higher in HeLa-LDHAKO cells.”

      (5) Line 153: "in the following four aspects:"- but what are these aspects, the text below has no corresponding subtitles, etc.

      Our intention was to indicate that after LDHA or LDHB knockout alone failed to affect the glycolysis rate, we further explored its potential impact on the glycolytic pathway from four deeper perspectives: the glucose carbon to pyruvate and lactate, the glucose carbon to subsidiary branches of glycolysis, the concentration of glycolytic intermediates and the thermodynamic state of the pathway, and the redox state of cytosolic free NADH/NAD<sup>+</sup>.

      Following your valuable suggestion, we have now added the aforementioned clear subtitles to these four aspects in the revised manuscript.

      (6) Lines 193, another example of the very confusing statement: "The results suggested that the loss of total LDH concentration was compensated.."

      The actual catalytic activity (reaction rate) of LDH is determined by both its enzyme concentration and substrate concentration (pyruvate and NADH). When the total LDH protein concentration (enzyme amount) in the cell is reduced through gene knockout, the reaction equilibrium is disrupted. To maintain sufficient lactate production flux to support a high glycolysis rate, the cell compensates by increasing the concentration of one of the substrates—free NADH (as shown in Figure 1I). This results in an increased substrate concentration, despite a reduction in the amount of enzyme, thus partially maintaining the overall reaction rate.

      We have revised the original statement to more accurately describe this kinetic equilibrium process: "The decrease in total LDH concentration was counterbalanced by a concomitant increase in the concentration of its substrate, free NADH, thereby maintaining the reaction velocity.”
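      The compensation argument above can be illustrated with a minimal numerical sketch. It assumes simple Michaelis-Menten kinetics with free NADH as the only varying substrate and pyruvate held saturating, which is a deliberate simplification of the real two-substrate LDH mechanism. The function names and all parameter values (`kcat`, `km`, enzyme and NADH levels) are hypothetical and are not taken from the manuscript.

```python
def ldh_rate(e_total, nadh, kcat=1.0, km=1.0):
    """Michaelis-Menten rate with free NADH as the only varying substrate.

    Assumption: pyruvate is saturating, so the rate depends only on the
    enzyme amount and on free NADH. All units are arbitrary.
    """
    return kcat * e_total * nadh / (km + nadh)


def nadh_required(v_target, e_total, kcat=1.0, km=1.0):
    """Free NADH concentration needed to reach v_target at a given enzyme level."""
    vmax = kcat * e_total
    if v_target >= vmax:
        raise ValueError("target rate exceeds Vmax; no NADH level can compensate")
    return v_target * km / (vmax - v_target)


# Hypothetical baseline: 10 units of enzyme, 0.1 units of free NADH.
baseline_nadh = 0.1
baseline_rate = ldh_rate(e_total=10.0, nadh=baseline_nadh)

# Hypothetical knockout: enzyme amount halved, same rate required.
nadh_after_ko = nadh_required(baseline_rate, e_total=5.0)
rate_after_ko = ldh_rate(e_total=5.0, nadh=nadh_after_ko)

print(f"baseline rate: {baseline_rate:.4f}")
print(f"NADH needed after halving the enzyme: {nadh_after_ko:.4f}")
print(f"rate after compensation: {rate_after_ko:.4f}")
```

      In this toy setting, halving the enzyme amount requires a rise in free NADH to hold the reaction velocity constant, and compensation is only possible while the target rate stays below the reduced Vmax.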

      (7) Line 222-223: "did not or marginally significantly affect....”

      Our intention was to reflect the complexity of the data in Figure 1. "Did not affect" means that there were no statistically significant differences in most key parameters, such as glycolytic flux (glucose consumption rate, lactate production rate). "Or marginally significantly affect" means that for a few indicators, although the p-values were less than 0.05, the absolute differences were very small and of limited biological significance.

      To clarify this, we have rewritten it as: "...did not significantly affect glucose-derived pyruvate entering into TCA cycle, neither significantly affect mitochondrial respiration, although statistically significant but minimal changes were observed in a few specific parameters (e.g., m3-pyruvate% in medium).”

      (8) It is very confusing to use the same colors for three GNE-140 drug concentrations (Figure 2a-b) and for 3 different cell lines right next to each other (Figure 2c-d).

      The figures have been revised accordingly.

      (9) Lines 263-273: nothing is new here as oxidized NAD+ is required for run glycolysis and LDH inhibition/KO leads to a high NADH/NAD+ ratio; Also below it is well known that reductive stress blocks serine biosynthesis;

      It is well established that oxidized NAD<sup>+</sup> is required for glycolysis, that LDH inhibition or knockout increases the NADH/NAD<sup>+</sup> ratio, and that reductive stress can suppress serine biosynthesis. We did not intend to present these observations as novel.

      The key point of this section is not the qualitative requirement of NAD<sup>+</sup> for GAPDH, but rather the mechanistic alignment between LDH inhibition, changes in free NAD<sup>+</sup> availability, and the emergence of GAPDH as a flux-controlling step within the glycolytic pathway under steady-state conditions. Previous studies have largely treated the increase in NADH/NAD<sup>+</sup> following LDH inhibition as a correlative or downstream effect, without directly demonstrating how this redox shift quantitatively propagates upstream to reorganize glycolytic flux distribution and thermodynamic driving forces.

      In our study, we explicitly link LDH inhibition to (i) an increase in free NADH/NAD<sup>+</sup> ratio, (ii) inhibition of GAPDH activity in intact cells, (iii) accumulation of upstream glycolytic intermediates, (iv) suppression of serine biosynthesis from 3-phosphoglycerate, and critically, (v) coordinated shifts in the Gibbs free energies of reactions between PFK1 and PGAM. This integrated kinetic–thermodynamic framework goes beyond the established qualitative understanding of NAD<sup>+</sup> dependence and provides a pathway-level mechanism by which LDH activity controls glycolytic flux.
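      For readers less familiar with the thermodynamic quantity in point (v), the following is a minimal sketch of the standard relation ΔG = ΔG°′ + RT ln Q, where Q is the mass-action ratio of products to substrates. It is not the authors' actual calculation. The function name, the temperature of 310.15 K, and the example concentrations are assumptions chosen for illustration only.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs(dg0_prime, products, substrates, temp_k=310.15):
    """Return dG = dG0' + R*T*ln(Q) in kJ/mol.

    products and substrates are lists of concentrations in mol/L;
    Q is the product of product concentrations divided by the product
    of substrate concentrations. temp_k defaults to 37 degrees C (assumed).
    """
    q = math.prod(products) / math.prod(substrates)
    return dg0_prime + R_KJ * temp_k * math.log(q)


# Hypothetical one-substrate, one-product reaction with dG0' = 0 kJ/mol.
dg_far = reaction_gibbs(0.0, products=[1e-5], substrates=[1e-3])   # Q = 0.01
dg_near = reaction_gibbs(0.0, products=[9e-4], substrates=[1e-3])  # Q = 0.9

print(f"dG at Q = 0.01: {dg_far:.2f} kJ/mol")
print(f"dG at Q = 0.9:  {dg_near:.2f} kJ/mol")
```

      As the product-to-substrate ratio rises toward the equilibrium constant, ΔG moves toward zero, which is why changes in intermediate concentrations can shift the thermodynamic state of near-equilibrium steps.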

      (10) Lines 368-370: "... we reached an alternative interpretation of the data.."- does not provide much confidence.

      Our intention was to state, cautiously, that we propose an interpretation that is based on detailed data and differs from the conventional view. The interpretation rests on consistent evidence from dual isotope tracing with [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine. In the [<sup>13</sup>C<sub>6</sub>]glucose tracing data, the labeling pattern of citrate, the first product of the TCA cycle, showed a significant decrease in m+2 %, which directly reflects a reduced flux of newly generated, glucose-derived acetyl-CoA into the TCA cycle. Simultaneously, the summed % of the other isotopologues (m+1/m+3/m+4/m+5/m+6) increased, indicating a longer retention time of labeled carbon in the cycle and implying a simultaneous decrease in the efflux of cycle intermediates for biosynthesis. In the [<sup>13</sup>C<sub>5</sub>]glutamine tracing data, the labeling pattern of α-ketoglutarate showed a decrease in m+5 %, indicating a reduction in glutamine replenishment flux. The change in the summed % of the other isotopologues (m+1/m+2/m+3/m+4) also supports the conclusion of reduced intermediate efflux.

      These two sets of data corroborate each other, pointing to a unified conclusion: LDH inhibition not only reduces carbon source inflow into the TCA cycle but also decreases intermediate product efflux, leading to a decrease in overall cycle activity. Therefore, our "alternative interpretation" is a well-supported and more consistent explanation of our overall experimental results. We have revised the original wording to: "Integrated analysis of dual isotope tracing data demonstrates that LDH inhibition reduces both influx and efflux of the TCA cycle..."
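      The isotopologue comparison described in this response can be sketched as a small calculation. The sketch assumes raw intensities for citrate m+0 through m+6 and skips natural-abundance correction; the helper names and all intensity values are hypothetical and are not the manuscript's measurements.

```python
def isotopologue_percent(intensities):
    """Convert raw intensities (index i = m+i) to percent of the total pool."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [100.0 * value / total for value in intensities]


def m2_and_other_labeled(percentages):
    """Return (m+2 %, summed % of m+1 and m+3..m+n), excluding unlabeled m+0."""
    m2 = percentages[2]
    other = sum(p for i, p in enumerate(percentages) if i not in (0, 2))
    return m2, other


# Hypothetical citrate intensities for m+0..m+6 (illustration only).
control = isotopologue_percent([50, 5, 30, 5, 6, 3, 1])
inhibited = isotopologue_percent([50, 8, 20, 7, 9, 4, 2])

m2_ctrl, other_ctrl = m2_and_other_labeled(control)
m2_inh, other_inh = m2_and_other_labeled(inhibited)

print(f"control:   m+2 = {m2_ctrl:.1f}%, other labeled = {other_ctrl:.1f}%")
print(f"inhibited: m+2 = {m2_inh:.1f}%, other labeled = {other_inh:.1f}%")
```

      With these made-up numbers, a lower m+2 % alongside a higher summed % of the other labeled isotopologues reproduces the qualitative pattern the response describes for citrate; real data would additionally need natural-abundance correction and replicate statistics.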

      (11) Lines 418-421: This entire discussion on how TCA cycle activity is decreased upon LDH inhibition is very confusing. I also would like to see these tracer studies when ETC is inhibited with different inhibitors.

      We would like to clarify that the mitochondrial respiration rate data presented in Figure 5W are based on studies using different ETC inhibitors, and the cell treatment conditions (including culture time, etc.) for these oxygen consumption measurements are consistent with the conditions for the [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine isotope tracing experiments (Figure 5A-V). Therefore, the changes in TCA cycle flux revealed by the tracing data and the inhibition of OXPHOS rate shown by the respiration measurements are mutually corroborating evidence from the same experimental conditions.

      (12) Figure 6F, G - very limited representation of growth curves, why not perform these experiments with all corresponding cell lines and over multiple days. Especially since proliferation arrest vs cell death was implicated.

      We have provided the growth curves of the HeLa-Ctrl and HeLa-LDHAKO cell lines under the corresponding treatments in Figure 6—figure supplement 1, as a supplement to Figure 6F, G (HeLa-LDHBKO cells). The choice of 48 hours as the cutoff observation point is based on clear biological evidence: under the stress of hypoxia (1% O<sub>2</sub>) combined with GNE-140 treatment, HeLa-LDHBKO cells experienced substantial death within 24 to 48 hours, at which point the differences in the growth curves were already very significant.

      (13) Move most of the Supplementary tables into an Excel file - so values can be easily accessed.

      We have compiled the tables into an Excel file and submitted it along with the revised manuscript as supplementary material.

      (14) Consider changing colors to more appealing- especially jarring is a bright blue, red, black combination on many bar graphs.

      We have adjusted the color scheme of the figures (especially the bar graphs) in the paper, and have submitted them with the revised manuscript.

      (15) Double check y-axis on multiple graphs it says "mM".

      We have checked the y-axes; the unit (mM) is correct.

      (16) Instead TCA cycle use the TCA cycle.

      In the revised manuscript, "the TCA cycle" is now used throughout.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34<sup>+</sup>Sca-1<sup>+</sup> dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns or comments.

      We sincerely thank the reviewer for the positive evaluation of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The Authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, vena cava inferior and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The Authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the Authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has some concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results, and suggests that the conclusions of the paper may be critically viewed. Namely, at this point, it is still not fully clear whether the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit or whether these are fine portal branches that connect the larger portal veins into the adjacent sinusoid. Also, in my opinion, to identify the molecular characteristics of such small and spatially highly organized structures like those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics (instead of data mining in existing liver single cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations). Yet, the existence of such structures with a distinct molecular profile cannot be excluded. Further research with advanced imaging and omics techniques (such as high resolution volume imaging, and spatial transcriptomics/proteomics) is needed to reproduce these initial findings.

      We thank the reviewer for the thoughtful and constructive comments. In response to the reviewer’s concerns regarding the anatomical and molecular definition of the periportal lamellar complex (PLC), we have further clarified the scope and methodological boundaries of the present study in the revised manuscript.

      Regarding the key question raised by the reviewer—namely, whether the PLC represents an independent anatomical or functional unit, or merely small portal venous branches connecting larger portal veins to adjacent sinusoids—we provide below a more detailed explanation of the criteria used to define the PLC in this study. The identification of the PLC is primarily based on periportal structures that can be reproducibly recognized by three-dimensional imaging across multiple mice, exhibiting a relatively consistent spatial distribution within the periportal region. The PLC could be stably observed across different MCNP dye color assignments and independent experimental batches. In addition, three-dimensional CD31 immunofluorescence consistently revealed vascular-associated signal distributions in the same periportal region, indirectly supporting its spatial association with the periportal vascular system.

      At the morphological level, the PLC appears as a periportal vasculature-associated structure distributed around the main portal vein trunk and maintains a relatively consistent spatial proximity to portal veins, bile ducts, and neural components in three-dimensional space. This highly conserved spatial organization across multiple tissue systems supports the anatomical positioning of the PLC as a relatively distinct structural tissue unit within the periportal region.

      The present study primarily focuses on a descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC based on volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed endothelial cell populations potentially associated with the PLC using existing liver single-cell transcriptomic datasets. This analysis was intended to provide molecular-level information consistent with the structural observations and to offer preliminary clues to its potential biological functions, rather than to independently define the PLC at the spatial level or to functionally validate it.

      We fully acknowledge the value of spatial transcriptomic and spatial proteomic technologies in revealing molecular heterogeneity within tissue architecture. However, under current technical conditions, these approaches are largely dependent on thin tissue sections and are limited by spatial resolution and signal mixing effects, which still pose challenges for resolving periportal structures with pronounced three-dimensional continuity, such as the PLC. In the future, further integration of high-resolution volumetric imaging with spatial omics technologies may enable a more refined understanding of the molecular features and potential functions of the PLC at higher spatial resolution.

      Reviewer #3 (Public review):

      Summary:

      In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and PLC identification as a novel morphological feature of the adult liver portal veins. The text is now also much clearer and has better flow.

      The additional assessment of the smartSeq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for the liver homeostasis and injury response. While it may suffer from similar bias as other scRNA seq datasets - multiple cell fate signatures arising from mRNA contamination from proximal cells during dissociation, it is less likely that this would happen to yield so similar results.

      Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.

      The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of the eLife readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the careful and constructive comments regarding the functional validation of cell populations associated with the PLC. The central aim of this study is to establish and validate a novel volumetric imaging and vascular labeling strategy and to apply it to the periportal region of the liver, thereby revealing previously underappreciated structural organizational patterns at the three-dimensional level, rather than to perform a systematic functional validation of specific cellular subpopulations.

      We agree that the precise roles of the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell subpopulation in the formation and function of the periportal lamellar complex (PLC) have not been directly addressed through functional intervention experiments in the present study. Our conclusions are primarily based on three-dimensional imaging and spatial distribution analyses, which reveal a stable and consistent spatial association between this cell population and the PLC structure, but are not intended to independently support causal or functional inferences. The underlying functional mechanisms remain to be elucidated in future studies using genetic or functional perturbation approaches.

      In light of these considerations, we have further refined the relevant statements in the revised manuscript to more clearly define the functional scope and limitations of the current study in the Discussion section, and to avoid functional interpretations that extend beyond the direct support of the data. At the same time, we consider functional validation of the PLC to be an important and promising direction for future investigation.

      It should be emphasized that the present study is not primarily designed to provide direct functional validation, but rather to systematically characterize the three-dimensional structural features of the periportal lamellar complex (PLC) and its cellular associations using volumetric imaging and vascular labeling approaches. At this stage, we mainly provide spatial and histological evidence for the organizational relationship between the PLC structure and the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell population, while their specific roles in PLC formation and functional regulation await further investigation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I highly appreciate the Authors' endeavors to improve the manuscript. I am enlisting those points (from my original review) where I still have further comments.

      (2) I would suggest this sentence:

      "...the liver has evolved a highly complex and densely organized ductal vascular-neuronal network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7]."

      We thank the reviewer for the valuable suggestion. We have revised the relevant sentence accordingly, and the revised wording is as follows:

      “The liver has evolved a highly complex and densely organized vascular–biliary–neural network, primarily composed of the portal venous system, central venous system, hepatic arterial system, biliary system, and the intrahepatic autonomic neural network.”

      (3) I suggest renaming 'clearing efficiency' to 'clearing time', and revise the last sentence like:

      '...The results showed that the average transmittance increased by 20.12% in 1mm-thick cleared tissue slices.'

      We thank the reviewer for this helpful suggestion. Accordingly, we have replaced the term “clearing efficiency” with “clearing time” and revised the final sentence to reflect this change. The revised wording is as follows:

      “The results showed that the average transmittance increased by 20.12% in cleared tissue slices with a thickness of 1 mm.”

      (4) While the dye perfusion was indeed on full lobe, FigS1F also seems to be rather a thick section instead of a full 3d reconstruction. This is OK, but please, be clear and specific about this in the respective part of the ms.

      We thank the reviewer for the careful review and detailed comments. We would like to clarify that Fig. S1F shows whole-lobe imaging of the mouse left liver lobe obtained after dye perfusion at the whole-liver scale, rather than an image derived from a thick tissue section. Although this image does not represent a three-dimensional reconstruction, it does reflect imaging of the entire left liver lobe at the macroscopic level.

      In addition, for the reviewer’s reference, we have provided in this response a representative image of a 200 μm-thick liver tissue section to directly illustrate the morphological differences between thick-section imaging and whole-lobe imaging. We note that the third and fourth panels in Fig. 1G of the main text already show local imaging results from 200 μm-thick sections; in contrast, the comparative image provided here presents a larger field of view and overall morphology. To avoid redundancy, this additional image is included solely for clarification in the present response and has not been incorporated into the revised manuscript or the supplementary materials.

      (11) Regarding the 'transmission quantification':

      'Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous or organic solvent based clearing techniques can achieve relatively uniform transparency in 1 mm thick tissue sections, so differences at this thickness are limited.'

      So, based on all these, I think, measuring/comparisons of clearing efficacy in the present form are kind of pointless --- one may consider omitting this part.

      We thank the reviewer for the valuable comments. The purpose of the transmittance quantification in this study was not to provide a comprehensive comparison among different tissue-clearing methods, but rather to serve as a quantitative reference supporting the optimization of the Liver-CUBIC protocol. Accordingly, we have narrowed and clarified the relevant statements in the revised manuscript to define their scope and avoid overinterpretation.

      The revised text now reads as follows:

      “Importantly, Liver-CUBIC treatment did not induce significant tissue expansion (Figure 1B–D). In addition, quantitative transmittance measurements in 1-mm-thick cleared tissue slices showed an average increase of 20.12% (P < 0.0001; 95% CI: 19.14–21.09; Figure 1E).”

      Author response image 1.

      (16) It is OK, but please, indicate this clearly in the Methods/Results because in its present form it may be confusing for the reader: which color means what.

      We thank the reviewer for this helpful request for clarification. We agree that the previous wording may have caused confusion regarding the meaning of different MCNP colors. Accordingly, we have revised the Methods section and the relevant figure legends to clearly state that the color assignment of MCNP dyes is not fixed across different experiments or figures. The use of different colors serves solely for visualization and presentation purposes, facilitating the distinction of anatomical structures in multichannel and three-dimensional imaging, and does not indicate any fixed or intrinsic correspondence between a specific color and a particular vascular or ductal system. We believe that this clarification will help prevent misinterpretation and improve the overall clarity of the manuscript.

      (17) Still I think the hepatic artery is extremely shrunk, while the portal vein is extremely dilated. Please, note that in the referring figure (from Adori et al), hepatic artery and portal vein are ca 50 micrometers and 250 micrometers in diameter, respectively. In your figure, as I see, ca. 9-10 micrometers and 125 micrometers, respectively. This means 5x (Adori) vs. 13-14x differences (you). I would not say that this is necessarily problematic --- but may reflect some perfusion issues that may be good to consider.

      We thank the reviewer for the careful comparison and acknowledge the quantitative differences pointed out. Compared with the study by Adori et al., the diameter ratio between the hepatic artery and the portal vein in our images does indeed differ to some extent. We believe that this discrepancy primarily arises from methodological differences in imaging and analysis strategies between the two studies.

      In the work by Adori et al., periportal vasculature identification and three-dimensional segmentation were mainly based on 488 nm autofluorescence signals acquired from inverted tissues. This signal predominantly reflects the overall outline of periportal tissue regions rather than direct imaging of the vascular lumen itself. Consequently, the measured “vessel diameter” largely represents a spatial domain delineated by surrounding periportal structures, and does not necessarily correspond to the actual or functional luminal diameter of the vessel.

      In contrast, the present study employed fluorescent MCNP dye perfusion under low perfusion pressure, combined with tissue clearing and three-dimensional optical imaging. Under these experimental conditions, the measured vessel diameters more closely reflect the perfusable luminal space of vessels in a fixed state, rather than their maximally dilated diameter, and are not defined by the morphology of surrounding tissues. This distinction is particularly relevant for the hepatic artery: as a high-resistance, smooth muscle–rich vessel, its diameter is highly sensitive to perfusion pressure and post-excision changes in vascular tone. In comparison, the portal vein exhibits greater compliance and is relatively less affected by these factors.

      Based on these methodological differences, the observation of relatively smaller apparent hepatic arterial diameters, and consequently a higher portal vein-to-hepatic artery diameter ratio, under dye perfusion–based optical imaging conditions is an expected outcome. Importantly, the primary focus of the present study is the identification and characterization of the periportal lamellar complex (PLC) as a three-dimensional lamellar tissue structure that can be stably and reproducibly recognized across different samples and imaging conditions, rather than absolute comparisons of vascular diameters.
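
      As a quick check of the arithmetic in this exchange, the following minimal Python sketch recomputes the portal vein to hepatic artery diameter ratios from the approximate values quoted in the reviewer's comment. The function name `diameter_ratio` is illustrative only, and the diameters are the reviewer's estimates read from the figures, not new measurements.

```python
def diameter_ratio(portal_um: float, artery_um: float) -> float:
    """Return the portal vein diameter divided by the hepatic artery diameter."""
    return portal_um / artery_um


# Approximate diameters (micrometers) as quoted by the reviewer.
adori = diameter_ratio(250, 50)      # Adori et al.
ours_low = diameter_ratio(125, 10)   # present study, artery ~10 um
ours_high = diameter_ratio(125, 9)   # present study, artery ~9 um

print(f"Adori et al.: {adori:.1f}x")
print(f"Present study: {ours_low:.1f}x to {ours_high:.1f}x")
```

      With these inputs the ratio is 5x for Adori et al. and roughly 12.5x to 13.9x for the present images, consistent with the reviewer's estimate of about 13-14x.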

      (21) After the presented documentation, I still have some concerns that the 'periportal lamellar complex (PLC)' that the Authors describe is really a distinct anatomical or functional unit. The confocal panel in Fig. 4F is nice and high quality. However, as far as I see, it shows that CD34+/Sca-1+ immunostaining is not specific for the presumptive PLCs in the peri-portal region. Instead, Sca-1 immunoreactivity is highly abundant also in the midzone --- to which the supposed PLCs do not extend, according to the cartoon shown in panel D, same figure. Notably, this questions also the specificity of the single cell analysis.

      We thank the reviewer for this detailed and important comment regarding the specificity of CD34<sup>+</sup>/Sca-1<sup>+</sup> markers and the definition of the periportal lamellar complex (PLC).

      It should be emphasized that the PLC is not defined on the basis of any single molecular marker, but rather by a reproducible periportal lamellar anatomical structure consistently revealed by three-dimensional imaging across multiple samples. The co-expression of CD34 and Sca-1 is interpreted within this clearly defined anatomical context and is used to characterize the molecular features of endothelial cells associated with the PLC structure.

      As shown in Fig. 4F, the co-expression of CD34 and Sca-1 delineates a continuous, lamellar endothelial structure surrounding the portal vein. In contrast, outside the periportal region—including the midlobular areas—Sca-1 or CD34 expression can also be detected, but these signals appear scattered and discontinuous, lacking an organized lamellar topology.

      In the single-cell transcriptomic analysis, we treated CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cells as an operational population to explore molecular features that may be enriched in the microenvironment of the periportal lamellar complex (PLC). Importantly, this analysis was intended to provide molecular clues associated with the PLC, rather than to precisely assign spatial locations or identities to individual cells.

      Occasional isolated Sca-1<sup>+</sup> signals detected outside the periportal region do not affect the anatomical definition of the PLC, nor do they alter the interpretation of the single-cell analysis. These analyses serve to provide supportive and exploratory molecular information for the structural identification of the PLC, rather than constituting decisive spatial evidence.

      (23) '....In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.'

      I do not find these statements either in the Discussion or in the Results. I must reiterate my opinion that the applied methodical approach in the single cell transcriptomics part has severe limitations, and the readers must be aware of this.

      We thank the reviewer for this further comment. We understand and acknowledge the reviewer’s concerns regarding the methodological limitations of single-cell transcriptomic analyses, and we agree that these limitations should be clearly communicated to readers in the main text.

      We acknowledge that in the previous version of the manuscript, the exploratory nature of the single-cell transcriptomic analysis and its methodological boundaries were discussed only in the response to reviewers and were not explicitly stated in the manuscript itself. We thank the reviewer for pointing out this omission. In the revised manuscript, we have now added explicit clarifications in the main text to prevent potential overinterpretation of these results.

      In the present study, our primary effort is focused on the descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC using volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed existing liver single-cell transcriptomic datasets to examine endothelial cell populations exhibiting PLC-associated features, and performed differential gene expression and Gene Ontology enrichment analyses. Importantly, these results are intended to provide molecular-level support for the structural identification of the PLC and to offer preliminary insights into its potential biological functions. Accordingly, we have narrowed the presentation and interpretation of the single-cell analysis in both the Results and Discussion sections of the revised manuscript.

      In addition, we have expanded the Discussion to address the limitations of current spatial transcriptomic approaches in validating a continuous three-dimensional structure such as the PLC. Most existing spatial transcriptomic methods rely on two-dimensional tissue sections of 8–10 μm thickness, whereas identification of the PLC depends on three-dimensional imaging of tissue volumes with thicknesses of ≥200 μm, making reliable reconstruction of its spatial continuity from single sections challenging. Furthermore, because each spatial transcriptomic capture spot often encompasses multiple adjacent cells, signal mixing effects further limit precise resolution of specific periportal microstructures.
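
      To illustrate the scale mismatch described above, the following minimal sketch counts how many thin sections would be needed to span a 200 μm tissue volume. It assumes perfectly contiguous serial sections with no tissue loss or registration error, which is an idealization, and the helper name `sections_needed` is hypothetical.

```python
import math


def sections_needed(volume_um: float, section_um: float) -> int:
    """Number of contiguous sections required to span a tissue volume."""
    return math.ceil(volume_um / section_um)


# Section thicknesses and volume depth taken from the text above.
with_8um = sections_needed(200, 8)
with_10um = sections_needed(200, 10)

print(f"8 um sections: {with_8um}")
print(f"10 um sections: {with_10um}")
```

      Even under these idealized assumptions, 20 to 25 serial sections would have to be aligned to cover the minimum depth at which the PLC is identified.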

      Overall, we agree with the reviewer’s central point that the limitations of single-cell transcriptomic analyses should be clearly understood by readers. By explicitly clarifying the methodological boundaries and refining the related statements in the main text, we believe this concern has now been adequately addressed in the revised manuscript. We thank the reviewer for identifying this omission, which has helped to improve the rigor and clarity of the study.

      Reviewer #3 (Recommendations for the authors):

      (1) While these are interesting observations suitable for discussion, the following sections are speculative, given that no functional characterization of PLC importance has been performed yet. This is most apparent in the comments on a role in hematopoiesis, which transiently takes place in the liver during embryogenesis (Khan et al 2016) but ceases after ligation of the umbilical inlet. Adult liver hematopoiesis remains controversial, and more solid evidence would need to be presented to support its existence in PLC regions.

      265 - These findings suggest that the Periportal Lamellar Complex (PLC) is not only a morphologically and spatially distinct, low-permeability vascular unit surrounding the portal vein, but also likely serves as a critical nexus connecting the portal vein, hepatic artery, and liver sinusoids. Thus, the PLC constitutes a key node within the interactive vascular network of the mouse liver.

      We thank the reviewer for the comments and suggestions regarding the potential functional interpretation of the periportal lamellar complex (PLC), particularly its possible association with hematopoietic function. We would like to clarify that the statement at line 265 was intended solely to describe the structural characteristics and spatial organization of the PLC within the periportal vascular network. Specifically, the original wording aimed to summarize the morphological features of the PLC and its spatial relationships among the portal vein, hepatic artery, and hepatic sinusoids.

      Nevertheless, to minimize potential misunderstanding, we have revised this section to avoid unnecessary functional implications. The revised text now reads:

      “These results suggest that the periportal lamellar complex (PLC) is a morphologically and spatially distinct vascular structure that surrounds the portal vein and may serve as a key organizational node coordinating the spatial relationships among the portal vein, hepatic artery, and hepatic sinusoids. Accordingly, the PLC represents an important structural element within the interactive vascular network of the mouse liver.”

      This revision preserves the structural significance of the PLC while avoiding overinterpretation of its functional roles.

      (2) The same is also true for this section, following Figure 3: no functional experiment tested this. For example, diphtheria toxin could be expressed in the CD34+Sca1+ population, or at least a careful mapping of the developing liver could be performed, which would indicate whether the PLC precedes or follows BD development.

      356 as a spatial positional cue guiding bile duct growth and branching but also as a regulatory node involved in coordinating bile drainage from the hepatic lobule into the biliary network.

      To avoid potential misunderstanding, we have further refined and revised the statements in the manuscript regarding the functional interpretation of the periportal lamellar complex (PLC) and its relationship to bile duct development.

      We agree that cell ablation strategies are of great importance for functional validation studies. However, it should be noted that CD34 and Sca-1 are relatively broadly expressed markers during liver development, labeling multiple endothelial, mesenchymal, and progenitor cell populations, and their expression is not restricted to the PLC. Owing to this broad expression pattern, ablation of CD34<sup>+</sup>Sca-1<sup>+</sup> cell populations would likely exert widespread effects on vascular and stromal structures, thereby complicating the distinction between direct PLC-specific effects and secondary developmental alterations. As such, this strategy may present technical limitations for specifically dissecting the role of the PLC in bile duct development.

      At the same time, given that the primary objective of this study is the systematic characterization of the three-dimensional anatomical features and spatial organization of the PLC, we have correspondingly revised the manuscript to restrict statements regarding the relationship between the PLC and bile ducts to spatial associations supported by the current data. Specifically, our results show that primary bile ducts run along the main portal vein trunk, secondary bile ducts exhibit directed branching toward the PLC region, and terminal bile duct branches tend to spatially cluster in the vicinity of the PLC, thereby forming a reproducible periportal spatial arrangement. Based on these observations, the PLC delineates a relatively conserved anatomical microenvironment within the portal region, whose spatial position is closely associated with the organization and terminal distribution of the intrahepatic bile duct network.

      We believe that these revisions more accurately reflect the experimental evidence and the defined scope of the present study.

      (3) The following statement ought to be rephrased or skipped, considering that CD34 and Sca1 (Ly6a) are markers of periportal endothelial cells (Pietilä et al., 2025, Gómez-Salinero et al., 2022) and as shown by the authors in their own Fig. 6D. In this context and the context of the CCl<sub>4</sub> experiments, a "simple" proliferative progenitor portal vein endothelial cell phenotype, suggested also by the presence of DLL4 (Fig. 5A) and JAG1 (Pietilä et al., 2025; Benedito et al., 2009), ought to be considered.

      409 Notably, CD34 and Sca-1 (Ly6a) were co-expressed exclusively within PLC structures surrounding the portal vein, but absent from central vein ECs and midzonal LSECs (Figure 4F).

      We thank the reviewer for pointing out the potential imprecision in this wording. We agree that both CD34 and Sca-1 (Ly6a) are well-established markers of periportal endothelial cells, as previously reported (Pietilä et al., 2025; Gómez-Salinero et al., 2022), and as also illustrated in Fig. 4F of our study.

      Accordingly, the original statement suggesting that CD34 and Sca-1 are co-expressed exclusively within the PLC structure may indeed represent an overinterpretation. Following the reviewer’s suggestion, we have revised the relevant text at line 409 by removing the exclusive phrasing (“only in”) and by emphasizing instead that CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells are enriched in periportal regions associated with the PLC, rather than being specific to or confined within the PLC.

      In addition, in the context of the CCl<sub>4</sub>-induced liver fibrosis model, we agree with the reviewer that the observed expression of DLL4 and JAG1 under fibrotic conditions is more appropriately interpreted as reflecting an activated or proliferative periportal endothelial progenitor–like phenotype, rather than defining a novel endothelial lineage. The corresponding statements in the revised manuscript have been adjusted accordingly.

      (4) Again, these concluding sentences are based on correlative evidence of mRNA expression and literature but not experimental evidence.

      436 These findings suggest that this unique endothelial cell subset in the periportal region may possess dual regulatory functions in both metabolic and hematopoietic modulation

      441 results suggest that PLC endothelial cells may not only regulate periportal microcirculatory blood flow but also help establish a specialized microenvironment that potentially supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair.

      We thank the reviewer for this thoughtful comment. We agree that these statements are primarily based on transcriptomic correlation analyses and support from previous literature, rather than direct functional experimental evidence.

      Accordingly, in the revised manuscript, we have appropriately toned down and adjusted the relevant concluding statements to more accurately reflect their inferential nature. The revised wording emphasizes associations and potential involvement, rather than definitive functional roles. These changes preserve the overall scientific interpretation while aligning the level of inference more closely with the available evidence.

      The revised text now reads:

      “Finally, we found that the main trunk of the PLC is primarily composed of CD34<sup>+</sup>Sca-1<sup>+</sup>CD31<sup>+</sup> endothelial cells (Fig. 4J). These CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive cells are mainly distributed in the basal region of the PLC structure and exhibit molecular features associated with hematopoiesis. Taken together, these results suggest that PLC endothelial cells may contribute to the establishment of a local microenvironment related to periportal hematopoietic regulation and may play potential roles in stem cell recruitment and maintenance of vascular homeostasis.”

      (5) The following part is speculative and based on re-analysis of a dataset that was gathered after 6 more weeks of CCl<sub>4</sub> treatment (12 weeks; Su et al., 2021) than in the linked experiments from the manuscript, and should be moved to the discussion or removed.

      504 Moreover, single-cell transcriptomic re-analysis revealed significant upregulation of bile duct-related genes in the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelium of PLC in fibrotic liver, with notably high expression of Lgals1 (Galectin-1) and Hgf (Figure 5G). Previous studies have shown that Galectin-1 is absent in normal liver parenchyma but highly expressed in intrahepatic cholangiocarcinoma (ICC), correlating with tumor dedifferentiation and invasion (Bacigalupo, Manzi, Rabinovich, & Troncoso, 2013; Shimonishi et al., 2001). Additionally, hepatocyte growth factor (HGF), particularly in combination with epidermal growth factor (EGF) in 3D cultures, promotes hepatic progenitor cells to form bile duct-polarized cystic structures (N. Tanimizu, Miyajima, & Mostov, 2007). Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver fibrosis.

      Collectively, our results demonstrate that the PLC, situated between the portal vein and periportal sinusoidal endothelium, constitutes a critical vascular microenvironmental unit. It may not only colocalize with bile duct branches under normal physiological conditions, but also through its basal CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive endothelial cells, potentially orchestrate bile duct epithelial proliferation, branching morphogenesis, and bile acid transport homeostasis via multiple signaling pathways. Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension, serving as a spatial scaffold facilitating terminal bile duct migration and expansion into the hepatic parenchyma (Figure 5H). These findings highlight the PLC endothelial cell population and the vascular-bile duct interface as key regulatory hubs in bile duct regeneration, tissue repair, and pathological remodeling, providing novel cellular and molecular insights for understanding bile duct-related diseases such as ductular reaction, cholangiocarcinoma, and cholestatic disorders, and offering potential targets for therapeutic intervention.

      We thank the reviewer for this careful and thought-provoking comment. We understand and agree with the reviewer’s assessment that this section involves a degree of inference, as the analysis is based on a re-analysis of a previously published single-cell transcriptomic dataset from a CCl<sub>4</sub>-induced liver fibrosis model (Su et al., 2021), rather than on experimental data directly generated in the present study.

      In response to the reviewer’s suggestion, we have carefully re-examined and revised the relevant paragraphs. Without altering the overall structure of the manuscript, we have appropriately moderated the wording to clarify that these results primarily describe the transcriptional features of PLC-associated CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells under fibrotic conditions, and their associations with bile duct–related gene expression, rather than providing direct functional evidence for their roles in bile duct branching or microenvironmental remodeling.

      In addition, we have explicitly clarified in the main text the data source and methodological limitations of the single-cell transcriptomic analysis, and emphasized that these findings should be interpreted in conjunction with the spatial information revealed by three-dimensional imaging. Through these revisions, we aim to retain the value of this analysis in providing complementary molecular insight into PLC characteristics, while avoiding potential over-interpretation of its functional implications.

      Formal suggestions:

      (6) The following sentence would benefit from being more clearly written.

      263 - The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis.

      We thank the reviewer for this helpful suggestion. The sentence has been revised to improve clarity by correcting the parallel structure and refining the wording.

      “The formation of PLC structures in the adventitial layer may participate in local blood flow regulation and the maintenance of microenvironmental homeostasis.”

      (7) The following sentence is misleading as it implies cell sorting, and "subsetted" rather than "sorted" should be used.

      414 Based on this, we sorted CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial populations from the total liver EC pool (Figure 4G).

      Thank you for your comment.

      We have revised the term as suggested. This avoids the misleading implication of physical sorting, as our operation was analytical subsetting of the target subpopulation.

      We appreciate your careful review.

      (8) Correct typos, especially in the results section related to Fig. 6. and formatting issues in the discussion.

      730 Morphologically, the PLC shares features with previously described telocytes (TCs)- 731 a recently identified class of interstitial cells in the liver observed via transmission electron

      We thank the reviewer for pointing out this textual error. In the submitted version, the sentence describing the morphological similarity between the PLC and previously reported telocytes was inadvertently interrupted due to a punctuation issue. This has now been corrected to ensure sentence integrity and consistent formatting.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Xu et al. focuses on the impact of clathrin-independent endocytosis in cancer cells on T cell activation. In particular, by using a combination of biochemical approaches and imaging, the authors identify ICAM1, the ligand for T cell-expressed integrin LFA-1, as a novel cargo for EndoA3-mediated endocytosis. Subsequently, the authors aim to identify functional implications for T cell activation, using a combination of cytokine assays and imaging experiments.

      They find that the absence of EndoA3 leads to a reduction in T cell-produced cytokine levels. Additionally, they observe slightly reduced levels of ICAM1 at the immunological synapse and an enlarged contact area between T cells and cancer cells. Taken together, the authors propose a mechanism where EndoA3-mediated endocytosis of ICAM1, followed by retrograde transport, supplies the immunological synapse with ICAM1. In the absence of EndoA3, T cells attempt to compensate for suboptimal ICAM1 levels at the synapse by enlarging their contact area, which proves insufficient and leads to lower levels of T cell activation.

      Strengths:

      The authors utilize a rigorous and innovative experimental approach that convincingly identifies ICAM1 as a novel cargo for EndoA3-mediated endocytosis.

      Weaknesses:

      The characterization of the effects of EndoA3 absence on T cell activation appears incomplete. Key aspects, such as surface marker upregulation, T cell proliferation, integrin signalling and most importantly, the killing of cancer cells, are not comprehensively investigated.

      We agree with the reviewer that the effects of EndoA3 depletion on T cell activation were insufficiently characterized. In new data presented in Fig. S4G-J, we explored additional activation markers and proliferation parameters. We did not observe any difference in the surface markers PD-1, CD137 and Tim-3 between LB33-MEL EndoA3+ cells treated with control and EndoA3 siRNAs. Regarding proliferation (Fig. S4J), although the proliferation index seems slightly lower upon EndoA3 depletion, we did not observe any significant difference either. Degranulation was also monitored (Fig. S4K), but again no significant differences were observed. In the new Fig. 3F, however, we performed chromium release assays to assess the killing of cancer cells. Very interestingly, we observed ~15% higher lysis of LB33-MEL EndoA3+ cells after EndoA3 depletion compared with the control condition at a ratio of 3:1 T cells:target cells (where the maximal effect is observed). These data are further discussed in the Discussion section (new §6-9).

      As Endo- and exocytosis are intricately linked with the biophysical properties of the cellular membrane (e.g. membrane tension), which can significantly impact T-cell activation and cytotoxicity, the authors should address this possibility and ideally address it experimentally to some degree.

      Evaluating changes in the biophysical properties of cancer cell plasma membrane upon EndoA3 depletion is not trivial. An indirect way to address this question is by observing the area and shape of cells after siRNA treatment. In the new data added in the new Fig. S4B-D, we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight cell area reduction upon EndoA3 depletion, no significant changes were observed regarding the aspect ratio and the roundness. Hence, we think that the biophysical properties of cancer cells are not drastically modified by EndoA3 depletion.

      Crucially, key literature relevant to this research, addressing the role of ICAM1 endocytosis in antigen-presenting cells, has not been taken into consideration.

      We thank the reviewer for this important point. We have now considered and cited the relevant literature (Discussion, Page no.9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xu et al. studies the relevance of endophilin A3-dependent endocytosis and retrograde transport of immune synapse components and in the activation of cytotoxic CD8 T cells. First, the authors show that ICAM1 and ALCAM, known components of immune synapses, are endocytosed via endoA3-dependent endocytosis and retrogradely transported to the Golgi. The authors then show that blocking internalization or retrograde trafficking reduces the activation of CD8 T cells. Moreover, this diminished CD8 T cell activation resulted in the formation of an enlarged immune synapse with reduced ICAM1 recruitment.

      Strengths:

      The authors show a novel EndoA3-dependent endocytic cargo and provide strong evidence linking EndoA3 endocytosis to the retrograde transport of ALCAM and ICAM1.

      Weaknesses:

      The role of EndoA3 in the process of T cell activation is shown in a cell that requires exogenous expression of this gene. Moreover, the authors claim that their findings are important for polarized redistribution of cargoes, but failed to show convincingly that the cargoes they are studying are polarized in their experimental system. The statistics of the manuscript also require some refinement.

      We fully acknowledge that the requirement for exogenous expression of EndoA3 in our immunological model represents a limitation of our study. Unfortunately, it remains challenging to identify cancer cell lines for which autologous CD8 T cells are available and that endogenously express all molecular players investigated (in particular EndoA3). At this stage, we do not have access to any other cancer cell line/autologous CD8⁺ T cell pairs that are sufficiently well characterized. In future studies, it would be valuable to investigate tumor types with high endogenous EndoA3 expression (such as glioblastomas, gliomas, and head and neck cancers) for which autologous CD8 T cells could be obtained, but this remains technically challenging.

      To address the reviewer’s second point regarding polarized redistribution of cargoes, we have added new data in the new Figure 4 and Movies S8-9. Using high-speed spinning-disk live-cell confocal microscopy, we captured the movement of ICAM1-positive tubulovesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with fusion events of vesicles occurring at the developing immune synapse. AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.

      Reviewer #3 (Public review):

      Summary:

      Shiqiang Xu and colleagues have examined the importance of ICAM-1 and ALCAM internalization and retrograde transport in cancer cells on the formation of a polarized immunological synapse with cytotoxic CD8+ T cells. They find that internalization is mediated by Endophilin A3 (EndoA3) while retrograde transport to the Golgi apparatus is mediated by the retromer complex. The paper is building on previous findings from corresponding author Henri-François Renard showing that ALCAM is an EndoA3-dependent cargo in clathrin-independent endocytosis.

      Strengths:

      The work is interesting as it describes a novel mechanism by which cancer cells might influence CD8+ T cell activation and immunological synapse formation, and the authors have used a variety of cell biology and immunology methods to study this. However, there are some aspects of the paper that should be addressed more thoroughly to substantiate the conclusions made by the authors.

      Weaknesses:

      In Figure 2A-B, the authors show micrographs from live TIRF movies of HeLa and LB33-MEL cells stably expressing EndoA3-GFP and transiently expressing ICAM-1-mScarlet. The ICAM-1 signal appears diffuse across the plasma membrane while the EndoA3 signal is partially punctate and partially lining the edge of membrane patches. Previous studies of EndoA3-mediated endocytosis have indicated that this can be observed as transient cargo-enriched puncta on the cell surface. In the present study, there is only one example of such an ICAM-1 and EndoA3 positive punctate event. Other examples of overlapping signals between ICAM-1 and EndoA3 are shown, but these either show retracting ICAM1 positive membrane protrusions or large membrane patches encircled by EndoA3. While these might represent different modes of EndoA3-mediated ICAM-1 internalization, any conclusion on this would require further investigation.

      We agree with the reviewer that the pattern of cargoes during endocytosis (puncta vs large patches) as observed by live-cell TIRF microscopy may be confusing. Actually, a punctate pattern has been observed quasi systematically when we monitored the uptake of endogenous cargoes via antibody uptake assays (whatever the imaging approach: TIRF, spinning-disk, classical confocal or lattice light-sheet microscopy). For example:

      - ALCAM: Fig.1e-h, Supplementary Figure 5 and Supplementary Movies 1-3 and 6 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y; Fig.1D and Movie 2 in Tyckaert et al. 2022, https://doi.org/10.1242/jcs.259623.

      - L1CAM: Fig.2 and 3D, Movies S1-4 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      In rare examples, bigger clusters of antibodies were observed, where EndoA3 was observed to surround them, delineate them in a “lasso-like” pattern, and the clusters were progressively taken up:

      - ALCAM: Supplementary Movie 4 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y.

      However, bigger patches of cargoes were more often observed when uptake was monitored using transient expression of GFP-/mCherry-tagged versions of cargoes. In these cases, EndoA3 was predominantly observed to delineate cargo patches in a “lasso-like” pattern, progressively trimming those patches and leading to endocytosis. For example:

      - L1CAM: Fig.3E, Movie S5-7 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      - We also observed this pattern with CD166-GFP (unpublished).

      The fact that we observed patches rather than punctate patterns upon transient expression of fluorescently tagged cargo constructs is likely due to the elevated expression level of the cargoes.

      Therefore, the patchy pattern observed for ICAM1 and ALCAM, transiently expressed in fusion with fluorescent proteins, and surrounded by EndoA3 in Fig.2A-B and old Movies S1-3, is not surprising. Of note, upon anti-ALCAM antibody uptake, we observed a more punctate pattern (Fig.2C), as previously described. Unfortunately, the lower quality of commercial anti-ICAM1 antibody did not allow us to proceed to uptake assays as for ALCAM.

      Regarding Fig.S2 and old Movies S4-5, we agree with the reviewer that these data may be misleading, as they represent phenomena happening at protrusions and contact zones between two adjacent cells. We have now replaced these images with other examples where we avoid contact zones (Fig.S2 and new Movies S5-7).

      These different patterns (patches vs dots) are still unexplained at the current stage, and may indeed represent different modes of endocytosis. We think these various patterns may depend on the abundance/expression level of cargoes and their degree of clustering. This will be investigated in future studies. Still, whatever the pattern, these data demonstrate and confirm the association between EndoA3 and cargoes (such as ICAM1 or ALCAM), even in the absence of antibodies.

      Moreover, in Figure 2C-E, uptake of the previously established EndoA3 endocytic cargo ALCAM is analyzed by quantifying total internal fluorescence in LB33-MEL cells of antibody labelled ALCAM following both overexpression and siRNA-mediated knockdown of EndoA3, showing increased and decreased uptake respectively. Why has not the same quantification been done for the proposed novel EndoA3 endocytic cargo ICAM-1? Furthermore, if endocytosis of ICAM-1 and ALCAM is diminished following EndoA3 knockdown, the expression level on the cell surface would presumably increase accordingly. This has been shown for ALCAM previously and should also be quantified for ICAM-1.

      As correctly pointed out by the reviewer, anti-ICAM1 antibody uptake assays would have been ideal. We attempted them many times. Unfortunately, none of the commercial antibodies we tested yielded satisfactory results in uptake experiments. Either the labeling was too weak/non-specific, or the antibody was not effectively stripped from the cell surface by acid washes, i.e. the acid-wash conditions required for efficient stripping were too harsh for the cells to tolerate. We have tried other approaches using the same commercial antibody which do not require acid washes (loss of surface assays by FACS, or uptake assays using surface protein biotinylation) or based on insertion of an Alfa-tag in the extracellular part of ICAM1 by CRISPR-Cas9 and detection of ICAM1 with an anti-Alfa-tag nanobody (unpublished approach; collaboration with the lab of Prof. Leonardo Almeida-Souza, University of Helsinki, who developed the approach), but without success. However, we were more successful with the SNAP-tag-based approach to follow retrograde transport, for which the commercial anti-ICAM1 antibody worked properly. In Fig. 1F, we could show that retrograde transport of ICAM1 (and thus most likely its endocytosis step) was significantly decreased upon EndoA3 depletion in HeLa cells, indirectly demonstrating that ICAM1 is effectively an EndoA3-dependent cargo.

      Regarding the fact that surface level of ICAM1 should increase upon perturbation of EndoA3-mediated endocytosis, we agree with the reviewer that this could be an expected result. However, this is not necessarily systematic, as the surface level of a protein cargo is always the result of a balance between its endocytosis, recycling to plasma membrane, and lysosomal degradation. We also have to take into account the neosynthesized protein flux. One must also consider that multiple endocytic mechanisms exist in parallel, and that the perturbation of one mechanism (EndoA3-mediated CIE, here) may be partially compensated by others, as cargoes can often be taken up via multiple endocytic doors. Hence, an increased abundance at the cell surface is not always guaranteed upon endocytosis perturbation. Anyway, we measured the cell surface level of both ICAM1 and ALCAM in LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs (Fig. S4E-F). Only minor differences were observed.

      In Figure 4A the authors show micrographs from a live-cell Airyscan movie (Movie S6) of a CD8+ T cell incubated with HeLa cells stably expressing HLA-A*68012 and transiently expressing ICAM1-EGFP. From the movie, it seems that some ICAM-1 positive vesicles in one of the HeLa cells are moving towards the T cell. However, it does not appear like the T cell has formed a stable immunological synapse but rather perhaps a motile kinapse. Furthermore, to conclude that the ICAM-1 positive vesicles are transported toward the T cell in a polarized manner, vesicles from multiple cells should be tracked and their overall directionality should be analyzed. It would also strengthen the paper if the authors could show additional evidence for polarization of the cancer cells in response to T-cell interaction.

      A similar point was raised by reviewer #2. We have revised this section accordingly. In the new Fig. 4 and Movies S8-9, we replaced the live-cell Airyscan confocal data with high-speed spinning-disk confocal imaging data, enabling a more accurate analysis of polarized cargo redistribution at a higher time resolution.

      Using this approach, we captured the movement of ICAM1-positive tubulo-vesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with fusion events of carriers occurring at the developing immune synapse.

      AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.
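      The orientation step of such a tracking analysis can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that tracks are already available as time-ordered (x, y) positions, classifies each track by the angle between its net displacement and the direction to the contact site, and uses an arbitrary 45-degree cutoff. The function name `fraction_toward_target` and all coordinates are hypothetical.

```python
import math


def fraction_toward_target(tracks, target, max_angle_deg=45.0):
    """Fraction of tracks whose net displacement points toward `target`.

    tracks: list of tracks, each a time-ordered list of (x, y) positions
    target: (x, y) position of the contact site (assumed known)
    max_angle_deg: arbitrary cutoff for counting a track as 'toward'
    Tracks with fewer than two points or no net movement are skipped.
    """
    toward = 0
    counted = 0
    for track in tracks:
        if len(track) < 2:
            continue
        (x0, y0), (x1, y1) = track[0], track[-1]
        dx, dy = x1 - x0, y1 - y0                    # net displacement
        tx, ty = target[0] - x0, target[1] - y0      # direction to target
        d_norm = math.hypot(dx, dy)
        t_norm = math.hypot(tx, ty)
        if d_norm == 0 or t_norm == 0:
            continue
        cos_angle = (dx * tx + dy * ty) / (d_norm * t_norm)
        cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding
        angle = math.degrees(math.acos(cos_angle))
        counted += 1
        if angle <= max_angle_deg:
            toward += 1
    if counted == 0:
        raise ValueError("no usable tracks")
    return toward / counted


# Hypothetical tracks around a contact site located at (10, 0).
example_tracks = [
    [(0, 0), (1, 0)],      # straight toward the target
    [(0, 0), (0, 1)],      # perpendicular
    [(0, 0), (-1, 0)],     # away from the target
    [(0, 0), (2, 0.5)],    # roughly toward the target
]
example_fraction = fraction_toward_target(example_tracks, target=(10, 0))
```

      A fraction well above what an isotropic distribution would give (0.25 for a 45-degree cutoff on either side of the target direction) is the kind of result that would indicate polarized transport; pooling tracks from several cells, as the reviewer requested, makes that comparison more robust.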

      Finally, in Figures 4D-G, the authors show that the contact area between CD8+ T cells and LB33-MEL cells is increased in response to siRNA-mediated knockdown of EndoA3 and VPS26A. While this could be caused by reduced polarized delivery of ICAM-1 and ALCAM to the interface between the cells, it could also be caused by other factors such as increased cell surface expression of these proteins due to diminished endocytosis, and/or morphological changes in the cancer cells resulting from disrupted membrane traffic. More experimental evidence is needed to support the working model in Figure 4H.

      Regarding the cell surface expression of both ICAM1 and ALCAM, as already explained above, only minor differences were observed (Fig. S4E-F). Regarding morphological changes of cancer cells upon EndoA3 depletion (Fig. S4B-D), we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight cell area reduction upon EndoA3 depletion, no significant changes were observed regarding the aspect ratio and the roundness. Cancer cell morphology is thus not drastically modified by EndoA3 depletion. All these new data are now discussed in the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers discussed the paper and all agreed it was incomplete in supporting the conclusions. Additional data needed to support the conclusions were:

      (1) Better characterisation of Endo3A-expressing and knock-down cells such as morphology, ICAM-1, and ALCAM surface levels to name two parameters.

      As discussed above, we have now added new data addressing these points:

      - Morphology: Fig. S4B-D

      - ICAM1 and ALCAM surface levels: Fig. S4E-F

      These new data are discussed in the main text.

      (2) Better characterisation of the ICAM-1 polarisation process. Does this require interaction with LFA-1 can ICAM-1 be delivered to the synapse without this?

      As discussed above, we have now added new data better addressing the characterization of ICAM1 polarized trafficking to the immune synapse, that can be found in the new Fig. 4 (high-speed spinning-disk confocal imaging of ICAM1 trafficking upon conjugate formation between CD8 T cell and cancer cell). The text has been modified accordingly. The dependency on LFA-1 has not been addressed directly, but we may suppose it is indeed important as (i) it has already been addressed in other cellular systems by previous studies (Jo et al. 2010), and (ii) we observed a denser flux of ICAM1-positive carriers in the cancer cell toward regions involved in immune synapses with CD8 T cells, than other regions. As we didn’t address this question more directly in our study, we briefly mentioned this point in the Discussion section.

      (3) Better characterisation of T cell response- activation markers, cytotoxicity assays.

      As discussed above, we have now added new data addressing these points:

      - Cell surface activation markers: Fig. S4G-I

      - Proliferation: Fig. S4J

      - Degranulation: Fig. S4K

      - Cytotoxic activity: Fig. 3F

      These new data are discussed in the main text.

      (4) Citing relevant literature.

      The relevant literature (in particular the paper by Jo et al. 2010) is now cited and discussed.

      (5) Number of donors evaluated - is it true there was only one blood donor? For human studies better to have key results on >4 donors.

      Our immunological working model indeed originates from a single patient (Baurain et al., 2000), from whom both a cancer cell line (LB33-MEL) and autologous CD8 T cells were derived. These CD8 T cells specifically recognize an HLA molecule presenting a defined antigenic peptide (MUM-3) on the surface of the cancer cells. This provides us with a unique and fully natural experimental system that allows us to faithfully reconstitute cytotoxic T lymphocyte (CTL)-mediated killing of cancer cells in vitro.

      Using CD8 T cells from other donors would not be meaningful in this context, as they would not recognize the LB33-MEL cells. Conversely, testing the same CD8 T cells on other cancer cell lines requires engineering these lines to express the appropriate HLA molecule and to be exogenously pulsed with the correct antigenic peptide – which is precisely what we did with the HeLa cell line.

      Therefore, increasing the number of donors would require obtaining both cancer cell lines and CD8 T cells from each donor, ideally with evidence that the donor’s T cells recognize their own tumor cells. This is technically challenging and not trivial, although it would indeed be highly valuable to diversify immunological models in future studies.

      Importantly, the high specificity of our autologous co-culture system, where cancer cells interact with their naturally matched CD8 T cells, offers clear advantages over commonly used in vitro models such as Jurkat (T) and Raji (B) cell lines, which rely on artificial stimulation with a superantigen to enforce immunological synapse formation and T cell activation.

      (6) How does the binding of antibodies to ICAM-1 and ALCAM impact their trafficking?

      As IgG antibodies are bivalent and can bind two target antigens, they may induce clustering, which could in turn affect endocytosis. To address this concern, we performed an uptake assay based on surface protein biotinylation using a cleavable biotin reagent (with a reducible linker). Briefly, after allowing endocytosis for different time intervals, cell surface–exposed biotins were removed by treatment with the cell-impermeable reducing agent MESNA, while internalized (endocytosed) biotinylated proteins remained protected. These internalized proteins were then recovered by affinity purification on streptavidin resin and analyzed by Western blot to detect the protein of interest.

      Importantly, this uptake assay can be performed in the absence or presence of an anti-cargo antibody, allowing assessment of its potential influence on endocytosis. Author response image 1 shows the results for ALCAM uptake in HeLa cells, with and without anti-ALCAM antibody:

      Author response image 1.

      Antibody binding to an extracellular epitope of ALCAM increases its endocytosis. HeLa cell-surface proteins were biotinylated on ice using EZ-Link Sulfo-NHS-SS-Biotin (Pierce) and then incubated at 37 °C for the indicated times to allow endocytosis. Internalization was assessed in the absence or presence of an anti-ALCAM antibody (Ab) added to the extracellular medium. Endocytosis was stopped by returning the cells to ice, and surface-exposed biotin was removed by treatment with the cell-impermeable reducing agent MESNA. Internalized, MESNA-resistant biotinylated proteins were affinity-purified on streptavidin resin and analyzed by Western blot to detect ALCAM. The “unstripped” condition shows the total amount of ALCAM at the cell surface at the beginning of the experiment (signal at ~95 kDa). Quantification of the time course (normalized to the no-antibody condition) shows increased ALCAM endocytosis in the presence of antibody at 15 and 30 min. Blot is representative of two independent experiments; quantifications include data from both experiments.

      We observed that the anti-ALCAM antibody slightly enhanced ALCAM uptake. A similar experiment was attempted for ICAM1, but we were unable to detect the protein by Western blot using the available commercial antibody.

      Although this outcome was expected, it highlights a potential caveat in using antibodies to monitor endocytosis. Alternative tools such as nanobodies, while monovalent and theoretically less perturbing, are not yet available for many cargo proteins and may still influence cargo conformation or dynamics. Therefore, antibodies remain the current gold standard in endocytosis studies. Nevertheless, data obtained with antibodies should always be validated by complementary approaches that do not rely on antibody binding, as we have done in this study (e.g. live-cell imaging of fluorescently tagged proteins).

      The work is of interest and we look forward to your response/revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for submitting your manuscript which I had the pleasure to review. While I enjoyed your work, I feel that it would strongly benefit by addressing the following points:

      (1) In-depth characterization of T cell responses upon Endo3A depletion: The characterization should be expanded to include surface marker upregulation, T cell proliferation, and, most importantly, tumor cell cytotoxicity. I was wondering if the incomplete characterization of T-cell responses is due to limited supplies of antigen-specific T-cells? My understanding is that these cells have been derived from a single patient. This also raises concerns in terms of reproducibility as all data are practically from a single biological replicate. My suggestion would be to use an additional system of specific cell-cell contacts to complement the current findings. For instance, HeLa cells could be transfected to express CD19 or EpCAM, for both of which bispecific T cell engagers (Invivogen) exist that would allow specific contact formation, thereby allowing the study of the effect of Endo3A depletion across T cells from different donors and through a more complete set of assays.

      We refer the reviewer to our responses above, where these points have been addressed in detail. We sincerely thank the reviewer for the excellent suggestion of transfecting HeLa cells with CD19 or EpCAM and using bispecific T-cell engagers. However, after careful consideration, we concluded that this approach falls outside the scope of the present study, which was specifically designed to investigate the most natural system, cancer cells and their autologous CD8 T cells. We nevertheless appreciate this insightful suggestion and will certainly consider it for future studies.

      (2) Alterations in membrane tension as an alternative explanation: Endo- and exocytosis have been found to influence the biophysical properties of cells, such as membrane tension (e.g., Djakbaravo et al., 2021, PMID: 33788963), which in turn influences their susceptibility to cytotoxic T cells with lower tension corresponding to reduced cytotoxicity (e.g., Basu & Whitlock, 2016, PMID: 26924577). Thus, interference with endocytic pathways could arguably lead to changes in membrane tension that could contribute to the observed effects. These possible effects should be discussed and addressed experimentally to a degree. While measuring membrane tension directly requires specialized expertise (e.g., tether pulling experiments) and is not within the scope of this study, membrane tension affects cell spreading and actin organization. Thus, I would suggest conducting a thorough comparative phenotypical and morphological characterization of the Endo3A+ and Endo3A- cancer cells to estimate the possible effect of changes in membrane tension (if any) on the results.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (3) Citation and consideration of earlier work: Jo & Kwon et al., 2010 (PMID: 20681010) have previously shown that ICAM1 undergoes clathrin-independent recycling and repolarization to the immunological synapse in APCs. Furthermore, they provided evidence that actin-based transport, but not lateral diffusion, together with recycling is crucial for the repolarization of ICAM1 to the immunological synapse. This important earlier work has to be cited. Actin-based transport on the cell surface has not been considered in the current manuscript. In light of these earlier findings, it is unclear in Figure 4A if ICAM1 is delivered to the T cell from within- or from the surface of the cancer cell. I would suggest changing the imaging modalities in this experiment to be able to differentiate cell surface from internal ICAM1, e.g., by detaching the cancer cells from the surface as has been done in Fig. 4B, E, and F.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      (1) The authors should be more careful with their claims about the importance of their results for cell polarity as their evidence for this is scarce (i.e. The live-cell imaging in Figure 4A is not quantified and the ICAM1 polarization effect shown in figure 4B-C is, albeit significant, small and not very convincing).

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The absence (or very low expression) of EndoA3 on the LB33-MEL cell suggests that EndoA3-mediated recycling of immune synaptic components is not required for T-cell activation. The fact that EndoA3 exogenous expression in LB33-MEL cells leads to increased cytokine production in T cells is, however, interesting.

      We fully agree with the reviewer’s observation. Although EndoA3 is not expressed in some cellular contexts, its cargoes may still be present. It is therefore reasonable to assume that alternative endocytic mechanisms can compensate for its absence. It is now widely accepted that many cargoes can be internalized through multiple endocytic routes, and that the relative contribution of each pathway depends strongly on the cellular and physiological context.

      For example, we have shown that a minor fraction (< 25%) of ALCAM and L1CAM undergoes clathrin-mediated endocytosis, although both are primarily internalized via clathrin-independent pathways (Renard et al., 2020; Lemaigre et al., 2023). Moreover, we observed that inhibition of macropinocytosis enhances EndoA3-mediated endocytosis of ALCAM, indicating a crosstalk between specific EndoA3-mediated clathrin-independent endocytosis (CIE) and non-specific macropinocytosis (Tyckaert et al., 2022).

      Thus, even in the absence of EndoA3, its cargoes are likely internalized through alternative endocytic routes. Nonetheless, our data clearly demonstrate that EndoA3 expression markedly enhances the endocytosis and intracellular trafficking of its cargoes, ultimately leading to modified CD8 T cell responses.

      (3) For the statistics in bar graphs (graphs 1C, D, E &F; 3E, 3F, S1C-I, and S3C), one cannot have all values for controls simply normalized to 1. This procedure hides the variance for the controls between each replicate and makes any statistics meaningless.

      We thank the reviewer for this important remark. Regarding Figures 1C–F, S1C–I, and S3C, which correspond to quantifications from Western blots, it is standard practice to normalize the quantification to a control condition set to 1 (or 100%). Absolute signal intensities cannot be directly compared across different blots due to the variability inherent to this semi-quantitative technique. For this reason, we chose to keep the data presented in normalized form. However, we agree that this type of data requires a carefully chosen statistical approach. We now use one-sample t-tests, which test the hypothesis that each siRNA condition differs from 100% (the normalized value of the siCtrl condition). We adapted the statistical analysis accordingly in the figures mentioned.
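      The logic of testing normalized values against a fixed reference can be sketched as follows. This is a generic illustration of a two-sided one-sample t-test, not the authors' analysis script: the replicate values are invented, the function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density so that the example needs only the standard library.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = statistics.stdev(values)          # sample standard deviation
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (statistics.mean(values) - mu0) / (sd / math.sqrt(n))
    return t, n - 1


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value via Simpson integration of the t density.

    `steps` must be even; the integral runs from 0 to |t|.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3                # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical siRNA band intensities, each expressed as % of its own siCtrl.
replicates = [60.0, 70.0, 65.0]
t_stat, dof = one_sample_t(replicates, mu0=100.0)
p_value = two_sided_p(t_stat, dof)
```

      Because the control is fixed at 100% by construction, it contributes no variance, and only the spread of the treated condition enters the test. This is what distinguishes the one-sample test from a two-sample comparison against a column of identical control values.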

      Regarding old Figures 3E–F (now Fig. 3E and 3G), which correspond to IFNγ secretion assays, we agree that representing IFNγ secretion as a fold change relative to a control condition may obscure inter-experimental variability. However, this format was intentionally chosen to facilitate data interpretation, as IFNγ secretion was quantified by ELISA and also displayed inter-experimental variability. For completeness, we now provide below the corresponding graphs showing absolute IFNγ concentrations, which retain the information on inter-experimental variability (Author response image 2). As you can see, the overall conclusions remain unchanged.

      Author response image 2.

      IFNγ secretion data corresponding to Fig. 3E and 3G, expressed in absolute values (pg/mL).

      Minor comments:

      (1) What happens to surface and total levels of ICAM1 and ALCAM in the retromer or EndoA3 knockdown/overexpression conditions? This information would put the effects described into context.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The authors should clearly indicate that BFA means bafilomycin A in the figure legend or methods.

      BFA corresponds to Brefeldin A. We have now clarified this information in legends and methods.

      (3) In the sentence: "These data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires the full secretory capacity of the TGN." What do the authors mean by full secretory capacity?

      We have modified the sentence: “Together, these data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires efficient secretion from the TGN (as evidenced by the involvement of Rab6).”

      (4) The method used for retrograde transport seems to be a variation of the original protocol (reference 43). The manuscript would benefit from a thorough explanation of this assay, rather than citing the original protocol.

      We did not modify the original SNAP-tag–based protocol used to monitor retrograde transport. A comprehensive methodological paper has been published (ref. 44), and we have followed it strictly. Additionally, we briefly summarized the rationale of the approach in Figure 1A and in the first paragraph of the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper by Karimian et al proposes an oscillator model tuned to implement binding by synchrony (BBS*) principles in a visual task. The authors set out to show how well these BBS principles explain human behavior in figure-ground segregation tasks. The model is inspired by electrophysiological findings in non-human primates, suggesting that gamma oscillations in early visual cortex implement feature-binding through a synchronization of feature-selective neurons. The psychophysics experiment involves the identification of a figure consisting of gabor annuli, presented on a background of gabor annuli. The participants' task is to identify the orientation of the figure. The task difficulty is varied based on the contrast and density of the gabor annuli that make up the figure. The same figures (without the background) are used as inputs to the oscillator model. The authors report that both the discrimination accuracy in the psychophysics experiment and the synchrony of the oscillators in the proposed model follow a similar "Arnold Tongue" relationship when depicted as a function of the texture-defining features of the figure. This finding is interpreted as evidence for BBS/gamma synchrony being the underlying mechanism of the figure-ground segregation.

      Note that I chose to use "BBS" over gamma synchrony (used by the authors) in this review, as I am not convinced that the authors show evidence for synchronization in the gamma-band.

      We thank the reviewer for their careful assessment of our manuscript and useful comments that we believe have served to strengthen our work.

      Strengths:

      The design of the proposed model is well-informed by electrophysiological findings, and the idea of using computational modeling to bridge between intracranial recordings in non-human primates and behavioral results in human participants is interesting. Previous work has criticized the BBS synchrony theory based on the observation that synchronization in the gamma-band is highly localized and the frequency of the oscillation depends on the visual features of the stimulus. I appreciate how the authors demonstrate that frequency-dependence and local synchronization can be features of BBS, and not contradictory to the theory. As such, I feel that this work has the potential to contribute meaningfully to the debate on whether BBS is a biophysically realistic model of feature-binding in visual cortex.

      Weaknesses:

      I have several concerns regarding the presented claims, assessment of meaning and size of the presented effects, particularly with regard to the absence of a priori defined effect sizes.

      Firstly, the paper makes strong claims about the frequency-specificity (i.e., gamma synchrony) and anatomical correlates (early visual cortex) of the observed effects. These claims are informed by previous electrophysiological work in non-human primates but are not directly supported by the paper itself. For instance, the title contains the word "gamma synchrony", but the authors do not demonstrate any EEG/MEG or intracranial data from their human subjects supporting such claims, nor do they demonstrate that the frequencies in the oscillator model are within the gamma band. I think that the paper should more clearly distinguish between statements that are directly supported by the paper (such as: "an oscillator model based on BBS principles accounts for variance in human behavior") and abstract inferences based on the literature (such as "these effects could be attributed to gamma oscillations in early visual cortex, as the model was designed based on those principles").

      We thank the reviewer for this helpful comment and agree that the scope of our claims should be clearly delineated between what is directly supported by our data and what is theoretically inferred from prior literature.

      We revised the Abstract, Introduction, and early Discussion to moderate the strength of our statements and make the distinction explicit. The revised title now emphasizes that our study tests principles derived from prior work on gamma synchrony rather than directly demonstrating gamma activity in humans. Throughout the text, we use more cautious phrasing that highlights potential mechanisms and theoretical predictions. The intention of our study was not to position synchrony as the only viable mechanism of figure–ground perception. Rather, our goal was to reinvigorate it as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We updated phrasing throughout the manuscript to make this clearer and avoid overstating the study’s contribution.

      Importantly, our model is not agnostic with respect to frequency band. Oscillator frequencies exhibited by model units are within the gamma range by design. Frequency emerges directly from the contrast within each oscillator’s receptive field, following an empirically established relationship between stimulus contrast and gamma frequency. To our knowledge, such a robust, quantitative relationship between stimulus features and exact oscillation frequency has not been consistently demonstrated for other frequency bands. This relationship yields gamma-band frequencies for all contrasts used in our simulations. The model is thus indeed a gamma oscillator model of V1, not a generic instantiation of Binding by Synchrony (BBS) principles.

      That said, we fully agree with the reviewer that our study cannot demonstrate a direct link between gamma synchrony in visual cortex and human behavior. Our behavioral and modeling results instead show that synchronization principles derived from gamma-band physiology in V1 can predict perceptual performance patterns. We now make this distinction explicit throughout the revised manuscript.

      Secondly, unlike the human participants, the model strictly does not perform figure-ground segregation, as it only receives the figure as an input.

      We thank the reviewer for the opportunity to clarify our modeling approach. We chose not to model the background to reduce computational cost, since including it requires a substantially larger number of oscillators without changing the model’s predictions. The model thus indeed only receives the figure region as input. We aimed to test the local grouping mechanism predicted by TWCO, rather than to simulate a full figure–ground segregation process including a read-out stage. Our model therefore isolates the conditions under which local synchrony emerges within the figure region, assuming that a downstream read-out mechanism (not explicitly modeled here) would detect regions of coherent activity. The exact nature of such a read-out mechanism was beyond the scope of our work.

      To confirm that our simplified model is a valid proxy, we ran additional simulations including the background and found that a coherent figure assembly reliably emerges, as can be seen in the phase-locking patterns relative to a reference oscillator at the center of the figure. This validates that the principles of local grouping we studied in isolation hold even when the figure is embedded in a noisy surround. We have added an explicit note in the Results (paragraph 2) that we only simulate the figure and added Supplementary Figure S1 showing the additional simulations.
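
      As an illustration of how phase locking relative to a reference oscillator can be quantified, the sketch below computes a phase-locking value (PLV), one common measure. Whether the additional simulations used exactly this measure is not stated here, and the frequencies, lag, and sampling rate are made-up values.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit phasor of the phase difference (0 = none, 1 = perfect)."""
    phasors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(phasors) / len(phasors))


# Hypothetical phase time series (radians), sampled at 1 kHz for 1 s.
t = [k / 1000.0 for k in range(1000)]
reference = [2 * math.pi * 40.0 * tk for tk in t]   # 40 Hz reference oscillator
locked = [p + 0.3 for p in reference]               # same frequency, fixed phase lag
drifting = [2 * math.pi * 47.0 * tk for tk in t]    # detuned by 7 Hz, phase keeps slipping

plv_locked = phase_locking_value(reference, locked)
plv_drifting = phase_locking_value(reference, drifting)
print(f"locked: {plv_locked:.3f}, drifting: {plv_drifting:.3f}")
```

      A constant phase lag gives a PLV of 1 regardless of the size of the lag, whereas a steadily drifting phase difference averages out towards 0.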

      Finally, it is unclear what effect sizes the authors would have expected a priori, making it difficult to assess whether their oscillator model represents the data well or poorly. I consider this a major concern, as the relationship between the synchrony of the oscillatory model and the performance of the human participants is confounded by the visual features of the figure. Specifically, the authors use the BBS literature to motivate the hypothesis that perception of the texture-defined figure is related to the density and contrast heterogeneity of the texture elements (gabor annuli) of the figure. This hypothesis has to be true regardless of synchrony, as the figure will be easier to spot if it consists of a higher number of high-contrast gabors than the background. As the frequency and phase of the oscillators and coupling strength between oscillators in the grid change as a function of these visual features, I wonder how much of the correlation between model synchrony and human performance is mediated by the features of the figure. To interpret to what extent the similarity between model and human behavior relies on the oscillatory nature of the model, the authors should find a way to estimate an empirical threshold that accounts for these confounding effects. Alternatively, it would be interesting to understand whether a model based on competing theories (e.g., Binding by Enhanced Firing, Roelfsema, 2023) would perform better or worse at explaining the data.

      We thank the reviewer for these insightful and constructive comments, which have prompted additional analyses that we believe substantially strengthen our work. The reviewer raises two main points: (1) the need for a benchmark to assess our model’s performance, and (2) the concern that the relationship between model synchrony and behavior might be a non-causal “confound” of the visual features. We address each point below.

      (1) Benchmarking model performance

      We agree that it is important to assess how well our model performs relative to the data and included this in the original manuscript. We did not predefine an absolute good-fit threshold because absolute agreement depends on irreducible noise and inter-subject variability, making a universal cutoff arbitrary. Instead, we benchmarked model performance in two complementary ways. First, the noise ceiling shown in Figure 5 provides an empirical benchmark for the maximum fit any model could achieve on our data. Simulated Arnold tongues (based on synchrony) approach this ceiling, achieving 89% of the possible similarity for correlation and 79% for weighted Jaccard similarity. Second, the parameter sweep (Figure 3) situates our model’s performance within the broader parameter space. It shows that the model, whose key parameters were fixed a priori from independent macaque neurophysiological data, lies close to the optimal regime for explaining the human data. It also provides an estimate of the lower bound (worst-performing point) on the fit that a misspecified model implementing the identical mechanism would achieve. Our model with fixed a priori parameters does 1.41 times better than a misspecified model for the correlation fit metric and 3 times better for weighted Jaccard similarity.
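
      To make the fit metric concrete, the sketch below computes a weighted Jaccard similarity between two maps and expresses it as a fraction of a noise ceiling. It is a minimal sketch under stated assumptions: the map values and the ceiling value are hypothetical, and the exact normalization used in the manuscript may differ.

```python
def weighted_jaccard(a, b):
    """Weighted Jaccard similarity: sum of element-wise minima over sum of maxima.

    Assumes non-negative values and equal lengths.
    """
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    return numerator / denominator if denominator > 0 else 1.0


# Hypothetical flattened maps over four stimulus conditions (made-up values).
behavior = [0.9, 0.7, 0.4, 0.1]   # e.g. normalized human performance
model = [0.8, 0.8, 0.3, 0.2]      # e.g. normalized model synchrony

similarity = weighted_jaccard(behavior, model)

# Hypothetical noise ceiling: the best similarity any model could reach given
# the variability between participants.
noise_ceiling = 0.9
fraction_of_ceiling = similarity / noise_ceiling
print(f"similarity = {similarity:.3f}, fraction of ceiling = {fraction_of_ceiling:.3f}")
```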

      (2) Synchrony as mechanism vs. potential confound

      We appreciate the reviewer’s suggestion to test whether synchrony explains behavior beyond stimulus features. In our framework, synchrony is a near-deterministic function of the manipulated stimulus features given fixed model parameters. As a result, synchrony and the stimulus features are collinear (R² ≈ 0.8) leaving no independent variance for synchrony to explain once stimulus features are included. Adding both into one statistical model yields unstable coefficients and no out-of-sample improvement.

      Mechanistically, we believe the relevant question is not whether synchrony explains behavior beyond stimulus features but whether synchrony is the correct transformation of the stimulus features to reproduce the behavioral pattern. Please note that in our design we ensured that mean contrast and luminance are identical in the figure and the background such that there are not more high-contrast Gabors in the figure than in the background. We did this so that mean contrast would not be a relevant feature. However, there are more high-contrast Gabors in the background, and it is conceivable that the absence of such high contrasts in the figure drives the detection/discrimination of the figure. We therefore agree that testing alternative models would further clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model from which we derived synchrony. The first is the average firing rate inside the figure; the second is the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison based on out-of-sample predictions. While rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison. We added a new subsection comparing synchrony to rate-based alternatives in the Results (paragraphs 7-9), including additional Bayesian analyses and LOO-CV model comparison. Please note that the model comparison we added to the manuscript provides an additional benchmark beyond the map-level ceiling analysis. It indicates that the mapping from stimulus features to behavior via synchrony generalizes best without requiring an a priori good-fit threshold.
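
      The logic of comparing readouts by out-of-sample prediction can be sketched with a plain leave-one-out loop. This is a simplified, non-Bayesian stand-in for the LOO-CV comparison mentioned above, not the analysis in the manuscript. All numbers are made up for illustration and do not reproduce the study's data, and the helper names fit_line and loo_mse are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_mse(x, y):
    """Mean squared error of predictions for each held-out observation."""
    errors = []
    for i in range(len(x)):
        # Fit on all observations except i, then predict observation i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append((y[i] - (slope * x[i] + intercept)) ** 2)
    return sum(errors) / len(errors)


# Hypothetical per-condition values (made up for illustration only).
accuracy = [0.52, 0.60, 0.71, 0.80, 0.91, 0.95]
synchrony = [0.10, 0.25, 0.45, 0.60, 0.85, 0.90]
rate_difference = [0.30, 0.10, 0.50, 0.35, 0.70, 0.40]

mse_sync = loo_mse(synchrony, accuracy)
mse_rate = loo_mse(rate_difference, accuracy)
print(f"LOO MSE, synchrony: {mse_sync:.4f}; rate difference: {mse_rate:.4f}")
```

      The readout with the lower held-out error generalizes better on these toy data; a Bayesian LOO-CV comparison follows the same idea but scores held-out predictive density rather than squared error.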

      We agree that formally comparing our model to a sophisticated rate-based alternative, such as an instantiation of the Binding by Enhanced Firing model, is an important direction for future work. However, it remains an open and non-trivial question whether such a model could quantitatively reproduce the precise shape of the behavioral Arnold tongue that emerges from the systematic manipulation of our stimulus parameters. Implementing and parameterizing such a model in a comparable, biologically grounded framework is a substantial undertaking that lies beyond the scope of the current study. Therefore, our goal here was not to claim exclusivity for synchrony-based mechanisms, but rather to re-evaluate their plausibility by showing that features often seen as limitations (stimulus dependence and frequency heterogeneity) are, in fact, essential characteristics of the TWCO framework that can predict complex behavioral outcomes.

      We would also like to clarify that our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). Demonstrating that these features predict behavior is therefore not trivial but constitutes a first empirical confirmation that the core TWCO variables match perception.
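
      The trade-off between frequency detuning and coupling strength can be illustrated with the simplest textbook case of two mutually coupled phase oscillators. This is a sketch of the general principle only, not the V1 model described in the manuscript, and the detuning and coupling values are hypothetical. For phase difference ψ obeying dψ/dt = Δω − 2K sin ψ, a phase-locked solution exists if and only if |Δω| ≤ 2K, which traces out the Arnold tongue.

```python
def is_phase_locked(detuning, coupling):
    """Locking condition for two mutually coupled phase oscillators.

    For d(psi)/dt = detuning - 2 * coupling * sin(psi), a fixed point
    (constant phase difference) exists if and only if |detuning| <= 2 * coupling.
    """
    return abs(detuning) <= 2.0 * coupling


detunings = [-4.0, -2.0, 0.0, 2.0, 4.0]   # hypothetical frequency differences
couplings = [0.0, 0.5, 1.0, 1.5, 2.0]     # hypothetical coupling strengths

# Rows: coupling strength (strongest at the top); columns: detuning.
tongue = [[is_phase_locked(d, k) for d in detunings] for k in reversed(couplings)]
for row in tongue:
    print("".join("#" if locked else "." for locked in row))
```

      The locked region widens as coupling grows, so a larger detuning can be compensated by stronger coupling.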

      Apart from adding analyses of additional rate-based readouts of our model, we also refined our discussion of the relationship between these and a synchrony-based mechanism.

      Reviewer #2 (Public review):

      The authors aimed to investigate whether gamma synchrony serves a functional role in figure-ground perception. They specifically sought to test whether the stimulus-dependence of gamma synchrony, often considered a limitation, actually facilitates perceptual grouping. Using the theory of weakly coupled oscillators (TWCO), they developed a framework wherein synchronization depends on both frequency detuning (related to contrast heterogeneity) and coupling strength (related to proximity between visual elements). Through psychophysical experiments with texture discrimination tasks and computational modeling, they tested whether human performance follows patterns predicted by TWCO and whether perceptual learning enhances synchrony-based grouping.

      We thank the reviewer for their thoughtful and constructive review. We believe the comments have served to improve our work.

      Strengths:

      (1) The theoretical framework connecting TWCO to visual perception is innovative and well-articulated, providing a potential mechanistic explanation for how gamma synchrony might contribute to both feature binding and separation.

      (2) The methodology combines psychophysical measurements with computational modeling, with a solid quantitative agreement between model predictions and human performance.

      (3) In particular, the demonstration that coupling strengths can be modified through experience is remarkable and suggests gamma synchrony could be an adaptable mechanism that improves with visual learning.

      (4) The cross-validation approach, wherein model parameters derived from macaque neurophysiology successfully predict human performance, strengthens the biological plausibility of the framework.

      Weaknesses:

      (1) The highly controlled stimuli are far removed from natural scenes, raising questions about generalisability. But, of course, control (almost) excludes ecological validity. The study does not address the challenges of natural vision or leverage the rich statistical structure afforded by natural scenes.

      We agree with the reviewer that the insights of the present study are limited to texture stimuli and have made adjustments in the Discussion (final two paragraphs) to avoid claiming generalizability to natural stimuli. We have also adjusted the title to specifically limit our results to texture stimuli. To establish the principles of TWCO, we needed tight control over the stimulus, but are intrigued by the idea to investigate natural scenes. We have added to our Discussion (paragraph 9) that future work should evaluate to what extent the principles we investigate here apply to natural scenes. Synchrony-based mechanisms have been successfully used for image segmentation tasks in machine vision, showing that the proposed mechanism can in principle work for natural scenes.

      (2) The experimental design appears primarily confirmatory rather than attempting to challenge the TWCO framework or test boundary conditions where it might fail.

      We thank the reviewer for this important point. Our primary motivation was to address the neurophysiological properties of gamma synchrony that have been suggested to severely challenge the binding by synchrony mechanism, particularly the strong dependence of gamma oscillations and synchrony on stimulus features. Our goal was to show that from the perspective of TWCO, these challenges become expected components of the mechanism. In essence, we wanted to promote a conceptual shift that converts what pushes a theory to its limit into something that is actually its central tenet. To facilitate this shift, we designed the experiment to directly test this core tenet.

      While our approach was designed to test a central prediction of TWCO rather than explicitly challenge its boundaries, we respectfully argue that it was far from a simple confirmatory experiment. The design incorporated high-risk elements that provided considerable room for both the theory and our model to fail. First, the core prediction itself was non-obvious and highly specific. We did not simply test whether contrast heterogeneity and grid coarseness affect perception. We tested the stronger hypothesis that they would reflect a specific, interactive trade-off (the behavioral Arnold tongue) as specified by TWCO. Second, our modeling approach was deliberately constrained to provide a further stringent test. We did not post-hoc optimize the model's key parameters to fit our behavioral data. Instead, we fixed them a priori based on independent neurophysiological data from macaques. This was a high-risk choice, as a mismatch between a priori model predictions and the human data would have seriously challenged the framework's generalizability.

      We agree that future research should further challenge TWCO, for instance by using stimuli that require segregating several objects simultaneously or objects that cover more extensive regions of the visual field.

      (3) Alternative explanations for the observed behavioral effects are not thoroughly explored. While the model provides a good fit to the data, this does not conclusively prove that gamma synchrony is the actual mechanism underlying the observed effects.

      We agree that our results do not conclusively show that gamma synchrony is the actual mechanism underlying figure-ground segregation. We admit that the original phrasing used throughout the manuscript was too strong and gave the impression that we wanted to establish exactly that. However, the goal of our work was only to reinvigorate gamma synchrony as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We have revised the title and made adjustments throughout the manuscript to better reflect this more moderate goal.

      Additionally, we added tests of alternatives (Results, paragraphs 7–9) to clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model. First, we extracted average firing rates inside the figure. Second, we computed the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison between these two and synchrony based on out-of-sample predictions. While the rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison.

      (4) Direct neurophysiological evidence linking the observed behavioral effects to gamma synchrony in humans is absent, creating a gap between the model and the neural mechanism.

      We agree that the model only provides a how-possibly account linking stimulus features to performance. Showing that the brain actually relies on this mechanism would require showing that cortical synchrony mediates the effect of stimulus features on behavior beyond firing rates. Collecting such data would constitute a major effort that would go beyond the scope of this study. We acknowledge the need for electrophysiological data and the mediation analysis in the updated Discussion.

      Achievement of Aims and Support for Conclusions:

      The authors largely achieved their primary aim of demonstrating that human figure-ground perception follows patterns predicted by TWCO principles. Their psychophysical results reveal a behavioral "Arnold tongue" that matches the synchronization patterns predicted by their model, and their learning experiment shows that perceptual improvements correlate with predicted increases in synchrony.

      The evidence supports their conclusion that gamma synchrony could serve as a viable neural grouping mechanism for figure-ground segregation. However, the conclusion that "stimulus-dependence of gamma synchrony is adaptable to the statistics of visual experiences" is only partially supported, as the study uses highly controlled artificial stimuli rather than naturalistic visual statistics, or shows a sensitivity to the structure of experience.

      Likely Impact and Utility:

      This work offers a fresh perspective on the functional role of gamma oscillations in visual perception. The integration of TWCO with perceptual learning provides a novel theoretical framework that could influence future research on neural synchrony.

      The computational model, with parameters derived from neurophysiological data, offers a useful tool for predicting perceptual performance based on synchronization principles. This approach might be extended to study other perceptual phenomena and could inspire designs for artificial vision systems.

      The learning component of the study may have a particular impact, as it suggests a mechanism by which perceptual expertise develops through modified coupling between neural assemblies. This could influence thinking about perceptual learning more broadly, but also raises questions about the underlying mechanism that the paper does not address.

      Additional Context:

      Historically, the functional significance of gamma oscillations has been debated, with early theories of temporal binding giving way to skepticism based on gamma's stimulus-dependence. This study reframes this debate by suggesting that stimulus-dependence is exactly what makes gamma useful for perceptual grouping.

      The successful combination of computational neuroscience and psychophysics is a significant strength of this study.

      The field would benefit from future work extending (if possible) these findings to more naturalistic stimuli and directly measuring neural activity during perceptual tasks. Additionally, studies comparing predictions from synchrony-based models against alternative mechanisms would help establish the specificity of the proposed framework.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In a joint discussion to integrate the peer reviews and agree on the eLife recommendations, both reviewers agreed that the work is valuable, but they were on the fence about whether the strength of evidence was incomplete or solid, eventually settling on incomplete. The reviewers make several recommendations for improving these ratings, which I (Reviewing Editor) have organised into 3 points below, with point 1 of particular importance. Underneath the summary, please see the individual recommendations of the reviewers.

      (1) Strengthen evidence for the unique role of gamma synchrony in explaining the data, and ensuring claims are directly supported by relevant data:

      Reviewers 2 and 3 both note the lack of direct evidence for gamma involvement, and reviewer 2 observes that the fit with behaviour may trivially be explained by a relationship between contrast heterogeneity and grid coarseness without need for oscillation. The reviewers felt that the approach of fitting the model to human data could be strengthened to help address this issue - and they offer various solutions, e.g., more principled a-priori criteria around good vs bad fit of the model to both main task and training data, and comparison to alternative binding models (Reviewer 2), identifying and testing boundary conditions of the model (Reviewer 3). There is also the possibility of collecting direct human neurophysiological evidence linking the behavioural data to neural mechanisms. Our discussion also highlighted the need to weaken claims (including in the title) where links are not directly demonstrated by methods from the present study, e.g., resting on indirect comparisons to primate literature.

      We agree with the editor and reviewers that this was a critical point. To address it, we have made several major revisions.

      As suggested, we have weakened claims where the links are not directly demonstrated by our data. The title has been revised to be more specific, and we have carefully edited the abstract, introduction, and discussion to distinguish between our model's predictions and direct neurophysiological evidence.

      To address the concern that our model's fit might be trivially explained by visual features, we have performed a new analysis comparing the synchrony-based readout to two alternative rate-based readouts from the same V1 simulations. This new comparison shows that the synchrony readout provides a superior out-of-sample prediction of human behavior.

      While a full implementation of a competing theory like "Binding by Enhanced Firing" would be a valuable next step, we note that parameterizing such a model in a comparably grounded framework is a substantial undertaking beyond the scope of the present study. Our new analysis provides an important first step in this direction.

      (2) Make explicit and address the limitations of the stimuli:

      Include that the model is not extracting the figure from the background, and the controlled stimuli may limit generalizability.

      To address the concern that our model was not performing true figure-ground extraction, we performed a new set of simulations that included both the figure and the immediate background. The results confirm that synchrony dynamics within the figure region are not affected by the presence of the background. We added these validation results as supplementary materials. We have additionally made the modeling choice and its justification more explicit in the Results and Methods sections.

      We have revised the Discussion to be more explicit about the limitations of using highly controlled texture stimuli. We now clearly state that our findings are specific to this context and that further research is required to determine if these principles generalize to the segregation of objects in natural scenes.

      (3) Some clarifications to make more accessible:

      Include the figure explaining the framework (Reviewers 1&2), and also the model details (Reviewer 2).

      We have revised Figure 1 and its caption to more clearly illustrate the links from TWCO principles to their neural implementation in V1 and the resulting behavioral predictions.

      We have expanded the Methods section to provide a more detailed and accessible description of the model's construction. We now clarify precisely how the oscillator grid was defined in visual space, how eccentricity-dependent receptive field sizes were implemented, and how these were mapped onto a retinotopic cortical surface to determine coupling strengths.

      Reviewer #1 (Recommendations for the authors):

      (A) Major concerns:

      (1) My main concern:

      My main concern is the repeated claims that the observed findings can be attributed to gamma synchrony in the early visual cortex. I find this claim misleading as the authors do not report any electrophysiological data that directly supports such claims. As stated in my public review, I feel that the authors should be clear about direct evidence versus more abstract inferences based on the literature.

      In particular, I recommend changing claims about "gamma synchrony" to "Binding by Synchrony". That being said, the authors can outline that the model was built under the assumption that this synchrony is mediated by gamma in early visual cortex, but I don't think it should be part of their main conclusions.

      We appreciate that TWCO’s general principles are frequency-agnostic and can be viewed as binding by synchrony in a broad sense. Our work, however, specifically instantiates these principles in V1 gamma: the model reflects TWCO dynamics together with V1 anatomy/physiology and the well-established contrast–frequency relationship in the gamma range (which, to our knowledge, has not been demonstrated with comparable specificity for other bands). In that sense, it is a gamma oscillator model of V1, rather than a generic BBS instantiation. Moreover, stimulus dependencies often cited as challenges to BBS have been used in particular to argue against gamma; showing that these very dependencies are integral to the TWCO mechanism is central to our contribution, and we therefore keep our conclusions focused on the gamma-specific instantiation tested here.

      (2) Mediation of the observed effects by the visual features of the figure:

      The authors motivate the hypothesis that BBS predicts that the perception of texture-defined objects depends on the density of texture elements and their contrast heterogeneity. This hypothesis seems trivial as those are the features that distinguish figure from ground. I think it would be important to clarify how this hypothesis is unique to BBS and not explained by competing theories, such as Binding by Enhanced Firing (Roelfsema, 2023). The authors should be clear about what part of the hypothesis is not trivial based on the task and clearly attributable to oscillators and synchrony.

      Our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). We agree that grid coarseness (element distance) is an established facilitator of figure–ground perception. By contrast, contrast heterogeneity (feature variance) is less commonly emphasized as a figure–ground cue, compared to mean-based cues, but follows directly from TWCO’s frequency detuning. Importantly, mean contrast and luminance were matched exactly between figure and background in our stimuli. Demonstrating that contrast heterogeneity and grid coarseness not only independently affect figure–ground perception but also reflect a trade-off, in which higher heterogeneity needs to be counteracted by reduced grid coarseness in the way TWCO specifies, is therefore non-obvious and provides an initial empirical indication that the core TWCO variables might shape perception. We also agree that alternative models would further clarify the unique explanatory value of synchrony. In the revised manuscript, we compare rate-based readouts (mean figure rate; figure–background rate difference) with the synchrony readout from the same simulations. Rate difference indeed constitutes a predictor of performance, but the synchrony readout showed a larger effect and was preferred by out-of-sample model comparison.
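      The detuning–coupling trade-off described above can be illustrated with a minimal sketch. It assumes the simplest possible case, two symmetrically coupled Kuramoto phase oscillators, and is not the model used in the study; the function names, the coupling constant `k`, and the integration settings are hypothetical. For this toy system the phase difference obeys d(phi)/dt = delta_omega - 2*k*sin(phi), so locking is possible only when |delta_omega| <= 2*k, which is the wedge-shaped region usually called an Arnold tongue.

```python
import math


def phase_locks(delta_omega, k):
    """Analytic locking condition for two symmetric Kuramoto oscillators.

    delta_omega: difference in intrinsic angular frequency (detuning).
    k: coupling strength each oscillator receives from the other.
    """
    return abs(delta_omega) <= 2.0 * k


def simulate_phase_difference(delta_omega, k, dt=1e-3, steps=20000):
    """Euler-integrate d(phi)/dt = delta_omega - 2*k*sin(phi) from phi = 0."""
    phi = 0.0
    for _ in range(steps):
        phi += dt * (delta_omega - 2.0 * k * math.sin(phi))
    return phi


def late_drift(delta_omega, k):
    """Absolute rate of change of the phase difference after the simulation.

    Close to zero when the pair has locked; stays clearly above zero when
    the detuning is too large for the given coupling.
    """
    phi = simulate_phase_difference(delta_omega, k)
    return abs(delta_omega - 2.0 * k * math.sin(phi))


if __name__ == "__main__":
    for delta_omega in (1.0, 3.0):
        print(delta_omega, phase_locks(delta_omega, 1.0), late_drift(delta_omega, 1.0))
```

      In this toy picture, larger detuning (more contrast heterogeneity) can be compensated only by stronger coupling (a finer grid), which is the qualitative trade-off the response refers to.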

      Using a linear model, the authors assess the relationship between discrimination accuracy and synchrony. Did the authors also include the factors grid coarseness and contrast heterogeneity in this model? Again, as both the task performance (as shown by the GEE analysis) and oscillatory synchrony depend on these features, the relationship between model and behavioral performance will be mediated by the visual features.

      Thank you for raising this. In our framework, detuning (via contrast heterogeneity) and coupling (via grid coarseness) are the inputs, synchrony is the proposed mechanistic mediator, and behavior is the output. Because synchrony in our model is a (near-)deterministic function of the manipulated features under fixed parameters, a joint features+synchrony regression is statistically ill-posed (perfect multicollinearity up to numerical error) and cannot add information. A proper mediation test would require trial-wise neural measurements of synchrony in the same task, which we do not have and acknowledge as a limitation in the Discussion. Accordingly, we show that both the features themselves (reflecting TWCO principles) and model-derived synchrony (realizing the proposed pathway) account for behavior.

      We agree this does not establish a unique contribution of synchrony. To probe alternatives, we added rate-based readouts and a model comparison to the revised manuscript. These additional analyses indicate that synchrony outperforms simple rate-based mappings. We do not claim this rules out more sophisticated rate-based mechanisms. Our aim is to demonstrate that synchrony is a viable, behaviorally informative readout for downstream processing. We do not assert it is the only mechanism the brain uses. Synchrony had been discounted due to its stimulus dependence; our results are intended to rule it back in. We have made changes throughout the manuscript to better reflect this more modest aim.

      (3) Goodness of fit measures are not established a priori:

      I have described this concern in my public review. It is hard to assess what the authors would have interpreted as a good or a bad fit, especially without accounting for the confound in the relationship between oscillator synchrony and behavior. Similarly, when assessing the similarity between the behavioral and dynamic Arnold Tongues across different coupling parameters, the authors found that the chosen parameters (based on macaque data) were not optimal. They offer the explanation that the human cortex has a lower coupling decay than the macaque cortex, and the similarity is higher for lower values of coupling decay. While this explanation is not entirely implausible, it is unclear where an oscillator model with human values would be in the presented plot, as the authors didn't estimate those values from the human studies. Moreover, the task used in the Lowet et al., 2017 paper is very different from the task presented here, which could also account for differences. Overall, the explanation appears hand-wavy considering the lack of empirically defined goodness of fit measures.

      Thank you for these concerns.

      We indeed did not provide a priori thresholds for what would be considered a good fit. Instead, we used two complementary benchmarks, namely noise ceilings and parameter exploration. The former provides an upper bound on what any model (not just ours, but also models based on completely different mechanisms) could achieve given our data. The parameter sweep indicates how well, and how poorly, our concrete model can fit the data across the range of possible parameters. These benchmarks are more informative than a fixed a priori cutoff, which would depend on unknown noise and inter-subject variability. Both the noise ceiling and the parameter exploration indicate that our model, using a priori fixed parameters, performs well. Additionally, we redid all our statistical analyses after z-normalizing every predictor to provide easier interpretation of effect sizes.

      Regarding the reason that key model parameters were not optimal, we believe our interpretation to be plausible. We agree that we currently do not have data to estimate the exact human decay factor and hence cannot establish how much model fit would be affected. However, the parameter exploration in Figure 3 shows that small to modest reductions in decay would improve model fit. We discuss this now in the revised manuscript.

      The reviewer’s suggestion is intriguing. While Lowet et al. (2017) used a different task, the parameters we took from their work (decay rate and maximum coupling) are intended to reflect anatomical properties and thus should not be task-dependent. That said, Lowet et al.’s data carry uncertainty, so our estimates may not be exact; we note this explicitly in the revised Discussion. Whether a different task would have yielded better parameter estimates is difficult to determine, but we considered Lowet’s paradigm appropriate because it was designed to target the same V1 anatomical and physiological properties that map onto TWCO.

      I have concerns about a similar confound in the training effects. If I'm not mistaken, the Hebbian Learning rule encourages synchronization between the oscillators in the grid. As such, it causes synchronization to increase over several simulations. Clearly, the task performance of the participants also improves over the sessions. Again, an empirical threshold would be required to assess whether the similarity in learning between model and performance goes beyond what is expected based on learning alone. How much of these effects can be attributed to the model being oscillatory?

      The reviewer is correct that, in our framework, learning operates via changes in coupling that increase synchrony. Enhanced synchrony is the proposed (and in our model also the actual) pathway by which learning impacts behavior. We agree that learning could, in principle, act through pathways other than synchrony. Demonstrating this would not be achieved by a mediation analysis here, because that requires independent, trial-level neural measurements of the candidate pathways (synchrony and alternatives). In the absence of such data, the appropriate approach would be model comparison between competing mechanistic readouts. We have added such a model comparison for a synchrony readout versus two rate-based readouts derived from the same simulations for the first session; i.e., focusing on the pathway from stimulus features to behavior. However, a similar model comparison is not possible for learning. As we show in the supplementary materials, rate-based readouts of our V1 model are not at all affected by coupling strength. As such, they are insensitive to changes in coupling and are thus not viable as alternative mechanisms to explain performance changes due to learning. A fair test of rate-based alternatives would require building a detailed rate-based figure–ground segregation model that predicts session-wise changes. We agree that this is an important next step, but it is also a substantial undertaking beyond the scope of the present study.

      (4) Similarly, for the comparison of the Arnold Tongue in the transfer session and the early session:

      In the first part of the Results section, it says: "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli. We evaluated whether this assumption holds for our human participants using the transfer session following the main training period. [...] If learning is indeed local, participants' performance in the transfer session should resemble that of early training sessions, indicating a reset in performance for the new retinal location."

      The authors find that a model fit to session 3 explains the data in the transfer session best and consider this as evidence for the above-stated expectation. Again, it is unclear where the cutoff would have been for a session to be declared as early or late. For instance, had the participants only performed 4 sessions, would the performance be best explained by session 3 or session 1?

      A high number of statistical tests are used, which, firstly, need to be corrected for multiple comparisons (did the authors do this?). Secondly, I feel that the regression models could be improved. For instance, the authors fit one model per session and then assess how well each model explains the variance in the transfer session. I think the authors might want to opt for one model with the regressors contrast heterogeneity, grid coarseness, and session (and their interaction). Using this approach, the authors would still be able to assess which session predicts the data best. Similarly, interindividual variability could be accounted for by adding participant-specific random effects to the model (and using a mixed model), instead of fitting individual models per participant.

      We agree the “early vs late” cutoff was underspecified. In the revision, we predefine Session 2 as the early-learning reference, excluding Session 1 to avoid familiarization/response–mapping effects. We then fit a single Bayesian hierarchical model with contrast heterogeneity, grid coarseness, and session, plus a transfer indicator, and participant-level random effects. This allows us to place the transfer session on the same scale as training and to test (a) whether the transfer session precedes the state in Session 2 via the posterior contrast P(βtransfer<βSess2) and (b) whether it is indistinguishable from the state in Session 2 using an equivalence test derived from the fitted model. We find that the transfer session is equivalent to Session 2. We added this updated analysis of the transfer session in the Results (paragraph 15).

      In response to the suggestion to use a hierarchical regression model for analyzing the transfer session, we have decided to use such a model for all our analyses in a Bayesian framework. In this Bayesian framework, inference is based on the joint posterior (credible intervals/equivalence) of all predictors in a model and additional post-hoc multiplicity corrections are not required.
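      For readers unfamiliar with posterior contrasts, the two quantities mentioned above can be computed directly from paired posterior draws. The sketch below is illustrative only: the draws are simulated rather than taken from the fitted model, and the use of a region of practical equivalence (ROPE) with half-width `rope` is an assumption about how an equivalence test could be set up, not a description of the exact procedure in the manuscript.

```python
import numpy as np


def posterior_contrast(beta_transfer, beta_sess2, rope=0.1):
    """Summarise the difference between two coefficients from paired draws.

    beta_transfer, beta_sess2: posterior draws of the two coefficients,
        taken from the same joint posterior (same length, same order).
    rope: hypothetical half-width of the region of practical equivalence.

    Returns (P(beta_transfer < beta_sess2), P(|difference| < rope)).
    """
    diff = np.asarray(beta_transfer, dtype=float) - np.asarray(beta_sess2, dtype=float)
    p_less = float(np.mean(diff < 0))
    p_in_rope = float(np.mean(np.abs(diff) < rope))
    return p_less, p_in_rope


# Simulated stand-ins for posterior draws (NOT results from the study).
rng = np.random.default_rng(0)
beta_sess2 = rng.normal(0.50, 0.03, size=4000)
beta_transfer = rng.normal(0.50, 0.03, size=4000)

p_less, p_in_rope = posterior_contrast(beta_transfer, beta_sess2)
print(f"P(transfer < session 2) = {p_less:.2f}; P(|diff| < ROPE) = {p_in_rope:.2f}")
```

      A contrast probability near 0.5 together with most of the posterior mass inside the ROPE would be read as equivalence; a probability near 1 would indicate that the transfer session sits below Session 2.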

      (5) Questions regarding the model:

      What does it mean that the grid was "defined in visual space"? How biologically plausible with regard to the retinotopy and organization of the oscillators do the authors claim the model to be?

      We are happy to clarify this point. We have a total of 400 oscillators reflecting neural assemblies in V1. We start by defining a regular, 20x20, grid of the receptive field (RF) centers of these oscillators inside the figure region. Each oscillator is then also assigned a RF size based on the eccentricity of its RF center. We use the threshold-linear relationship between RF eccentricity and RF size reported in [1] to assign RF sizes. Each oscillator thus has an individual, eccentricity-dependent, RF size.

      For the coupling between oscillators, we need to know their cortical distances. We obtain these by first determining the cortical location of each oscillator through a complex-logarithmic topographic mapping of neuronal receptive field coordinates onto the cortical surface [2,3]. For this mapping, we use human parameter values estimated by [4]. From these cortical locations, we then compute pairwise Euclidean distances.

      The model thus captures realistic retinotopy, eccentricity-dependent RF sizes, and distance-dependent coupling on the cortical surface. We have adjusted our Methods to make these steps clearer.

      (1) Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature neuroscience, 14(9), 1195-1201.

      (2) Balasubramanian, M., & Schwartz, E. L. (2002). The isomap algorithm and topological stability. Science, 295(5552), 7. https://doi.org/10.1126/science.1066234

      (3) Schwartz, E. L. (1980). Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research, 20(8), 645–669. http://www.sciencedirect.com/science/article/pii/0042698980900905

      (4) Polimeni, J. R., Hinds, O. P., Balasubramanian, M., van der Kouwe, A. J. W., Wald, L. L., Dale, A. M., & Schwartz, E. L. (2005). Two-dimensional mathematical structure of the human visuotopic map complex in V1, V2, and V3 measured via fMRI at 3 and 7 Tesla. Journal of Vision, 5(8), 898. https://doi.org/10.1167/5.8.898
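      The construction steps described above (a regular 20x20 grid of RF centres, eccentricity-dependent RF sizes, a complex-logarithmic mapping to cortex, and pairwise Euclidean distances) can be sketched as follows. This is a minimal illustration rather than the model code: the grid location and extent, the slope and floor of the threshold-linear RF-size rule, and the mapping constants `a` and `k` are placeholder values chosen for the example, not the estimates from references [1] or [4].

```python
import numpy as np


def rf_centers(n=20, center=(5.0, 0.0), width=4.0):
    """Regular n x n grid of RF centres in visual space (degrees).

    center and width are hypothetical; the grid is kept in the right
    visual hemifield (x > 0) so the logarithm below is well behaved.
    """
    xs = center[0] + np.linspace(-width / 2, width / 2, n)
    ys = center[1] + np.linspace(-width / 2, width / 2, n)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def rf_sizes(centers, slope=0.2, min_size=0.5):
    """Threshold-linear RF size: linear in eccentricity with a lower floor."""
    ecc = np.hypot(centers[:, 0], centers[:, 1])
    return np.maximum(min_size, slope * ecc)


def cortical_positions(centers, a=0.7, k=15.0):
    """Complex-log (monopole) map w = k * log(z + a), z = x + i*y.

    a (degrees) and k (mm) are placeholder constants.
    """
    z = centers[:, 0] + 1j * centers[:, 1]
    w = k * np.log(z + a)
    return np.column_stack([w.real, w.imag])


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


centers = rf_centers()
sizes = rf_sizes(centers)
distances_mm = pairwise_distances(cortical_positions(centers))
print(centers.shape, sizes.shape, distances_mm.shape)
```

      The resulting distance matrix is what a distance-dependent coupling rule would take as input.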

      Similarly, do the authors claim that each Gabor annulus stimulates a single receptive field in V1?

      We hope that with the additional explanation above, it is clearer that there is not a one-to-one mapping. Each oscillator samples the local image by pooling over all Gabor annuli that overlap its receptive field (partially or fully) and computes the average contrast within its RF. Conversely, a single annulus typically overlaps multiple RFs and contributes to each in proportion to the overlap.

      I am unsure how the oscillators were organized, if not retinotopically. How is the retinotopic input fed into the non-retinotopically arranged oscillators?

      We hope that with the additional explanation above, it is clearer that the network is strictly retinotopic.

      The frequency of each oscillator changes according to ω=2πν with ν=25+0.25C. How were the values for the linear regression in ν chosen? Reference?

      The slope and intercept parameters for this equation were first reported in [5]. We added the reference to the Methods.

      (5) Lowet, E., Roberts, M., Hadjipapas, A., Peter, A., van der Eerden, J., & De Weerd, P. (2015). Input-dependent frequency modulation of cortical gamma oscillations shapes spatial synchronization and enables phase coding. PLoS computational biology, 11(2), e1004072.
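      As a small worked example of the mapping quoted in the question (ν = 25 + 0.25C, ω = 2πν), the sketch below converts contrast to intrinsic frequency and computes the detuning between two oscillators. It assumes that C is expressed in percent contrast and ν in Hz, which the exchange above does not state explicitly; the function names are hypothetical.

```python
import math


def intrinsic_frequency_hz(contrast):
    """Intrinsic frequency nu = 25 + 0.25 * C (C assumed in percent contrast)."""
    return 25.0 + 0.25 * contrast


def angular_frequency(contrast):
    """Angular frequency omega = 2 * pi * nu, in rad/s if nu is in Hz."""
    return 2.0 * math.pi * intrinsic_frequency_hz(contrast)


def detuning_hz(contrast_a, contrast_b):
    """Frequency detuning between two oscillators driven by different contrasts."""
    return intrinsic_frequency_hz(contrast_a) - intrinsic_frequency_hz(contrast_b)


if __name__ == "__main__":
    print(intrinsic_frequency_hz(40.0))  # 25 + 0.25 * 40
    print(detuning_hz(60.0, 40.0))       # 0.25 * (60 - 40)
```

      Under these assumptions, a 20-point contrast difference between two receptive fields corresponds to a 5 Hz detuning, which is how contrast heterogeneity translates into frequency detuning.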

      (6) Hebbian Learning Rule:

      I am confused about how the effective learning rate E = εt is calculated. It is said that it is estimated based on the similarity between the second experimental session and the distribution of synchrony after letting the model learn. How can the model learn without knowing epsilon and t?

      We agree with the reviewer that our procedure to estimate the effective learning rate requires further clarification. We performed a nested grid search. Essentially, we let the model learn between sessions 1 and 2 with each of 25 candidate effective learning rates and evaluate how well each of them allows the model to fit performance in session 2. We then select the best effective learning rate, create a new, smaller grid around this value, and repeat the procedure. In total, we perform 5 nested grids to arrive at the final effective learning rate. We expanded the explanation in the Methods.
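      The nested grid search described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual code: `loss` stands in for "misfit between the model after learning and performance in session 2", and the rule for shrinking the grid (one grid step either side of the current best value) is a hypothetical choice, since the response only says that a new, smaller grid is created around the best value.

```python
import numpy as np


def nested_grid_search(loss, low, high, n_candidates=25, n_levels=5):
    """Refine a one-dimensional parameter by repeated grid search.

    loss: callable mapping a candidate effective learning rate to a misfit
        score (lower is better); a stand-in for the model-to-data comparison.
    low, high: bounds of the initial grid.
    n_candidates: number of candidates per grid (25 in the response).
    n_levels: number of nested grids (5 in the response).
    """
    best = None
    for _ in range(n_levels):
        grid = np.linspace(low, high, n_candidates)
        losses = np.array([loss(candidate) for candidate in grid])
        best = float(grid[int(np.argmin(losses))])
        # Assumed shrink rule: the next grid spans one step either side of best.
        step = float(grid[1] - grid[0])
        low, high = best - step, best + step
    return best


def toy_loss(rate):
    """Toy misfit with a known minimum at 0.3 (purely for illustration)."""
    return (rate - 0.3) ** 2


if __name__ == "__main__":
    print(nested_grid_search(toy_loss, 0.0, 1.0))
```

      With 25 candidates per level, each level narrows the search interval by roughly a factor of 12, so five levels resolve the parameter far more finely than a single grid of 125 points would.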

      (B) Minor concerns:

      (1) Small N: 2/3 of the studies that were cited to justify the small sample were notably different from the current experiment, i.e., Intoy 2020 is an eye movement task, Lange 2020 is a memory task (Tesileanu 2020 is more similar). I think a power analysis would help to support the design, as the sample size seems quite low.

      Our study uses a within-subject design with ~750 trials per session (≈6,000 total) per participant, analyzed with a hierarchical model that pools information across trials and participants. To assess adequacy, we ran a simulation-based design analysis using the fitted hierarchical model (i.e., post hoc, based on the observed variance components). This analysis indicated a detection probability >90% for all key effects. We now report the results of this design analysis in Supplementary Table 1 and note this in the Results (paragraph 1).

      Regarding the literature context, we agree the cited studies are not identical to ours; we referenced them to illustrate a common practice (small N with many trials) when targeting low-level, early-visual mechanisms. Intoy (pattern/contrast sensitivity) and Lange (perceptual learning in early vision) share that focus, while Tesileanu is methodologically closest.

      (2) Figure 1 could be more informative and better described in the text. The authors often don't refer to the panels in Figure 1. Maybe it would help to swap a and b to describe the Arnold tongue first? It might also be a good idea to add the coupling strength and frequency detuning axes

      We have swapped panels a and b and now refer to each panel in the main text to enhance clarity.

      (3) Values of rho (distance - is this degrees visual angle)? Do the authors assume that the size of the stimuli corresponds to receptive fields in V1? If so, how is this justified?

      The center-to-center distance between any pair of neighboring annuli is indeed expressed in degrees of visual angle. Rho is a scaling factor for this distance. With rho=1, the center-to-center distance corresponds to the diameter of the annuli; i.e., they touch but do not overlap each other. We do not assume any relation between the size of receptive fields and the size of the annuli. Receptive field sizes in our model are purely determined by their eccentricity, and each oscillator can have several annuli within its receptive field while each annulus can fall within several overlapping receptive fields of different oscillators. We believe that the schematic illustration in Figure 1 might have given the impression that each oscillator sees exactly one annulus. We have added a note that this is not the case and that the schematic is merely a simplification to illustrate the relationship between contrast and intrinsic frequency.

      (4) Some equations are embedded in the text, and some are not. It might be easier to find the respective equation if they all have an index. For instance, the authors mention the psychometric function that relates model synchrony and performance in the results section. It would be easier to find if it had an index that the authors could refer to.

      We moved this equation as well as the contrast intrinsic frequency mapping from inline to displayed and numbered them.

      (5) Is there a reference for "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli"? (If so, it should be cited.)

      We added references supporting this assumption.

      (6) Figure 2b: colorbar missing label.

      We added the label.

      Reviewer #2 (Recommendations for the authors):

      Cool work!

      (1) The reader would benefit from (a single) comprehensive figure that visually explains the entire conceptual framework (from TWCO principles to neural implementation to behavioural predictions) and is accessible to readers without specialised knowledge of oscillatory dynamics. This will give the paper a greater impact.

      We have adjusted Figure 1 in accordance with suggestions made by reviewer 1 and added further explanations to the caption and the Introduction to enhance clarity on how the principles of TWCO relate to neural implementation.

      (2) I think this paper would benefit from the audience eLife provides, but the paper could move closer to the audience.

      (3) Pride comes before the fall, but I am not the most uninformed reader, and it took me some effort to process everything.

      Thank you, we took this to heart. In the Introduction, we now state more explicitly how each variable is operationalized and how these map onto TWCO with improved reference to relevant panels in the schematic figure. We agree the framework is conceptually dense. TWCO principles reach the stimuli through specific V1 anatomy and physiology, so there are several links to keep in mind. Our goal with the revised introduction and figure is to make those links better visible.

      (4) You could consider discussing potential implications for understanding perceptual disorders characterized by altered neural synchrony (e.g., schizophrenia, autism) and how your learning paradigm might inform perceptual training interventions.

      Thank you for this suggestion. We have added to the Discussion that TWCO might provide a new lens to study perceptual disorders. We provide a concrete example of the relation between grouping, gamma synchrony (in light of TWCO), and lateral connectivity in schizophrenia.

      (5) I think this paper has real strength, but rather than dispersing limitations throughout the discussion, create a dedicated section that systematically addresses ecological validity, alternative explanations, and generalisability concerns. This will also preempt criticism.

      We appreciate the suggestion. Our preference is to discuss limitations in context, next to the specific results they qualify, so readers see why each limitation matters and how it affects interpretation. Nevertheless, paragraph 7 on page 20 summarizes most limitations in a single paragraph.

    1. It’s important for educators to have a sense of what race and ethnicity are due to our potential for subconscious racial biases as teachers of MLs. While some MLs and their educators may share a common racial or ethnic identity, many do not. As white educators ourselves who have been granted many unearned privileges, we (the book authors) must become aware of and reflect on what these biases and privileges might mean for our practice as teachers. No matter what our racial identity and ethnicity, all of us need to approach this work with humility.

      This passage emphasizes the need for students to reflect on their own biases and identities. I think this connects strongly to culturally responsive teaching because educators must be aware of how their perspectives influence their teaching practices. Reflection and humility allow teachers to create more equitable learning environments for multilingual learners. This makes me think about how ongoing professional development could support teachers in recognizing and addressing these biases.

    1. For an example of public shaming, we can look at late-night TV host Jimmy Kimmel’s annual Halloween prank, where he has parents film their children as the parents tell the children that the parents ate all the kids’ Halloween candy. Parents post these videos online, where viewers are intended to laugh at the distress, despair, and sense of betrayal the children express. I will not link to these videos which I find horrible, but instead link you to these articles:

      I think that children often find distress in many things that don't warrant it, and it may be humorous to see them worry about things that aren't serious, but I personally don't like this Jimmy Kimmel prank. The intention of the adults here is to cause distress to kids for laughter alone, and that's not fair or kind, and it shouldn't be okay just because they are children. Posting this sort of content online could also have negative mental and social effects on a kid.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA, CdvB, and the homologues of ESCRT-III proteins (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing their assembly as two subcomplexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in interaction with CdvB1. Finally, while the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks, the mechanistic details of how these complexes drive membrane remodeling by subcomplexes removal by the proteasome and/or CdvC remain speculative.

      Although the work highlights intriguing similarities with eukaryotic ESCRT-III systems and explores unique archaeal adaptations, the conclusions drawn would benefit from stronger experimental validation and a more comprehensive mechanistic framework.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, a close relative of eukaryotes. This finding suggests that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which is a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain speculative, and the study would benefit from stronger experimental validation and expanded contextualization within the field.

      We thank the Referee for his/her appreciation of our work.

      Weaknesses:

      This manuscript presents several methodological inconsistencies and lacks key controls to validate its claims. Additionally, there is insufficient information about the number of experimental repetitions, statistical analyses, and a broader discussion of the major findings in the context of open questions in the field.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a solid extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      The manuscript is a bit light on quantitative detail, across the various figures, and several key controls are missing (CdvA, B alone to better interpret the co-polymerisation phenotypes and establish the true order of recruitment, for example) - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution, as well as a brief exploration of the possible reasons for the discrepancy between the foci observed in their liposome assays and the large rings observed in cells - to better serve the interests of a broad audience.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #3 (Public review):

      Summary:

      In this report, De Franceschi et al. purify components of the Cdv machinery in archaeon M. sedula and probe their interactions with membrane and with one-another in vitro using two main assays - liposome flotation and fluorescent imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Strengths:

      Using the flotation assay, they demonstrate that CdvA and CdvB bind liposomes when combined. While CdvB1 also binds liposomes under these conditions, in the flotation assay, CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      These experiments provide a good starting point for the in vitro study of the interaction of Cdv system components with the membrane and their consecutive recruitment. However, several experimental controls are missing, which complicates the authors' ability to draw strong conclusions. Moreover, some results are inconsistent across the two main assays, which makes the findings difficult to interpret:

      (1) Missing controls.

      Various protein mixtures are assessed for their membrane-binding properties in different ways. However, it is difficult to interpret the effect of any specific protein combination, when the same experiment is not presented in a way that includes separate tests for all individual components. In this sense, the paper lacks important controls. For example, Fig 1C is missing the CdvB-only control. The authors remark that CdvB did not polymerise (data not shown) but do not comment on whether it binds membrane in their assays. In the introduction, Samson et al., 2011 is cited as a reference to show that CdvB does not bind membrane. However, here the authors are working with protein from a different organism in a different buffer, using a different membrane composition and a different assay. Given that so many variables are changing, it would be good to present how M. sedula CdvB behaves under these conditions.

      We thank the referee for raising this point. We have now added these data in Figure 1C. Indeed, it turns out that CdvB from M. sedula exhibits clear membrane binding on its own in a flotation assay.

      Similarly, there is no data showing how CdvB alone or CdvA alone behave in the dumbbell liposome assay.

      Without these controls, it's impossible to say whether CdvA recruits CdvB or the other way around. The manuscript would be much stronger if such data could be added.

      We have now added these data in Figure 1E, 1F and 1G. Overall, we can confirm that CdvA binds the membrane better in the presence of CdvB (although both proteins can bind the membrane on their own). Both proteins appear to recognize the curved region of the membrane neck.

      (2) Some of the discrepancies in the data generated using different assays are not discussed.

      The authors show that CdvB2∆C binds membrane and localizes to membrane necks in the dumbbell liposome assay, but no membrane binding is detected in the flotation assay. The discrepancy between these results further highlights the need for CdvB-only and CdvA-only controls.

      We have now added these controls in Figure 1. In addition, we would like to clarify that the flotation assay and the SMS dumbbell assay serve different purposes and are not directly comparable in quantitative terms. In the flotation assay, all the protein present as input is eventually recovered and visualized. Thus, quantitative information on the fraction of the total protein bound to lipids can be inferred from this assay. The SMS assay, in contrast, provides a very different kind of information. Because of the particular protocol required to generate dumbbells (De Franceschi, 2022), the total amount of protein in the inner buffer of the dumbbells is not accurately defined: protein that is not correctly reconstituted (e.g. protein that aggregates while still in the droplet phase) interferes with vesicle generation, so that dumbbells containing such aggregates are generally not formed in the first place. This makes it impossible to draw any quantitative conclusions about the proportion of the sample bound to lipids. The SMS assay is therefore not directly comparable to the flotation assay, but rather complementary to it. Indeed, the purpose of the SMS assay is to provide information about the curvature selectivity of the protein.
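
      To make the quantitative reading of the flotation assay concrete, the fraction of protein bound to lipids can be estimated from band intensities on the gel. The following is only a minimal sketch, not the analysis used in the manuscript: the function name, the split into floated and non-floated fractions, and the example intensities are all illustrative assumptions.

```python
def bound_fraction(floated, unfloated):
    """Estimate the fraction of total protein that co-floats with lipids.

    floated:   band intensities of gradient fractions containing liposomes
    unfloated: band intensities of the remaining gradient fractions
    Both lists are hypothetical inputs in arbitrary intensity units.
    """
    total = sum(floated) + sum(unfloated)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(floated) / total


# Illustrative numbers only, not measured data.
example = bound_fraction(floated=[30.0, 10.0], unfloated=[60.0])
print(f"bound fraction: {example:.2f}")
```

      Such an estimate is only meaningful when all input protein is recovered across the fractions, which is the property of the flotation assay described above and is not satisfied by the SMS assay.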

      (3) Validation of the liposome assay.

      The experimental setup to create dumbbell-shaped liposomes seems great and is a clever novel approach pioneered by the team. Not only can the authors manipulate liposome shape, they also state that this allows them to accurately control the species present on the inside and outside of the liposome. Interpreting the results of the liposome assay, however, depends on the geometry being correct. To make this clearer, it would seem important to include controls to prove that all the protein imaged at membrane necks lies on the inside of liposomes. In the images in SFig3 there appears to be protein outside of the liposome. It would also be helpful to present data to test whether the necks are open, as suggested in the paper, by using FRAP or some other related technique.

      We thank the Referee for his/her appreciation. The proteins are encapsulated inside the liposomes, not outside of them. While Figure S3 might give the appearance that there is some protein outside, this is actually just an imaging artifact. Author response image 1 (below) explains this: When the membrane and protein channel are shown separately, it is clear that the protein cluster that appeared to be ‘outside’ actually colocalizes with an extra small dumbbell lobe (yellow arrowhead). The protein appeared to be outside of it because (1) the protein fluorescent signal is stronger than the signal from the membrane, and (2) there is a certain time delay in the acquisition of the two channels (0.5-1 second), thus the membrane may have slightly shifted out of focus when the fluorescence was being acquired. We are confident that the protein is inside in these dumbbells because the procedure for preparing the dumbbells requires extensive emulsification by pipetting, which requires ≈ 1 minute. This time is more than sufficient for proteins with high affinity for the membrane, like ESCRT and Cdv, to bind the membrane. For an example of how fast binding under confinement can be, please see movie 2 from this paper: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.

      Moreover, in many instances, we observed that the protein is inside because, by increasing the gain in the images post-acquisition, a clear protein signal appears in the lumen (see Author response image 2).

      Author response image 1.

      Separate channels showing colocalization of protein and lipids (adapted from Figure S3). The zoom-in shows separate channels, highlighting that the CdvB2 cluster that seems to be ‘outside the dumbbell’ actually colocalizes with the small terminal lobe of the dumbbell, indicating that the protein is encapsulated within that lobe.

      Author response image 2.

      Residual protein present inside lumen of dumbbells as visualized by increasing the brightness post-acquisition.

      We are not sure what the referee means by “test whether the necks are open, as suggested in the paper”. We are confident that the lobes of dumbbells originated from a single floppy vesicle, and were therefore mutually connected with an open neck (at least at the onset of the experiment). We have performed extensive FRAP assays on dumbbells in previous papers (De Franceschi et al., ACS nano 2022 and De Franceschi et al., Nature Nanotech 2024) which unequivocally proved that these chains of dumbbells are connected with open necks. We have now also performed a few FRAP assays with reconstituted Cdv proteins, which confirmed this point. We have added a movie of such an experiment to the manuscript (Movie 1).

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (4) Quantification of results from the liposome assay.

      The paper would be strengthened by the inclusion of more quantitative data relating to the liposome assay. Firstly, only a single field of view is shown for each condition. Because of this, the reader cannot know whether this is a representative image or an outlier. Can the authors do some quantification of the data to demonstrate this? The line scan profiles in the supplemental figures would be an example of this, but again in these Figures only a single image is analyzed.

      The images that we showed are indeed representative. The dumbbells that are generated by the SMS approach contain an “internal control”: in each dumbbell, the protein has the option of localizing at the neck or localizing elsewhere in the region of flat membrane. We see consistently that Cdv proteins have a strong preference for localizing at the neck.

      We would recommend that the authors present quantitative data to show the extent of co-localization at the necks in each case. They also need a metric to report instances in which protein is not seen at the neck, e.g. CdvB2 but not CdvB1 in Fig2I, which rules out a simple curvature preference for CdvB2 as stated in line 182.

      While the request for better quantitation is reasonable, this would require carrying out very significant new experiments at the microscope, which is rendered near-impossible since both first authors have left the lab for new positions.

      Secondly, the authors state that they see CdvB2∆C recruited to the membrane by CdvB1 (lines 184-187, Fig 2I). However, this simple conclusion is not borne out in the data. Inspecting the CdvB2∆C panels of Fig 2I, Fig3C, and Fig3D, CdvB2∆C signal can be seen at positions which don't colocalize with other proteins. The authors also observe CdvB2∆C localizing to membrane necks by itself (Fig 2E). Therefore, while CdvB1 and CdvB2∆C colocalize in the flotation assay, there is no strong evidence for CdvB2∆C recruitment by CdvB1 in dumbbells. This is further underscored by the observation that in the presented data, all Cdv proteins always appear to localize at dumbbell necks, irrespective of what other components are present inside the liposome. Although one nice control is presented (ZipA), this suggests that more work is required to be sure that the proteins are behaving properly in this assay. For example, if membrane binding surfaces of Cdv proteins are mutated, does this lead to the accumulation of proteins in the bulk of the liposome as expected?

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrowheads) are localized at necks, while those that appear to contain only CdvB2ΔC (red arrowheads) are not localized at necks. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse in the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but had shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We believe the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize elsewhere.
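
      The argument about the acquisition delay can be illustrated with a back-of-the-envelope estimate. This is a minimal sketch assuming free diffusion, for which the root-mean-square displacement along any one axis after a time t is sqrt(2Dt). The diffusion coefficient used below is a hypothetical placeholder, not a value measured in this work; only the 0.5 s and 1 s delays come from the response above.

```python
import math


def rms_displacement_um(diffusion_um2_per_s, delay_s):
    """Root-mean-square displacement along one axis for free diffusion: sqrt(2 * D * t)."""
    if diffusion_um2_per_s < 0 or delay_s < 0:
        raise ValueError("inputs must be non-negative")
    return math.sqrt(2.0 * diffusion_um2_per_s * delay_s)


# D is an assumed placeholder (um^2/s), chosen only for illustration.
ASSUMED_D = 1.0
for delay in (0.5, 1.0):
    shift = rms_displacement_um(ASSUMED_D, delay)
    print(f"delay {delay} s -> ~{shift:.2f} um")
```

      Under this assumption a freely diffusing cluster would move on the order of a micrometre between the two channel acquisitions, whereas a cluster held at a neck would not; the actual displacement depends on the true mobility of the clusters, which is not reported here.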

      Author response image 3.

      Current Figure 2H, where clusters that are double-positive for both CdvB1 and CdvB2ΔC are indicated by yellow arrowheads, while clusters that apparently only contain CdvB2ΔC are indicated by red arrowheads. It is observed that all the double-positive clusters are localized at necks.

      (5) Rings.

      The authors should comment on why they never observe large Cdv rings in their experiments. In crenarchaeal cell division, CdvA and CdvB have been observed to form large rings in the middle of the 1 micron cell, before constriction. Only in the later stages of division are the ESCRTs localized to the constricting neck, at a time when CdvA is no longer present in the ring. Therefore, if the in vitro assay used by the authors really recapitulated the biology, one would expect to see large CdvAB rings in Figs 1EF. This is ignored in the model. In the proposed model of ring assembly (line 252), CdvAB ring formation is mentioned, but authors do not discuss the fact that they do not observe CdvAB rings - only foci at membrane necks. The discussion section would benefit from the authors commenting on this.

      The referee is correct: it is intriguing that we don’t see micron-sized rings for CdvA and CdvB. We do note that our EM data (Fig. S1) show that CdvA on its own can form rings of about 100-200 nm diameter, well below the diffraction limit, that could well correspond to the foci that we optically resolve in Figure 1. We have now added a brief comment on this to the manuscript on lines 256-264.

      (6) Stoichiometry

      It is not clear why 100% of the visible CdvA and 100% of the visible CdvB are shifted to the lipid fraction in 1C. Perhaps this is a matter of quantification. Can the authors comment on the stoichiometry here?

      We agree that this was unclear. Since that particular gel was stained with Coomassie, the quantitative signals might be unreliable, and hence we have repeated this experiment using fluorescently labelled proteins, which indeed show a less extreme distribution. This was also done to make the data more uniform, as requested by the referees.

      (7) Significance of quantification of MBP-tagged filaments.

      Authors use tagging and removal of MBP as a convenient, controllable system to trigger polymerisation of various Cdv proteins. However, it is unclear what is the value and significance of reporting the width and length of the short linear filaments that are formed by the MBP-tagged proteins. Presumably they are artefactual assemblies generated by the presence of the tag?

      Providing a measure of the changes induced by MBP removal, in fact, validates that this actually has an effect. But perhaps this places too much emphasis on the short filaments. We have now opted for a compromise, removing the quantification of the width and length of short filaments formed by MBP-tagged protein from the text, but keeping the supplementary figure showing their distribution as compared to the other filaments (Figure S2E, SF).

      Similarly, Figure 2C doesn't seem a useful addition to the paper.

      We removed panel 2C, and now merely report these values in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest the authors perform a deeper discussion about their findings, such as what are the evolutionary implications, how they think lipids from these archaea may affect the recruitment process,...

      Because there is no exact homology between archaeal Cdv proteins and eukaryotic ESCRT-III proteins, we do not feel our work brings new evolutionary implications beyond what we already state in the manuscript. We also did not perform experiments using archaeal lipids, thus we would rather not speculate on how they may potentially affect the recruitment of Cdv proteins.

      In general, the manuscript lacks information regarding some scale bars, number of experimental repetitions (n or N), statistical analysis when needed, information about protein concentrations used in their assays.

      We have now added this information in the manuscript.

      Below, I provide a list of comments that I think the authors should address to improve the manuscript:

      (1) Line 113-114: The authors test protein-membrane interactions using flotation assays with positively curved SUV membranes but encapsulate proteins in dumbbell-shaped liposomes with negative curvature at the connecting necks. Might the use of membranes with opposite curvatures affect the recruitment process? Since the proteins are fluorescently labeled, I suggest testing recruitment using flat giant unilamellar vesicles or supported lipid bilayers (with zero curvature) to validate their findings.

      We thank the referee for this suggestion. Please do note that we are not claiming in our paper that Cdv proteins recognize negative curvature. We merely observe that they localize at necks. The neck of a dumbbell exhibits the so-called “catenoid” geometry, which is characterized by having both positive and negative curvature.

      Regarding the SUVs, we now realize there was a mistake in the methods section: in the flotation assay we in fact used multilamellar vesicles, not SUVs, precisely for the reason mentioned by the referee. We apologize for the oversight and have now corrected this in the methods. Multilamellar vesicles are not characterized by a strong positive curvature as SUVs are, but we agree that they likely do not have negative curvature either. Because of the heterogeneous nature of the multilamellar vesicles, they provide a binding assay that is rather independent of curvature. Complementary to the flotation assay, the SMS approach was employed to reveal the curvature preference of proteins.

      Finally, we performed the experiment on large GUVs suggested by the referee using CdvB as an example, but this turned out to be inconclusive because the protein forms clusters: these clusters may be creating local curvature at the nanometer scale, which cannot be resolved by optical microscopy (Author response image 4). This is quite typical for proteins that recognize curvature (cf. for instance: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.)

      Author response image 4.

      Fluorescently labelled CdvB bound to giant unilamellar vesicle. The protein was added in the outer buffer. CdvB forms distinct clusters, which may generate a local region of high membrane curvature.

      (2) Line 138-139: How is His-ZipA binding the membrane? Wouldn't Ni<sup>2+</sup>-NTA lipids be required? If not, how is the binding achieved?

      Indeed, NTA-lipids were present. This is now stated both in the legend and in the methods.

      (3) In the encapsulated protein assays, why does the luminal fluorescence intensity of the encapsulated protein sometimes appear similar to the bulk fluorescence signal? Since only a small fraction of the protein assembles at membrane necks, shouldn't the luminal pool of unbound protein show higher fluorescence intensity inside the liposomes?

      We thank the referee for raising this point and giving us the opportunity to explain this. The reason is that Cdv proteins have a very high affinity for the neck, and when they cluster at the neck the fluorescence intensity of the cluster is many times higher than the background fluorescence. Because we were interested in imaging the clusters and avoiding overexposing them, we adjusted the imaging conditions accordingly, with the result that the fluorescence from both the lumen and the bulk is at a very low level.

      By choosing different imaging conditions, however, it can actually be seen that the signal inside the lumen is clearly higher than the bulk: this can be seen for instance in Author response image 2, where the brightness has been adjusted accordingly.

      (4) Line 184-185: In Fig. 2I, some CdvB2ΔC puncta seem independent of CdvB1 and are not localized at membrane necks. How many such puncta exist? For example, in the provided micrograph, 2 out of 5 clusters are independent of CdvB1. This proportion is significant. Could the authors quantify the prevalence of these structures and discuss why they form?

      We thank the referee for giving us the opportunity to explain this apparent discrepancy. We would like to stress the fact that CdvB2ΔC and CdvB1 form an obligate heterodimer: in all our experiments, without exception, we find that they form a strong complex when we mix the two proteins. This is true both in dumbbells and in flotation assays.

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrowheads) are localized at necks, while those that appear to contain only CdvB2ΔC (red arrowheads) are not localized at necks. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse in the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but had shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We believe the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize elsewhere.

      (5) Figure 1E and 1F: Why do lipids accumulate and colocalize with the proteins? How can the authors confirm lumen connectivity between vesicles? Performing FRAP assays could validate protein localization and enrichment at the lumen of the membrane necks.

      At first sight, indeed some lipid enrichment seems to be observed at the neck between lobes of dumbbells.

      This is, however, an imaging artifact due to the fact that the neck is diffraction limited. As shown in the Author response image 5, we are acquiring the membrane signal from both lobes at the neck region, and therefore the signal is roughly double, hence the apparent lipid enrichment.

      Author response image 5.

      Schematic illustrating that the neck between two lobes is smaller than the diffraction limit of optical microscopy (the size of a typical pixel is indicated by the green square). Because of this technical limitation, the fluorescence intensity of the membrane at the neck is twice that of a single membrane.
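
      The doubling of the membrane signal at a sub-diffraction neck can be checked with a small numerical sketch. It assumes a one-dimensional Gaussian point-spread function and treats the two membrane walls as thin lines; the PSF width and the wall separation below are hypothetical values chosen for illustration, not measurements from this work.

```python
import math


def peak_ratio(separation_nm, psf_sigma_nm, step_nm=1.0):
    """Peak intensity of two blurred membranes relative to the peak of one.

    Each membrane is modelled as a thin line blurred by a 1D Gaussian PSF
    whose peak is 1, so the returned value is directly the ratio.
    """
    def blurred(x, centre):
        return math.exp(-((x - centre) ** 2) / (2.0 * psf_sigma_nm ** 2))

    left, right = -separation_nm / 2.0, separation_nm / 2.0
    start = left - 3.0 * psf_sigma_nm
    n_steps = int((right - left + 6.0 * psf_sigma_nm) / step_nm) + 1
    xs = [start + i * step_nm for i in range(n_steps)]
    return max(blurred(x, left) + blurred(x, right) for x in xs)


# Assumed values: PSF sigma of 100 nm; walls 50 nm apart (neck) or 1000 nm apart (lobe).
print(f"neck: {peak_ratio(50, 100):.2f}x, well-separated walls: {peak_ratio(1000, 100):.2f}x")
```

      With these assumed numbers the two walls at the neck merge into a single peak of nearly twice the single-membrane intensity, while walls that are far apart each retain the single-membrane intensity, consistent with the apparent lipid enrichment being an optical effect.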

      The referee is correct in pointing out that these images do not prove that the lobes are connected, and that FRAP assays are the only way to prove this point. However, in previous papers we have confirmed extensively that in chains of dumbbells the lobes are connected:

      - De Franceschi N, Pezeshkian W, Fragasso A, Bruininks BMH, Tsai S, Marrink SJ, Dekker C. Synthetic Membrane Shaper for Controlled Liposome Deformation. ACS Nano. 2022 Nov 28;17(2):966–78. doi: 10.1021/acsnano.2c06125.

      - De Franceschi N, Barth R, Meindlhumer S, Fragasso A, Dekker C. Dynamin A as a one-component division machinery for synthetic cells. Nat Nanotechnol. 2024 Jan;19(1):70-76. doi: 10.1038/s41565-023-01510-3.

      Random sticking of liposomes would also generate clusters of vesicles, not linear chains. We now also provide a movie (Movie 1) supporting this point.

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (6) Why didn't the authors use the same lipid composition, particularly the same proportion of negatively charged lipids, on the SUVs of the flotation assays and on the dumbbell-shaped liposomes?

      In flotation assays, it is typical to use a relatively large proportion of negatively charged lipids, to promote protein binding. This is because the aim is to maximize membrane coverage by the protein. The SMS procedure to generate dumbbell-shaped GUVs is completely different, however. Rather than covering the membrane with protein, the idea is to reduce the amount of protein to a minimum, so that any curvature preference can be best visualized. This is e.g. routinely done in tube pulling experiments, for the same reason (See for instance Prévost C, Zhao H, Manzi J, Lemichez E, Lappalainen P, Callan-Jones A, Bassereau P. IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun. 2015 Oct 15;6:8529. doi: 10.1038/ncomms9529).

      (7) Line 117-119: The suggestion that polymer formation between CdvA and CdvB facilitates membrane recruitment is intriguing. However, fluorescence microscopy experiments could better elucidate whether there is sequential recruitment of CdvB followed by CdvA, or if these proteins form a heteropolymer composite for membrane binding. Can CdvB bind membranes independently, or does this require synergy between CdvA and CdvB?

      We thank the referee for prompting us to perform this experiment. As we now show in Figure 1C, CdvB indeed is able to bind the membrane independently of CdvA. Whether this happens sequentially or simultaneously is an interesting question, but one that is impossible to address with either the SMS or the flotation assay, because in both cases we can only observe the endpoint of the recruitment.

      We would also like to clarify one specific experimental detail. Perhaps unsurprisingly, the results from the flotation assay are dependent on the way the assay is performed. In particular, we observed that the same protein can exhibit a different binding profile depending on whether it is being loaded either at the top or at the bottom of the gradient. This can be seen in Author response image 6. This is counterintuitive, since once the equilibrium is reached, the result should only depend on the density of the sample. We performed an overnight centrifugation (> 16 hours) on a short tube (< 3 cm tall), thus equilibrium is being reached (which is corroborated by the fact that CdvB1 and CdvB2 can float to the top of the gradient within this timespan, as shown in Figure 2C, 2E, 2G). We ascribe the difference between top and bottom loading to the fact that, when the sample is loaded at the bottom, it has to be mixed with a concentrated sucrose solution, while in the case of loading from the top, this is not done.

      In literature, both loading from top and from bottom have been used:

      - Lata S, Schoehn G, Jain A, Pires R, Piehler J, Gottlinger HG, Weissenhorn W. Helical structures of ESCRT-III are disassembled by VPS4. Science. 2008 Sep 5;321(5894):1354-7. doi: 10.1126/science.1161070.

      - Moriscot C, Gribaldo S, Jault JM, Krupovic M, Arnaud J, Jamin M, Schoehn G, Forterre P, Weissenhorn W, Renesto P. Crenarchaeal CdvA forms double-helical filaments containing DNA and interacts with ESCRT-III-like CdvB. PLoS One. 2011;6(7):e21921. doi: 10.1371/journal.pone.0021921.

      - Senju Y, Lappalainen P, Zhao H. Liposome Co-sedimentation and Co-flotation Assays to Study Lipid-Protein Interactions. Methods Mol Biol. 2021;2251:195-204. doi: 10.1007/978-1-0716-1142-5_14.

      In performing the flotation assay for CdvB1 and CdvB2ΔC, or when using all 4 proteins together, we loaded the sample at the bottom, and we could detect reproducible binding to liposomes (Figures 2D, 2F, 2H, 3A). However, CdvB does not bind the membrane when loaded at the bottom. Thus, for the experiments shown in Figure 1C, we loaded the proteins at the top. This experimental setup allowed us to highlight that CdvB indeed induces a stronger interaction between CdvA and the membrane.

      Author response image 6.

      CdvB binding to multilamellar vesicles in a flotation assay. In the left panel, the sample was loaded at the top of the sucrose gradient; in the right panel it was loaded at the bottom.

      (8) Line 165-173: The authors claim that filament curvature differs between CdvB2ΔC alone and the CdvB1:CdvB2ΔC complex. Are these differences statistically significant? What is the sample size (N)? Furthermore, how do the authors confirm interactions between these proteins in the absence of membranes based solely on EM micrographs?

      We can confirm that the filaments are composed of both proteins, because the filaments have a different curvature when both proteins are present. However, as requested by referee 3, point (7), we removed the quantification of curvature from panel 2C. We report the N number in the text.

      (9) Line 121-123: Are the authors referring to positive or negative membrane curvatures? The cited literature suggests ESCRT-III proteins either lack curvature preferences (e.g., Snf7, CHMP4B) or prefer high positive curvature (e.g., late ESCRT-III subunits). This is confusing since the authors later test recruitment to negatively curved necks.

      We do not claim that Cdv proteins prefer positive or negative curvature, because the necks present in dumbbells have a catenoid geometry, which includes both positive and negative curvature. We have now clarified this in the discussion.

      (10) Since the conclusions rely on the oligomeric state of the proteins, providing SEC-MALS spectra to show the protein oligomeric state right after the purification would strengthen the claims.

      While such SEC-MALS experiments may be interesting, practical implementation of this is not possible since both first authors have left the lab for new positions.

      (11) Line 157-160: Suppl. Fig. 2 shows only a single EM micrograph of a small filament. Could the authors provide lower magnification images showing more filaments?

      As requested by Referee 3, point (7), we have toned down the importance of these short filaments.

      Also, why are the sample sizes for filament length (N=161) and width (N=129) different?

      Protein filaments formed by Cdv tend to stick to each other side by side, so that for some filaments the width could not be accurately assessed, and accordingly those were removed from the analysis.

      (12) The introduction states that CdvA binds membranes while CdvB does not. However, the results suggest CdvB facilitates membrane binding, helping CdvA attach. This discrepancy needs further explanation.

      We thank the referee for raising this point. We have now performed additional experiments (both SMS assay and flotation assays) showing that indeed CdvB from M. sedula is (unlike CdvB from Sulfolobus) able to bind the membrane on its own (Figure 1C, 1F).

      Reviewer #2 (Recommendations for the authors):

      Best practice would be to show single fluorescence channels in grayscale or inverted grayscale, retaining pseudocolouring only for the merged multichannel image.

      We decided to retain and standardize the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. We believe this improves readability, and this was also a request from Referee 3. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      It would be great to include a quantification of liposome curvature vs focal intensity of the various Cdv components - across figures.

      Quantification of liposome curvature at the neck can be done (De Franceschi et al., Nature Nanotech. 2024). However, in practice, this requires transferring of the sample post-preparation into a new chamber in order to increase the signal-to-noise ratio of the encapsulated dye, a procedure that drastically reduces the yield of dumbbells. The very sizeable amount of work required to obtain reliable measurements, especially considering all the proteins and protein combinations used in this study, indicates that this represents a project in itself, which goes well beyond the scope of this manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) We would encourage the authors to consider including the length of the scale bar next to the scale bar in each image and not in the figure description. This would greatly aid in clarity and interpretation of figures.

      We have now written the length of the scale bar in the figures.

      (2) In a similar vein, could the authors consider labeling panels throughout the manuscript, indicating which sample is being presented? This goes mainly for the negative stain and the dumbbell fluorescence images, as having to continuously consult the figure legend hinders clarity.

      We have now labelled the EM images as requested by the referee.

      (3) Lines 254-256: would the statement hold not only for CdvB2∆C, but for all imaged proteins? They all seem to localize to membrane necks, presumably favoring membrane binding to a specific membrane topology.

      We agree with the referee, and changed the phrasing accordingly.

      (4) CdvB2∆C construct - presumably this was a truncation of helix 5 of the ESCRT-III domain? Figure 1A shows that the ESCRT-III domain spans residues 34-170 and therefore implies that all five ESCRT-III helices (which make up the ESCRT-III domain) are present in the C-terminal truncation. Could the authors clarify?

      Indeed, the truncation was done at residue 170.

      (5) Results of the liposome flotation assays are presented inconsistently across the three figures (Figs 1C, 2DFH, and 3A). This makes it more difficult than it needs to be to interpret and compare results. Could the authors consider presenting the three gels in a more similar, standardized way across the three figures?

      To improve readability, we now standardized the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      (6) From the data presented in Fig 1EF, it cannot be concluded whether CdvB and CdvA colocalize, as only one protein is labelled. Is there a technical reason for this?

      We have now repeated the same experiment by having both proteins labelled, confirming that there is co-localization at the neck (Figure 1G).

      (7) Fig 2C: is the difference between the two samples significant?

      As requested by Referee 3, we have removed Figure 2C.

      (8) Fig 2I is missing a 'merged' panel.

      We have now added the merged panel.

      (9) The fluorescence intensity plots in Supp Figs 1C and 3C would be easier to interpret if the lipid and protein signal would be plotted on the same plot (say, with normalized fluorescence intensity)

      It is not immediately obvious to us what the signal should be normalized to. What we wished to convey with these plots was that the intensity of proteins spikes at the neck region. In an attempt to improve clarity, we have now aligned the plots vertically, and highlighted the position of the neck.

      (10) CdvA should have a capital "A" in Figure 3A, panel 3.

      We have now corrected this.

      (11) The discussion doesn't comment on the need to truncate CdvB2.

      This is explained in the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with strength of support being mostly convincing, and incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in development of selective inhibitors for protein targets.

      We are encouraged by this favorable assessment and thank editors and reviewers for their constructive feedback and recommendations. We trust that the revisions made to the manuscript will clarify the aspects that had been perceived to be incomplete.

      Reviewer #1 (Public review):

      Summary: 

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in experimental electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple nonpolar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models. 

      As stated in the original manuscript, we concur that the mechanism we propose remains hypothetical until further studies of the complete conformational cycle of the channel are conducted. The recently determined structure of a Kv2.1 channel in the closed state (Mandala and MacKinnon, PNAS 2025) presents an excellent opportunity to do so. Indeed, a cursory analysis of that structure shows that a Pro-Ile-Pro motif in helix S6 marks the position of the intracellular gate, where the pore domain constricts maximally (aside from the selectivity filter). As illustrated in Fig. 5, this motif is precisely where the benzimidazole and thiazole moieties of RY785 bind in our simulations. The mechanism we outline in Fig. 7 thus seems very plausible, in our view; that is, RY785 occludes the K+ permeation pathway before the pore domain reaches the closed conformation, explaining the observed electrophysiological effects (see Discussion). The Discussion has been revised to note the recent discovery of the aforementioned structure, its implications for the mechanism we propose, and the opportunities for further research that are now open.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore. 

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      As noted in the manuscript, we designed our simulations to explore the possibility that RY785 binds within the pore domain, because TEA and RY785 are competitive and TEA is known to bind within the pore. That RY785 did in fact spontaneously and reproducibly bind within the pore was however not a predetermined outcome; if the site of interaction for the inhibitor was elsewhere in the channel, the simulation would not have shown a stable associated state, which would have prompted us to examine other possible sites, including the voltage sensors. It was also not predetermined or foreseeable a priori that the mode of interaction we observed in simulation provides a straightforward rationale for the electrophysiological effects of RY785. Based on our results, therefore, we believe that RY785 binds within the pore of Kv2. As stated by the reviewer, other allosteric modulators are known to bind instead to the sensors; to our knowledge, however, there is no precedent of a small-molecule inhibitor that simultaneously acts on the sensors and the pore domain. We therefore believe that future studies should focus on corroborating or refuting the mechanism we propose, through additional experimental and computational work; if, contrary to our claim, RY785 is found not to bind to the pore domain, it would be logical to explore other possible sites of interaction, as the reviewer suggests. The Discussion has been modified to address this point.

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      To clarify this question, we have included a multiple sequence alignment as Supplementary Figure 1; the revised manuscript refers to this figure in the Discussion section. The alignment reveals that the cluster of residues forming contacts with RY785 (Val409, Pro406, Ile405, Ile401, and Val398) is indeed specific to Kv2.1. Among Kv channels, Kv3.1 and Kv4.1 exhibit the greatest similarity to Kv2.1 at these positions, but they differ in a crucial substitution: Ile405 in Kv2.1 is replaced by Val. This replacement shortens the sidechain, undoubtedly reducing the magnitude of the hydrophobic interaction between inhibitor and channel (Val is approximately 6 kcal/mol, i.e. 1,000 times, more hydrophilic than Ile). Kv5.1 differs from Kv2.1 at two positions: Pro406 is replaced by His, and Val409 by Ile. The introduction of His abolishes the hydrophobic interaction at that position, and the need for hydration likely perturbs all adjacent contacts with RY785. Lastly, Kv6-Kv10 and Cav channels feature entirely different residues at these positions. Consistent with these findings, a recent study by the Sack lab (https://elifesciences.org/articles/99410) has demonstrated that Kv5, Kv6, Kv8, and Kv9 pore subunits confer resistance to RY785, while a high-throughput electrophysiological study carried out by Merck (Herrington et al., 2011) reported that RY785 shows no significant activity against Cav channels. The sequence alignment offers a simple interpretation for these experimental observations, namely that RY785 is recognized by Kv2 channels through the abovementioned hydrophobic cluster within the pore domain.

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      We refer the reviewer to published work, for example Stix et al, 2023 and Tan et al, 2022. We provide additional comments below.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

      We detected no persistent specific interactions between RY785 and the permeant K+ ions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript describes atomistic molecular dynamics (MD) simulations of a voltage-gated potassium channel Kv2.1 using its cryo-EM structure in the open activated state and its inhibition by a classical non-specific cationic blocker tetraethylammonium (TEA) as well as a novel selective inhibitor RY785. Using multi-microsecond-long all-atom MD runs under the applied membrane voltage of 100 mV the authors were able to confirm that the channel structure represents an open conducting state with the computed single-channel conductance lower than experimental values, but still in the same order of magnitude range. They also determined that both TEA and RY785 bind in the channel pore between the cytoplasmic hydrophobic gate and narrow selectivity filter (SF) region near the extracellular side. However, while TEA directly blocks a knock-on K+ conduction by physically obstructing ion access to the SF, the mechanism of action of RY785 is different. It does not directly prevent K+ access to the SF but rather binds to multiple residues in the hydrophobic gate region, which effectively narrows a pore and drives the channel toward a semi-closed nonconductive conformation, which might be distinct from one with the deactivated voltage sensors and closed pore observed at hyperpolarized membrane potentials. However, additional studies beyond the scope of this work might be needed to fully establish this mechanism as suggested by the authors.

      The manuscript is written very well and represents a significant advance in the field of ion channel research. I do not have any major issues, which need to be addressed. However, I have several suggestions.

      For the apo-channel K+ conduction MD simulation under the applied voltage, the authors seem to observe mostly a direct or Coulomb knock-on mechanism across the SF with almost no water copermeation. This is in line with computational electrophysiology studies with dual membrane setup by B. de Groot and others but in disagreement with multiple previous studies by B. Roux and others also using applied electric field and CHARMM force fields as in the present study. I wonder why the outcomes are so different. Is it related to the Kv2.1 channel itself, a relatively small applied electric field used (corresponding to a membrane potential of 100 mV vs. 500-750 mV used in many previous simulations), ion force field (e.g., LJ parameters), or some other factors? Could weak dihedral restraints on the protein backbone and side chains contribute to this mechanism? I also wonder if the authors might have considered different initial SF ion configurations. Related to that, I wonder if the authors observed any SF distortions in their simulations including frequently observed backbone carbonyl flipping and/or dilation/contraction.

      We are aware of these discrepancies between published simulation studies, but cannot offer a satisfactory explanation, beyond speculation. The reviewer is correct that the mechanism of ion permeation we observe is comparable to that reported by de Groot, as we noted in Tan et al, 2022 and Stix et al, 2023. Neither in this nor in those previous studies did we observe any persistent distortions of the selectivity filter – but that outcome was expected by construction. The weak biasing potentials acting on the mainchain dihedral angles allow for local fluctuations but not a persistent deformation, relative to the conductive form determined experimentally.

      For MD simulations with the ligand present, I wonder if the authors can comment on the effect of the ligand especially RY785 on the pore size or more importantly size of the hydrophobic gate. The presence of the ligand itself would definitely result in a narrower pore, but I also wonder if this would also lead to a rearrangement of pore sidechain and/or backbone residues, which would lead to a narrower pore from a protein itself thus confirming the proposed mechanism of driving the channel towards a semi-closed state. It is easy to compute but I wonder if the presence of weak dihedral restraints may preclude this analysis.

      Yes, while the simulation design used in this study allows for local fluctuations in the mainchain structure and nearly unrestricted sidechain dynamics, changes in either the secondary or tertiary structure of the channel are strongly disfavored. This approach is thus sufficient to examine ligand binding or ion flow in the microsecond timescale but not channel gating. In the revised version of the Discussion, we outline a roadmap for future computational studies of that gating process, on the basis of the open-channel structure we used and the recently determined structure of the closed state.

      The authors state that RY785 does not block K+ ions, but it does significantly slow the rate of K+ ion access to the pore Scav site. Is this not a part of the mechanism for inhibition of the channel? The authors seem to focus on the primary mechanism of inhibition as the RY785 promoting channel closing, but would it not also reduce K+ current in the open state by slowing the rate of K+ entry into the cavity and selectivity filter? The authors should address this point in the text. I am also somewhat confused that in the MD simulations performed by the authors, there is still some K+ conduction with RY785 in the pore, which is not in 100% agreement with electrophysiology experiments. Does it mean that the channel in the simulations has not yet reached that semi-closed state or a reduced K+ conduction is not observed experimentally?

      The salient experimental observation is that RY785 abrogates K+ currents through Kv2 channels (Herrington et al, 2011; Marquis et al, 2022). In our view, that observation can be explained in one of two ways: either RY785 completely blocks the flow of K+ ions across the channel while the pore domain remains in the conductive, open state – like TEA does – or RY785 induces or facilitates the closing of the channel, thereby abrogating K+ flow. The fact that we observe K+ flow while RY785 is bound to the channel is therefore not in disagreement with the electrophysiological measurements, but it does rule out the first of those two possible interpretations of the existing experiments. As it happens, the second possible explanation, i.e. that RY785 facilitates the closing of the pore domain, also provides a rationale for another puzzling experimental observation, namely that RY785 shifts the voltage dependence of the currents produced by the voltage sensors as they reconfigure to open or close the intracellular gate.

      Also, I wonder if the authors considered that since there are 4 potential equivalent sites in the pore (although, overlapping) more than one RY785 might be needed to prevent K+ conduction, even though the experimental Hill coefficient of ~1 does not indicate cooperativity.

      Admittedly, our simulation design was based on the premise that only one RY785 molecule might be recognized within the pore. Based on the outcome of the simulations, we are confident that this assumption was valid, as the binding pose that we identified rules out multiple occupancy – which would be indeed consistent with a Hill coefficient of ~1.
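
      The link between a Hill coefficient of about 1 and single-site binding can be illustrated with the standard Hill equation. The sketch below is not taken from the manuscript; the function name and the IC50 value are hypothetical, and concentrations are expressed as multiples of the IC50 so that no measured value is implied.

```python
def fraction_inhibited(concentration, ic50, hill_coefficient=1.0):
    """Hill equation: fraction of current inhibited at a given inhibitor concentration."""
    if concentration < 0 or ic50 <= 0:
        raise ValueError("concentration must be >= 0 and ic50 must be > 0")
    c_n = concentration ** hill_coefficient
    return c_n / (ic50 ** hill_coefficient + c_n)


# Hypothetical IC50 in arbitrary units; only the ratio to the IC50 matters here.
IC50 = 1.0
for multiple in (0.1, 1.0, 10.0):
    n1 = fraction_inhibited(multiple * IC50, IC50, 1.0)  # non-cooperative binding
    n2 = fraction_inhibited(multiple * IC50, IC50, 2.0)  # cooperative, for contrast
    print(f"[inhibitor] = {multiple:>4} x IC50: n=1 -> {n1:.3f}, n=2 -> {n2:.3f}")
```

      With n = 1 the dose-response curve rises gradually over about two decades of concentration, whereas a coefficient above 1 would steepen it, which is the signature expected if several inhibitor molecules had to bind cooperatively.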

      I also wonder if the authors considered estimating ligand binding affinities and/or "on" rates from their simulations to have a more direct comparison with experiments and test the accuracy of their models. There are multiple enhanced sampling techniques allowing to do that, although it can be a study on its own.

      We thank the reviewer for this suggestion, which we will consider for future studies.

      The authors also discussed that they could not study Kv2.1 deactivation in a reasonable simulation time. Indeed it is very challenging but they should cite previous studies e.g. 2012 Jensen et al paper (PMID: 22499946) on this subject. There are structures of Kv channels with the deactivated voltage-sensing domains (VSDs) available, e.g. of EAG1 channel (PDB 8EP1), although they do not have a domain-swapped architecture. There are structural modeling approaches including AlphaFold, which can be potentially used to get a Kv2.1 structure with deactivated VSDs, and targeted MD, string method etc. can be used to study transition between different states with and without bound ligands.

      As noted, a structure of a Kv2 channel with a closed pore has now been determined experimentally. In the revised Discussion, we comment on what this structure tells us about the mechanism of inhibition we propose, and how it could be leveraged in future studies.

      The authors should be commended for doing a thorough QM-based force field parameterization of RY785. However, a validation of the developed force field parameters is lacking. In terms of QM validation, a gas-phase dipole moment can be compared in terms of direction and magnitude (it's normal to be overestimated to implicitly reflect solvent-induced polarization). If there are any experimental data available for this compound, they can be tested as well.

      We agree with the reviewer that forcefield validation is important, but to our knowledge no experimental data exist for RY785 to compare with, such as hydration free energies. We did however compare the gas-phase dipole moment computed with QM and with the MM forcefield we developed based on atomic charges optimized to reproduce QM interactions with water. The MM model yields a gas-phase dipole moment of 3.94 D, which is 20% greater than the QM dipole moment, or 3.23 D. That deviation is within the typical range for electroneutral molecules (Vanommeslaeghe et al, 2010), and as the reviewer notes, reflects the solvent-induced polarization implicit in the derivation of atomic charges. As shown in Author response image 1, the orientation of the dipole moment calculated with MM (right, blue arrow) is also in good agreement with that predicted with QM (left).

      Author response image 1.
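
      The two quantities compared in this dipole validation, magnitude and orientation, can be computed as sketched below. Only the magnitudes (3.94 D and 3.23 D) come from the response; the vector components are hypothetical placeholders chosen for illustration, and the function names are not from the manuscript.

```python
import math


def percent_deviation(mm_magnitude, qm_magnitude):
    """Percent by which the MM dipole magnitude exceeds the QM one."""
    return 100.0 * (mm_magnitude - qm_magnitude) / qm_magnitude


def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Magnitudes quoted in the response, in debye.
deviation = percent_deviation(3.94, 3.23)

# Hypothetical dipole vectors (illustrative only, not from the manuscript).
qm_dipole = (3.0, 1.0, 0.7)
mm_dipole = (3.7, 1.1, 0.8)
angle = angle_between_deg(qm_dipole, mm_dipole)
print(f"magnitude deviation = {deviation:.0f}%, angle = {angle:.1f} deg")
```

      A small angle between the two vectors corresponds to the agreement in orientation described for the image, while the magnitude deviation of roughly one-fifth reflects the deliberate overpolarization of the MM charges.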

      (1) p. 3 "the last two helices in each subunit" -> "the last two transmembrane helices in each subunit".

      Thanks. Corrected.

      (2) p. 5 "and therefore do not cause large density variations e.g. 100-fold or greater.". I would be more specific here and indicate what are the actual variations in density or free energy encountered and how they are compared e.g. with thermal fluctuations (~kT).

      Thanks. The exact variations in K+ density had been included in the original manuscript, in Fig. 2C, but we failed to refer to this figure at this point in the description of the results. The ion density is plotted in a log scale to facilitate conversion to free-energy units. Corrected.
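
      The conversion from a density ratio to a free-energy difference follows the Boltzmann relation, ΔG = −kBT ln(ρ/ρref). The sketch below illustrates it; the temperature of 298 K, the function name, and the example densities are assumptions made for illustration rather than values from the manuscript.

```python
import math

# Boltzmann constant in kcal/(mol*K).
KB_KCAL_PER_MOL_K = 0.0019872041
TEMPERATURE_K = 298.0  # assumed temperature, not taken from the manuscript


def density_to_free_energy(density, reference_density, temperature=TEMPERATURE_K):
    """Return (dG in units of kBT, dG in kcal/mol) relative to the reference density."""
    if density <= 0 or reference_density <= 0:
        raise ValueError("densities must be positive")
    dg_kt = -math.log(density / reference_density)
    return dg_kt, dg_kt * KB_KCAL_PER_MOL_K * temperature


# A site populated 100-fold less than the reference costs about 4.6 kBT.
dg_kt, dg_kcal = density_to_free_energy(0.01, 1.0)
print(f"{dg_kt:.2f} kBT = {dg_kcal:.2f} kcal/mol")
```

      On a log10 axis, each decade of density therefore corresponds to about 2.3 kBT, which is why a logarithmic plot can be read directly as a free-energy profile.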

      (3) p. 6 Figure 1 caption "and along the perpendicular to the membrane" -> "perpendicular to the membrane normal"?. "The channel is an assembly of four distinct subunits (in colors);" -> "The channel is an assembly of four identical subunits (distinct by colors);". I would use the same protein coloring method in panels B and C as was used in panel A.

      Thanks. Corrected as needed.

      (4) p. 6 Figure 2 In panel B I would appreciate a representative complete ion permeation event trace. In panel C caption I would indicate corresponding sites "S0-S4, Scav" for each residue mentioned. I also would not use gray color for site names in the figure.

      We appreciate the suggestion, but believe the figure is clear as is. Panel B is meant to focus on the mechanism of knock-on. Panel A includes numerous complete permeation events.

      (5) p. 7 Figure 3 caption. Please indicate which atoms of residues T373 and P406 were used to define SF and gate positions. Chemical structures of both TEA and RY785 would be useful. In panels C and F channel interacting residues (if any) would be helpful to show.

      The revised caption clarifies that the positions of T373 and P406 are represented by their carbon-alpha atoms. A close-up view of the structures of TEA and RY785 is included in the Supplementary Information section.

      (6) p. 8. Figure 4 caption. Please indicate if N atoms were used for density maps in panels B and C, and which value of the density was used to show meshes. In panel A please indicate what are the units of the density shown by color maps.

      The caption has been revised to clarify these questions.

      (7) p. 9 "inside the protein" -> "inside the channel pore".

      Thanks. Corrected.

      (8) p. 10 "which lines the cavity" -> "which lines the water-filled cavity"

      We appreciate the suggestion but believe the wording is clear as is.

      (9) p.10 Fig. 5. It would be helpful to distinguish residues from different chains e.g. by different colors rather than using different colors for different residues. The S atom in RY785 is hard to recognize due to the yellow color used for C atoms. Figure 5B is very confusing. It is not clear what this plot represents. For instance, what does it mean that Pro405 has ~10 contacts in 20% of simulation snapshots? Does it mean 10 C..C/S interactions within 4.5 Å? I am not sure what the value of this is. I think a bar or radar chart plot showing % of contacts with one, two, or more residues of each type would be more helpful.

      Thanks. The revised caption ought to clarify how to interpret the plot.
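
      One way to read a per-snapshot contact count is as the number of ligand-residue atom pairs closer than a distance cutoff. The sketch below assumes that reading, together with the 4.5 Å cutoff mentioned by the reviewer; the actual definition is the one given in the revised caption, and the coordinates here are hypothetical.

```python
import math


def count_contacts(ligand_atoms, residue_atoms, cutoff=4.5):
    """Count ligand-residue atom pairs closer than the cutoff (same length unit)."""
    return sum(
        1
        for a in ligand_atoms
        for b in residue_atoms
        if math.dist(a, b) < cutoff
    )


# Hypothetical coordinates in angstroms for one snapshot (illustrative only).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residue = [(4.0, 0.0, 0.0), (0.0, 5.0, 0.0), (3.0, 3.0, 0.0)]
n_contacts = count_contacts(ligand, residue)
print(n_contacts)
```

      Repeating the count over all snapshots and building a histogram per residue would give the kind of distribution that the reviewer describes, in which a residue shows a given number of contacts in a given fraction of the trajectory.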

      (10) p. 12 "Due to its 2-fold molecular symmetry". TEA has a tetrahedral point group or Td symmetry. It has several two-fold rotational axes though. 

      Thanks. Corrected.

      (11) p. 12 "it prevents K+ ions in the cytoplasmic space from destabilizing the K+ ions that reside in the selectivity filter" I am not sure if this statement is entirely accurate as there might be destabilization of a multi-ion SF configuration not ions per se.

      We believe this statement is clear as is.

      (12) p. 13 Fig. 7 caption "includes non-conductive or transiently inactivated states" - I am not sure what "transiently inactivated state" is as inactivation is a specific term used in ion channel research and it does not seem to be explicitly considered in this study.

      A reference has been included in the caption for readers interested in the process of inactivation.

      (13) p. 14 "the net charge of these constructs is thus zero". This would depend on the number of basic and acidic residues in the protein. 

      Yes, it does – and as a result the construct we model has a net zero charge.

      (14) p. 14 I wonder if the protein was constrained or heavily restrained during MARTINI membrane building and equilibration procedure. Otherwise, C-alpha mapping would be problematic and clashes with lipid membrane atoms might take place as well.

      It was indeed. When a protein is simulated using the MARTINI coarse-grained forcefield, its fold must be preserved through a network of strong ‘virtual’ bonds between adjacent carbon-alpha atoms. This is standard practice so we do not believe it requires further explanation.

      (15) p. 15 PME - please spell out and provide reference.

      Corrected.

      (16) p. 15 "with a smooth switching function" - is it a special or standard switching function? Also, was it used for energy or forces? 

      The switching function brings both forces and energies to a value of zero at the cut-off value, smoothly. We refer the reviewer to the NAMD manual for further details.
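
      For readers who want a concrete picture, one widely used polynomial switching function in CHARMM-style simulations is sketched below. Whether the simulations use exactly this form is an assumption; the NAMD manual remains the authoritative source. The switching distance of 10 Å and cutoff of 12 Å are hypothetical values chosen for illustration.

```python
def switching_factor(r, switch_dist, cutoff):
    """Polynomial switch: 1 below switch_dist, 0 beyond cutoff, smooth in between."""
    if not 0 < switch_dist < cutoff:
        raise ValueError("require 0 < switch_dist < cutoff")
    if r <= switch_dist:
        return 1.0
    if r >= cutoff:
        return 0.0
    r2, rs2, rc2 = r * r, switch_dist * switch_dist, cutoff * cutoff
    # The slope of this factor is zero at both switch_dist and cutoff.
    return (rc2 - r2) ** 2 * (rc2 + 2.0 * r2 - 3.0 * rs2) / (rc2 - rs2) ** 3


# Hypothetical distances in angstroms (not taken from the manuscript).
for r in (9.0, 10.0, 11.0, 11.9, 12.0):
    print(f"r = {r:5.1f} A -> S(r) = {switching_factor(r, 10.0, 12.0):.4f}")
```

      Because both the factor and its slope vanish at the cutoff, an interaction energy multiplied by it, and the corresponding force, go to zero there without a discontinuity.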

      (17) p. 15 '𝑘 = 1 𝑘B𝑇.' Please confirm that there is a factor of "1" there, which can be actually skipped if this is the case. 

      The value of k = 1 kBT is correct.

      (18) p. 15. Please cite PMID: 22001851 for the transmembrane electric field application technique.

      Corrected.

      (19) p. 15 "and CHARMM36m" -> "and CHARMM36m force field". 

      Corrected.

      (20) p. 16 "the four proteins subunits" -> "the four protein subunits". 

      Corrected.

      (21) p. 16. Please provide the reference for CGenFF. It's reference 49. 

      Corrected.

      Supporting Information (SI): CGenFF is misspelled in multiple figure captions in the SI. All potential energy scans indicate "angle", but some are bond angles while others are dihedral angles. Using subscripts for atom numbers is confusing and does not match the numbering scheme used in Fig. S1. So, please use the same style of numbering throughout, e.g. C46-C42-N43 (without subscripts). Please label the X and Y axes in Figures S2-S19 and S21. In Figure S22 please perform a linear regression analysis and/or compute Pearson correlation coefficients and indicate trend lines. Table S1. It would be good to compute RMS or mean unsigned errors to get an idea about accuracy. Also, please indicate if reference QM values were scaled by 1.16 for energies or offset for distances.

      The Supplementary Information has been corrected. We thank the reviewer for their detailed feedback. 
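
      The accuracy measures requested for the QM versus MM comparison (root-mean-square error, mean unsigned error, and Pearson correlation) can be computed as sketched below. The energies are hypothetical placeholders and not values from Table S1; the function name is likewise illustrative.

```python
import math


def error_metrics(reference, predicted):
    """Return (RMSE, mean unsigned error, Pearson r) for paired values."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equally sized lists with at least two values")
    n = len(reference)
    diffs = [p - q for p, q in zip(predicted, reference)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mue = sum(abs(d) for d in diffs) / n
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((q - mean_ref) * (p - mean_pred) for q, p in zip(reference, predicted))
    var_ref = sum((q - mean_ref) ** 2 for q in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_ref * var_pred)
    return rmse, mue, pearson_r


# Hypothetical QM and MM interaction energies in kcal/mol (illustrative only).
qm_energies = [-6.0, -4.5, -3.0, -2.0]
mm_energies = [-5.5, -4.5, -3.5, -2.0]
rmse, mue, r = error_metrics(qm_energies, mm_energies)
print(f"RMSE = {rmse:.3f}, MUE = {mue:.3f}, Pearson r = {r:.3f}")
```

      The error measures quantify the absolute agreement, while the correlation coefficient reports only whether the trend is reproduced, so both are usually quoted together.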

      Reviewer #3 (Recommendations for the authors):

      (1) The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Molecular docking and/or MD simulations could quickly test this hypothesis. If this hypothesis is not true, a comprehensive search can exclude such a possibility, which can also confirm the long-range allosteric coupling between RY785 binding in the central pore and voltage-sensing domain dynamics. 

      Please see our response above.

      (2) The authors describe RY785 as a selective inhibitor of Kv2 channels and characterize its binding residues through MD simulations. To support this claim, Figure 5 needs to include a multiple sequence alignment with other Kv channels. This would help demonstrate whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      Please see our response above.

      (3) The study applies a biasing potential to 𝜙, 𝜓, and 𝜒1 dihedral angles. Please clarify:

      (a) Is this potential solely to prevent selectivity filter collapse/degradation, as mentioned in a previous D. E. Shaw Research publication (Jensen et al., 2012)?

      Yes, that is correct.

      (b) If it applies to all amino acids, can this potential prevent other changes, such as in the voltage-sensing domain?

      Yes, that is correct.

      (c) What specific "large-scale structural changes" does this potential preclude? 

      For example, it would preclude the spontaneous degradation of the secondary or tertiary structure of the protein. We have revised the Methods section to make these points clearer. 

      (d) Given that such biasing potentials on backbone dihedral angles can decrease conformational flexibility, and considering that Kv channel permeability/conductivity could be highly sensitive to filter flexibility, what insights can you provide about the impact of the force constant k on channel conductivity?

      In previous studies based on an identical methodology (Stix et al, 2023; Tan et al, 2022), we have observed good agreement between calculated and experimental conductance values – at least as good as can be hoped for, when all approximations are considered. Based on the data presented in those studies, we have no reason to believe our methodology inhibits the permeability of the channel, which is logical as the local structural fluctuations required for K+ flow across the selectivity filter are not impaired, by definition. To the contrary, the fact that these weak biasing potentials make the conductive form of the filter the most favorable state in simulation enables a clear-cut analysis of conductance under plausible simulation conditions, both in terms of applied voltage and K+ concentration. We refer the reviewer to the abovementioned studies for further details and a discussion of this subject.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing. Given the compact nature of the central cavity when RY785 is bound, it would be valuable to investigate whether polar groups of RY785 (e.g., nitrogens from the amide, benzimidazole, and thiazole moieties) always interact with K+ ions. Characterizing these interactions could inform the design of similar compounds with differential modulation effects.

      We examined this possibility and detected no convincing interaction patterns between RY785 and K+ ions – logically, inhibitor and ions are in close proximity while residing concurrently within the pore, but we detected no evidence of specific interactions.

      Minor points:

      It is strongly recommended that the refined force field parameters for RY785 be shared as a separate supplementary file in CHARMM force field format. This addition would be valuable for the scientific community, allowing other researchers to use or compare these parameters in future studies.

      We agree entirely. Upon publication of the VOR for this article the forcefield parameters for RY785 will be made freely available for download at https://github.com/Faraldo-Gomez-Lab-atNIH/Download.

      The study uses a KCl concentration of 300 mM, which exceeds typical intracellular K+ levels. While this may be intentional to enhance K+ permeation probability, a brief justification for this choice should be included in the Methods section.

      Yes, what motivated this choice in this and in our previous studies of K+ channels was the expectation of a greater number of permeation events, for a given simulation length, and therefore greater confidence (i.e. statistical significance) in the observed ion conductance, or in the degree to which it might be inhibited by a blocker. It is worth noting that 300 mM KCl, while atypical in the intracellular environment, is often used in electrophysiological studies. The Methods section has been amended to clarify this point.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, in this study, the authors have used a transposon mutagenesis based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters, they employed a condition that has been reported previously to increase persister frequency, nutrient starvation, to facilitate genetic screening for this phenotype. An M.abs transposon library was grown in nutrient rich or nutrient depleted conditions and exposed to TIG/LZD for 6 days, following which Tnseq was carried out to identify genes involved in spontaneous (nutrient rich) or starvation-induced conditions. About 60% of the persistence hits were required in both the conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage and proteostasis stress. The authors then decided to validate the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and tested the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect when compared to wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when testing the role of katG in other clinical strains of Mab, the phenotype was observed only in one of the clinical strains, demonstrating that there might be alternative anti-oxidative stress defense mechanisms operating in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings with regard to genetic analysis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. Also, the attempts to validate their observations in clinical isolates are appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      We appreciate the reviewer’s comments, but we note that 4 out of 5 genes displayed phenotypes concordant with the findings of the Tn-Seq data, with katG and pafA, as well as MAB_1456c (during starvation only) and blaR (in rich media only), having decreased survival as shown in Figure 3A-D. We do agree that some of the phenotypes were more modest in a single-mutant context than in the pooled Tn-Seq screen. In addition, several mutants that had modest changes in survival also showed profound defects in resuming growth after removal of antibiotics, with the pafA mutants particularly impaired (Figure 3 - figure supplement 1).

      While the role of KatG mediated detoxification of ROS and involvement of ROS in antibiotic killing was well demonstrated, the lack of replication of this phenotype in some of the clinical isolates limits the significance of these findings.

      While the role of katG varied among strains, the antibiotic-induced accumulation of ROS was seen in all three strains (Figure 6A). This suggests that in some strains other ROS-detoxification pathways are able to compensate for the loss of katG.

      (Figure 2—figure supplements 1–3)

      Figure 1—figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies.

      Moreover, the work still has some technical issues to fix regarding description of the methods, supplementary material, and reference formatting.

      See detailed responses below.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      Text modified to add more figure references.

      (1) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      We appreciate the suggestion and have added a line to the abstract.

      (2) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      We have added mention of this in the discussion (lines 354-356).

      (3) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

      We respond to the specific details below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus.

      They also utilized Tn-Seq for the identification of genes involved in persistence. They identified the role of catalase-peroxidase KatG in preventing death from translation inhibitors Tigecycline and Linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are strongly encouraged to check the references. There is some systematic error in the citations of references. Started to list but then they were too many.

      For example Ln 51, Ref #11 cited, should be #10. Ln 59, #18 is wrongly cited. Should be - Ln 104. Ref #27 wrongly cited.

      Ref #26 and #28 identical.

      Even in discussion section a lot of references are mis-cited.

      We very much appreciate the reviewer catching this issue with the import of our references and we have corrected this.

      Reviewer #2 (Recommendations for the authors):

      Below I have listed comments on specific issues that I hope are useful during revision.

      Line 21 population is singular

      Text modified

      Line 21 comma after antibiotic (subordinate clause)

      Text modified

      Line 25 is how singular?

      Text modified

      Impression of abstract: the work seems to confirm and therefore generalize concepts derived from studies with E. coli. If the authors agree, such a statement would be appropriate as a final sentence. I would also look for novel features to stress in the abstract.

      Line 41 this challenge is vague

      Text modified

      Line 43 comma such as (also comma at the end of the parenthetical statement). This type of comma error is common throughout the manuscript and slows reading.

      Text modified

      Line 60 paradoxically. Is this the best concept? Or is it the natural effect of evolution (assuming that mycobacteria or their ancestors were exposed to environmental antibiotics)?

      It is certainly problematic for clearing infection.

      Text not modified.

      Line 63 highlighted uncertainties ... meaning is unclear especially since you may have changed what "model" is referring to.

      Text modified

      Line 66 models.... Do you really mean systems? Models of what?

      This refers to mechanistic models. Text not modified.

      Line 67 arrest cell division. This is written as if it were true. Does the evidence point specifically to cell division or perhaps more accurately suppression of metabolism (see Ye et al 2025 mBio).

      Both have been postulated as important. Text modified to add the concept of metabolism.

      ... targeted by antibiotics non-essential... Do you think that antibiotics work by inactivating essential targets? That seems overly simplistic, as lethal action is more likely the metabolic response to the damage caused. By the end of the paragraph you come around to this view, but you have already misdirected the reader. The reader is not sure what to believe. Line 70 note that there are many inhibitors of transcription and translation that only block growth, they do not rapidly kill cells

      There can be both direct, and indirect secondary killing mechanisms. We devote a significant portion of the Discussion section to this topic.

      Line 71 debate. There was indeed a debate, but reference 22 is not a valid citation for this. I think you mislead the reader by not accurately describing the debate. It was basically about the inability of Kim Lewis and James Imlay to reproduce the work of ref. 22. A great deal of prior work and then subsequent work showed that the challenge to ref. 22 lacked substance.

      (1) Text modified to fix an error in the citation number related to direct β-lactam-mediated lysis.

      (2) We agree that there is a great deal of data supporting antibiotic-induced ROS as important for bactericidal activity in many circumstances and do not argue otherwise. This sentence points out that over the years the paradigm for how antibiotics kill bacteria has evolved.

      Line 80. It seems you are starting a new topic here. What about beginning a new paragraph?

      The paragraph introduces mycobacteria of which Mabs is one. Text not modified.

      Line 85 delete the comma: it implies a compound sentence that is not delivered.

      Text modified.

      Line 109 screen singular

      Text modified.

      Line 156 these conditions is imprecise and vague

      Conditions were described in paragraph above in the manuscript. Text not modified.

      Fig 2 it would be helpful to more clearly define the meaning of the coordinates

      Text modified.

      Line 230 and throughout please indicate the location of the data being cited for rapid reader reference

      Text modified.

      Lines 315-323 You could use this paragraph as the first of the Discussion. Some readers prefer to read the Discussion before the results. For them, a summary at the beginning of the Discussion is useful.

      Text modified.

      Line 328 without underlying mechanism... for E. coli refer to Zeng PNAS 2022. Depending on when the final version of this paper happens, there should be a figure in a Zhao Zhu mLife paper on purA that will have been published. Since it is not yet available, it cannot be cited.

      We agree that the Zeng et al study is interesting and have added this reference to our discussion. However, these findings related to broad Crp-regulated tolerance actually underscore the point that we are making: that there are multiple factors (Crp, RelA, Lon, TisB, MazE, others) that mediate antibiotic tolerance.

      Line 339 where are the data?

      These data are in Figure 5, panels C, D. We have clarified the text to indicate that only a single agent from each of these classes was tested.

      Line 346 here you are summarizing evidence for ROS in killing mycobacteria. You should include the moxifloxacin study by Shee et al 2022 AAC.

      Reference added.

      Line 348 refer to James Collins' work with E. coli in which his lab examined agents with a variety of mechanisms. There seems to be a fundamental difference between E. coli and mycobacteria with respect to rifampicin, a strictly static agent in E. coli but clearly lethal in mycobacteria. Note that chloramphenicol is static in E. coli and blocks ROS production. What does it do in mycobacteria? A brief discussion of this difference might be relevant at line 362

      Text modified.

      Lines 364-368 Here the idea might be simply that there are two modes of killing, one that is a direct extension of class-specific damage (chromosome fragmentation with fluoroquinolones, for example, or cell lysis by beta-lactams) and a second that is a metabolic response to the antibiotic damage (ROS accumulation). The second type is not class specific. Within this context, the mycobacterial killing by rifampicin might be a class-specific extension of inhibition of transcription that does not occur in E. coli.

      Agreed, text modified to include this.

      Line 400 The Key Resource table is not of publication quality. Precision and repeatability can be improved by spelling out the name of the vendor and its location (City, Country). In the present case, use of BD is lab jargon.

      We appreciate the reviewer’s precision. However, this is actually not lab jargon. Becton, Dickinson and Company now refers to itself as BD (see https://www.bd.com/en-us), and the American Type Culture Collection now refers to itself as ATCC (see https://www.atcc.org/about-us/who-we-are).

      Line 639 It would be good to have experienced colleagues critically review the manuscript, especially for English usage. Listing those persons here adds to the credibility of the work

      Text not changed.

      References: please refer to the journal style. Here you use italic for titles and scientific names, thereby obscuring the scientific names. Normally article titles are not italic and scientific names are ALWAYS italic unless prohibited by journal style.

      Our reference format is concordant with eLife submission guidelines, and all references are reformatted by the journal at the time of final publication (see https://elifesciences.org/insideelife/a43f95ca/elife-references-yes-we-take-any-format-no-we-re-not-rekeying).

      Supplemental Material: Please refer to journal style. Normally this is a stand-alone document that includes a title page and carefully crafted figure legends. Supplemental figures would be numbered as 1, 2, ... A professional appearing Supplemental Material section shows author publication experience not obvious in other parts of the paper. The text indicated MIC determinations. I would like to see a table of MIC values.

      (1) MIC table added as Supplemental Table 5.

      (2) The Supplemental figures are submitted and named in accordance with eLife instructions. Please note that for eLife, there is not a stand-alone supplementary figure section with a title page as you are requesting, but instead the figure supplements for each figure are provided as online files linked to each figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      We appreciate the Reviewer’s consideration of the strengths of our study supporting the identification of adult endothelial to hematopoietic transition (EHT) in the mouse bone marrow.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

      Recognizing the importance of the weaknesses pointed by the Reviewer, we provide below our response to the thoughtful recommendations rendered.

      Reviewer #1 (Recommendations for the authors):

      The main model is to label cells using Cdh5 (VE-cadherin) CreERT2 genetic tracing. Cdh5 is a typical marker of endothelial cells. The data shows that, when treating adults with tamoxifen, the model labels PBMCs after ~10 days, and the labeling kinetics plateau by day 14... The authors reach the main conclusion: that adult ECs are making hematopoietic cells.

      We agree that the main tool used in this study is to label endothelial cells (ECs) using Cdh5 (VE-Cadherin) CreERT2 genetic tracing in mice. Indeed, Cdh5 is recognized as a good marker of ECs. As a minor point, we wish to clarify that the results from treating adult Cdh5-CreERT2 mice with tamoxifen (Figure 1F) show that the ZsGreen labeling kinetics plateau by day 28 (not by day 14).

      Important controls should be shown to rule out alternative possibilities: namely, that the CreERT2 reporter is being sparsely expressed in HSPCs. Many markers, specific as they may seem to be, can show expression in non-specific lineages - particularly in the cases of BAC and PAC transgenic models, in which the transgene can be present in multiple tandem copies and subject to genome location-specific effects. As the authors remind readers, the Cdh5 gene is partly transcribed (though at low levels) in HSPCs, and even more clearly expressed in specific subpopulations such as CLPs, DCs, pDCs, B cells, etc. Some options would be to: i) check if the Cdh5-CreERT2 transgene (not endogenous Cdh5, but the BAC/PAC transgene) is expressed in LSKs (at least by qPCR), ii) verify if any CreERT2 protein levels are present in LSKs (e.g., by western blot), and iii) check if tamoxifen is labeling any HSPCs freshly after induction (e.g., flow cytometry data of ZsGreen LSKs at 24-48h post tamoxifen injection).

      We fully agree with the Reviewer that many markers, allegedly specific to a certain cell type, can show expression in other cell lineages. We also agree that excluding sparse or ectopic CreERT2 expression in hematopoietic stem and progenitor cells (HSPCs) is essential for interpreting lineage-tracing results. As suggested by the Reviewer, we have now examined if the Cdh5-CreERT2 transgene is expressed in bone marrow LSKs. To this end, we analyzed the Polylox single-cell RNAseq dataset presented in this study, containing ZsGreen<sup>+</sup> ECs and enriched ZsGreen<sup>+</sup> LSKs. As shown in the revised Figure S4D, CreERT2 transcripts were detected exclusively in Cdh5-expressing endothelial populations and were absent from Ptprc/CD45-expressing hematopoietic cells, except for plasmacytoid dendritic cells (pDCs; Figure S4E). These results are consistent with the RNAseq data from adult mouse bone marrow[1] showing that the Cdh5 gene is not expressed in HSPCs, CLPs, DCs, or B cells. Rather, among hematopoietic CD45<sup>+</sup> cells, Cdh5 is only expressed in a small subset of plasmacytoid dendritic cells (pDCs), which are terminally differentiated cells. These published results are described in the text.

      To further support this conclusion, we provide additional single-cell RNAseq analyses from our unpublished dataset of LSKs isolated from Cdh5-CreERT2/ZsGreen mice and not enriched for ZsGreen expression. These new analyses were performed after integrating the single-cell data from ECs and ZsGreen<sup>+</sup> hematopoietic cells from the Polylox dataset (current study). As shown in Author response images 1 and 2, CreERT2 expression closely matches the expression patterns of Cdh5, Pecam1, and Emcn and is not detected in Ptprc/CD45-expressing hematopoietic cells.

      Author response image 1.

      Expression of CreERT2, Cdh5, Ptprc and ZsGreen in BM cell populations enriched with ECs and hematopoietic cells. The single-cell RNAseq results are derived from ZsGreen-enriched BM ECs and ZsGreen-enriched BM hematopoietic cells from Polylox lineage-tracing experiments (data shown in Fig. 5; 37,667 ECs and 48,065 BM hematopoietic cells) and from LSKs (23,017 cells) independently isolated from tamoxifen-treated Cdh5-CreERT2/ZsGreen mice without ZsGreen enrichment (unpublished data).

      Author response image 2.

      Expression of CreERT2, Cdh5, Ptprc, Pecam1, Emcn, ZsGreen1, Col1a2, Cd19, Cd3e, Itgam (CD11b), Ly6a (Sca-1), Kit (cKit), Cd34, Cd48, Slamf1 (CD150), and Siglech in enriched BM ECs and LSKs from Cdh5-CreERT2/ZsGreen mice treated with tamoxifen 4 weeks prior to harvest (same cell source as indicated in Author response image 1).

      Additionally, we functionally tested whether hematopoietic progenitors could acquire ZsGreen labeling following tamoxifen administration using transplantation assays (Figure 4A-D). ZsGreen<sup>-</sup> LSKs (purity 99%), sorted from Cdh5-CreERT2/ZsGreen donors that had never been exposed to tamoxifen to exclude background Cre leakiness, were transplanted into lethally irradiated wild-type recipients. After stable hematopoietic reconstitution, recipients were treated with tamoxifen. If transplanted HSPCs or their progeny expressed CreERT2, tamoxifen administration would be expected to induce ZsGreen labeling. However, no ZsGreen<sup>+</sup> hematopoietic cells were detected in these recipients, demonstrating that hematopoietic progenitors from Cdh5-CreERT2/ZsGreen mice and their descendants do not undergo tamoxifen-induced recombination.

      Together, the single-cell transcriptional and transplantation data demonstrate that CreERT2 expression and tamoxifen-induced recombination are restricted to Cdh5-expressing ECs (except for pDCs). These findings support the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise from adult bone marrow ECs rather than from contaminating hematopoietic progenitors.

      One important missing experiment is to trace how ECs actually do this hematopoietic conversion: meaning, which populations of HSPCs are being produced by adult ECs in the first instance? LT-HSCs? ST-HSCs? MPPs? GMPs? All of the above? What are the kinetics? Differentiation is likely to follow a hierarchical path, but this is unclear at the moment.

      We agree that defining the earliest EC-derived hematopoietic cell progenitors and the kinetics by which these progenitors appear (LT-HSC vs ST-HSC/MPP vs lineage-restricted progenitors) would provide important insights into adult EHT.

      In the current genetic labeling system, a rigorous kinetic analysis of the hematopoietic cells first generated from ECs in vivo is not straightforward. Specifically, the low-level baseline ZsGreen<sup>+</sup> reporter fluorescence in hematopoietic cells (attributable to EHT occurring prenatally, perinatally or in young mice, or to other causes; Figure 1 A-D and Figure S1 D-I) impairs the identification of newly generated ZsGreen<sup>+</sup> progenitors at early time points and their distinction from baseline fluorescence. A potential solution might be to introduce serial harvests across multiple time-points in large mouse cohorts to capture rare transitional events with statistical significance.

      We wish to emphasize that the primary objective of this study was to establish whether adult bone marrow ECs have a hemogenic potential. Our data demonstrate adult EC-derived hematopoietic cell output that includes progenitor-containing fractions and multilineage mature progeny, under both steady-state conditions. We acknowledge that the current work does not resolve the order and kinetics of hematopoietic cell emergence following EHT. Therefore, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      One warning sign is how rare the reported phenomenon is. Even when labeling almost 90% of the BM ECs, these make at most ~3% of blood (less than 1% in the transplants in Figure 4F, less than 0.5% in the col1a2 tracing in Figure 7). This means this is a very rare and/or transient phenomenon... The most major warning sign is the fast kinetics of labeling and the fast plateau. We know that: a) differentiation typically follows some hierarchy, b) in situ dynamics of blood production are slow (work by Rodewald and Höfer). Considering how fast these populations need to be replaced to reach a steady state so rapidly (as reported here, 2-4 weeks), the presumably specialized ECs would need to be steadily dividing and producing hematopoietic cells at a fast pace (as a side prediction, the adult "EHT" cluster would likely be highly Mki67+). More importantly, the ZsGreen LSKs produced by the ECs would have to undergo VERY rapid differentiation (much faster than normal LSKs) or otherwise, if 3% of them are produced by a top compartment (the BM ECs) every 4 weeks, then the labeled population would continue to grow with time. The authors could try to challenge this by testing if the ZsGreen LSKs undergo much faster differentiation kinetics or lower self-renewal (which does not seem to be the case, at least in their own transplantation data). We believe a more likely explanation is that the label is being acquired more or less non-specifically, directly across a bunch of HSPC populations.

      The Reviewer correctly notes that the population of hemogenic ECs in the adult mouse bone marrow is small and that the output of hematopoietic cells from these hemogenic ECs accounts for at most 3% of blood cells. We agree that delineating the kinetics by which hematopoietic cells are generated from adult ECs is important, as this information would provide important insights into adult EHT.

      Nonetheless, we believe that the rapid appearance and early plateau of labeled blood cells in our experiments may not derive from a sustained, high-rate generation of labeled blood cells from self-renewing top-tier hematopoietic cell compartments, such as LT-HSCs. Rather, our data are more consistent with a predominantly lineage-restricted and biased hematopoietic progenitor cell population being the source of labeled blood cells. Supporting this interpretation, longitudinal analysis of peripheral blood shows that EGFP<sup>+</sup> PBMCs are consistently enriched with myeloid cells, whereas EGFP<sup>-</sup> PBMCs are predominantly B cells (Figure 4G and H). This myeloid lineage skewing is stable over time and contrasts with what would be expected if labeling were acquired broadly and nonspecifically across the hematopoietic hierarchy. Therefore, our results are more consistent with myeloid biased progenitors being among the first populations that EHT generates.

      We acknowledge that our studies do not identify the earliest endothelial-derived hematopoietic cells produced in vivo, and do not define their differentiation kinetics. Rigorously addressing these questions would require temporally resolved lineage tracing with sufficiently powered cohorts at early time points to statistically distinguish newly labeled cells from baseline reporter background. These important experiments were beyond the scope of the present study. As noted above, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.
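      The kinetic argument in this exchange (a fast plateau of the labeled fraction implies fast turnover of the labeled compartment) can be illustrated with a minimal one-compartment sketch. This is not a model used by the authors or the reviewer: it assumes a constant influx of labeled cells and first-order loss, df/dt = influx - turnover * f, and every rate below is a hypothetical value chosen only so that the plateau is about 3% and is nearly reached by week 4. The linear form is a reasonable approximation only while the labeled fraction stays small.

```python
import math


def labeled_fraction(t_weeks, influx_per_week, turnover_per_week):
    """Closed-form solution of df/dt = influx - turnover * f, with f(0) = 0.

    f is the labeled fraction of a blood-cell compartment (illustrative only).
    """
    if turnover_per_week == 0:
        # No loss: the labeled fraction grows linearly and never plateaus.
        return influx_per_week * t_weeks
    plateau = influx_per_week / turnover_per_week
    return plateau * (1.0 - math.exp(-turnover_per_week * t_weeks))


def turnover_for_plateau(t_plateau_weeks, completeness=0.95):
    """Turnover rate needed to reach `completeness` of the plateau by t_plateau_weeks."""
    return -math.log(1.0 - completeness) / t_plateau_weeks


if __name__ == "__main__":
    # Hypothetical targets: plateau of 3%, 95% complete at week 4.
    fast_turnover = turnover_for_plateau(4.0)
    influx = 0.03 * fast_turnover
    slow_turnover = 0.02  # hypothetical slowly turning-over compartment

    for week in (1, 2, 4, 8, 25):
        fast = labeled_fraction(week, influx, fast_turnover)
        slow = labeled_fraction(week, influx, slow_turnover)
        print(f"week {week:>2}: fast turnover {fast:.4f}, slow turnover {slow:.4f}")
```

      Under these assumptions, the same influx produces an early plateau only when the labeled cells are lost quickly; with slow turnover the labeled fraction keeps rising over the 25-week window. This is compatible with either reading discussed above (short-lived, lineage-restricted labeled progenitors, or the reviewer's alternative), so the sketch clarifies the constraint rather than deciding between them.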

      Transplant experiments in Figure 4 do offer a crucial experiment in support of the main conclusion of the manuscript. These experiments show that transplanted LSKs bearing the Cdh5-CreERT2 and ZsGreen reporter cannot acquire the tamoxifen-induced label post-transplantation - suggesting that the label is coming from ECs. However, it is also possible that the LSK Cdh5-CreERT expression is partly during the transplantation process... Indeed, we know through the aging data that the labeling is less active in aged mice. In any case, this would be verified by qPCR/western-blot (comparing native vs post-transplant LSKs).

      We agree with the Reviewer that the experiment in Figure 4A-D “offer[s] a crucial experiment in support of the main conclusion of the manuscript.” The results of this experiment show that ZsGreen-negative LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not acquire tamoxifen-induced ZsGreen fluorescence post transplantation, supporting the endothelial cell origin of blood ZsGreen<sup>+</sup> cells.

      The Reviewer raises the possibility “that the LSK Cdh5-CreERT expression is partly during the transplantation process...”, and that this Cdh5-CreERT expression may occur slowly, as learned “through the aging data that the labeling is less active in aged mice.” As we show in Figure 3F, tamoxifen administration induced a similar percentage of ZsGreen<sup>+</sup> ECs in the bone marrow of Cdh5-Cre<sup>ERT2</sup>(BAC)/ZsGreen mice, whether tamoxifen was administered to 6-week-old, 16-week-old, 26-week-old or 36-week-old mice. Similar results with Cdh5-CreERT2 (BAC) mice are reported in the literature[2]. Since the mice transplanted with ZsGreen<sup>-</sup> LSKs were followed for 25 weeks after tamoxifen administration, we believe that the results in Figure 4A-D address the concern raised by the Reviewer.

      Supporting the conclusion that LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not express Cdh5-CreERT2 in a native (non-transplant) setting, we now provide transcriptomic data from Cdh5-CreERT2/ZsGreen mice (not transplanted) showing that CreERT2 expression closely tracks with expression of canonical endothelial markers (Cdh5, Pecam1, Emcn) and is not detectable in Ptprc/CD45-expressing hematopoietic cells (Author response images 1 and 2). These data were obtained from non-transplanted mice treated with tamoxifen at ~12 weeks of age and analyzed four weeks later. Together, these results indicate that CreERT2 expression is endothelial-restricted in Cdh5-CreERT2-ZsGreen reporter mice.

      Figure 5 presents PolyLox experiments to challenge whether adult ECs produce hematopoietic cells through in situ barcoding. Several important details of the experiment are missing in the main text (how many cells were labeled, at which time point, how long after induction were the cells sampled, how many bones/BM-cells were used for the sample preparation, what was the sampling rate per population after sorting, how many total barcodes were detected per population, how many were discarded/kept, what was the clone-size/abundance per compartment). As presented, the authors imply that 31 out of ~200 EC barcodes are shared with hematopoietic cells... This would suggest that ~15% of endothelial cells are producing hematopoietic cells at steady state. This does not align well with the rarity of the behavior and the steady state kinetics (unless any BM EC could stochastically produce hematopoietic cells every couple of weeks, or if the clonality of the BM EC compartment would be drastically reduced during the pulse-chase overlap with mesenchymal cells. Important controls are missing, such as what would be the overlap with a population that is known to be phylogenetically unrelated (e.g., how many of these barcodes would be found by random chance at this same Pgen cut-off in a second induced mouse). Also, the Pgen value could be plotted directly to see whether the clones with more overlapping populations/cells (3HG, 127, 125, CBA) also have a higher Pgen. We posit that there are large numbers of hematopoietic clones that contribute to adult hematopoiesis (anywhere from 2,000-20,000 clones would be producing granulocytes after 16 weeks post chase), and it would be easy to find clones that overlap with granulocytes (the most abundant and easily sampled population) - HSPCs would be the more stringent metric.

      We thank the Reviewer for highlighting the need for a more detailed description of the Polylox experiments. To address this deficiency, we have compiled a document (Additional Supplementary Information file) containing all the specifics of the Polylox experimental and analytical parameters in one location. This includes: (i) the number of cells analyzed per population, (ii) the time points of induction and sample collection, (iii) the number of bones and total bone marrow cells used for preparation, (iv) the sampling rate following cell sorting, (v) the total number of detected barcodes per population, (vi) barcode filtering criteria and numbers retained or discarded, and (vii) clone-size and barcode number across cell compartments. We have updated the manuscript to refer readers to this Supplementary file.

      The Reviewer concluded from our results (Figure 5, Figure S5) that 31 out of ~200 endothelial cell (EC) barcodes are shared with hematopoietic cells (HCs), implying that ~15% of ECs produce hematopoietic cell progeny at steady state. This interpretation is inconsistent with our data showing the rare nature of adult EHT and would require either that a large fraction of bone-marrow ECs can generate hematopoietic cells within short time windows, or that ECs would clonally expand rapidly during the pulse-chase period, as noted by the Reviewer. The explanation for this apparent problem is technical. Briefly, the ~200 EC barcodes recovered do not represent all barcoded ECs. During Polylox barcode library construction, a mandatory size-selection step is applied prior to PacBio sequencing, retaining fragments that are approximately 800–1500 bp in length, whereas the full Polylox cassette spans ~2800 bp. This is mainly because the PacBio sequencer requires that the library be either 800–1500 bp or over 2500 bp for optimal sequencing results. As described in the original Polylox publications[3,4], this size selection eliminates most (approximately 75%) of the longer barcodes, together with ~85% of the shorter barcodes. Thus, ECs harboring very long or short recombined barcodes are under-represented or excluded from sequencing. As a result, the 22 true barcodes linking ECs and HCs recovered from sequencing do not indicate that ~10–15% of ECs generate hematopoietic progeny. Rather, these barcodes represent a highly selected subset of ECs with barcode configurations compatible with library recovery and sequencing. The observed EC–HC barcode sharing thus reflects qualitative lineage connectivity, not the quantitative frequency of endothelial-derived hematopoiesis at steady state.
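
      The naive percentages discussed above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the counts are those quoted in the text, the denominator is the approximate "~200" recovered EC barcodes, and pairing the 22 true barcodes with that same denominator is an assumption made here, not a statement from the authors. The function name is hypothetical.

```python
# Counts quoted in the response; the denominator is approximate ("~200").
EC_BARCODES_RECOVERED = 200
SHARED_BARCODES_REVIEWER = 31  # figure cited by the Reviewer
SHARED_BARCODES_REVISED = 22   # "true" barcodes cited in the response


def naive_shared_fraction(shared: int, recovered: int) -> float:
    """Fraction of *recovered* EC barcodes also found in hematopoietic cells.

    This is a ratio over sequenced barcodes only. It is not an estimate of
    the fraction of all ECs that produce hematopoietic progeny, because the
    size-selection step removes many barcoded ECs from the denominator.
    """
    return shared / recovered


reviewer_fraction = naive_shared_fraction(SHARED_BARCODES_REVIEWER, EC_BARCODES_RECOVERED)
revised_fraction = naive_shared_fraction(SHARED_BARCODES_REVISED, EC_BARCODES_RECOVERED)

print(f"Reviewer's reading: {reviewer_fraction:.1%}")
print(f"With 22 true barcodes (assumed same denominator): {revised_fraction:.1%}")
```

      Both ratios are computed over the recovered barcodes alone, which is why the response treats them as evidence of lineage connectivity and not as a frequency of adult EHT.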

      The Reviewer correctly notes that true Polylox barcodes are shared by ECs and mesenchymal-type cells and asks that we examine whether this overlap could occur by chance alone. The Polylox filtering threshold (pGen < 1 × 10<sup>-6</sup>), which we have made more stringent (revised from pGen < 1 × 10<sup>-4</sup>) without altering the essential results (new Figure S4 and revised Figure 5C-F), renders such overlap exceedingly unlikely. At this threshold, the expected number of random recombination events among 4,069 barcoded cells is approximately 0.004. Consequently, among the 87 mesenchymal cells identified here, fewer than 0.4 cells would be expected to share a barcode with another cell by chance alone. Thus, the probability of recovering identical barcodes across unrelated lineages due to random recombination is vanishingly small, and the observed EC–mesenchymal barcode sharing substantially exceeds random expectation.
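
      The two figures quoted above (approximately 0.004 and fewer than 0.4) follow from simple multiplication. The sketch below reproduces them under one assumed reading of the text: each barcode passing the filter has a generation probability of at most the pGen threshold, so the expected number of cells that independently generate a given barcode is bounded by the threshold times the number of barcoded cells. The function and variable names are hypothetical.

```python
PGEN_THRESHOLD = 1e-6      # revised filtering threshold quoted in the text
N_BARCODED_CELLS = 4069    # barcoded cells quoted in the text
N_MESENCHYMAL_CELLS = 87   # mesenchymal cells quoted in the text


def expected_chance_generations(pgen: float, n_cells: int) -> float:
    """Upper bound on the expected number of cells that independently
    generate one given barcode, assuming each cell does so with
    probability at most `pgen`."""
    return pgen * n_cells


# Expected independent recurrences of a single barcode among all barcoded cells.
per_barcode = expected_chance_generations(PGEN_THRESHOLD, N_BARCODED_CELLS)

# Summing that bound over the mesenchymal cells gives the expected number of
# mesenchymal cells sharing a barcode with another cell by chance alone.
mesenchymal_bound = N_MESENCHYMAL_CELLS * per_barcode

print(f"per barcode: {per_barcode:.4f}")
print(f"mesenchymal cells with a chance match: {mesenchymal_bound:.2f}")
```

      This is a bound under an independence assumption, not a formal significance test; it only shows that the quoted numbers are mutually consistent.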

      Related to this observation, the Reviewer correctly notes that the endothelial and mesenchymal cell lineages are phylogenetically unrelated. However, endothelial-to-mesenchymal cell transition (EndMT), the process by which normal ECs completely or partially lose their endothelial identity and acquire expression of mesenchymal markers, is a well-established process that occurs physiologically and in disease states (Simons M Curr Opin Physiol 2023). In the bone marrow, the occurrence of EndMT has been documented in patients with myelofibrosis, and the process affects the bone marrow microvasculature (Erba BG et al Am J Pathol 2017). Single-cell RNAseq of non-hematopoietic bone marrow cells has shown the existence of a rare population of ECs that co-expresses endothelial cell markers (Cdh5, Kdr, Emcn and others) and mesenchymal cell markers, as shown in Figure 6E and F.

      We fully agree with the Reviewer that, given the large number of hematopoietic clones contributing to adult hematopoiesis (particularly granulocyte-producing clones), it may be relatively easy to detect barcode overlap with abundant mature populations, whereas overlap with HSPCs would represent a more stringent and informative metric of lineage relationships. The Polylox results presented here show the sharing of true barcodes between individual ECs and HSPCs.

      Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of the endothelial cells in the adult mouse bone marrow. Using Cdh5-CreERT2 in vivo inducible system, though rare, they characterized a subset of endothelial cells expressing hematopoietic markers that were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable and contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU) and recruited to the peritoneal cavity upon Thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells in a single cell fashion, suggesting a predominant HSPCs-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, corresponding to their high Runx1 expressing property.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      We thank the Reviewer for the supportive comments about our study.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen EC cells generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that hematopoietic cells might arise from the ones that got contaminated into the culture at the time of sorting. The sorting purity and time course analysis of ex vivo culture should be shown to exclude the possibility.

      We agree that FACS sorting can never achieve 100% cell purity and that sorting purity is critical for interpreting the ex vivo culture experiments presented in our study. As requested by the Reviewer, we have now documented the purity of the sorted endothelial cell (EC) population used in the ex vivo culture experiments. The post-sort purity of CD45<sup>-</sup>VE-cadherin<sup>+</sup>ZsGreen<sup>+</sup> ECs was 96.5%; this data is now shown in the revised Figure 2B (Post Sort Purity panel). This purity level is comparable to purity levels of sorted ECs shown in Figure S2I (94.5%).

      While we agree that a detailed time-course analysis of hematopoietic cell output from EC cultures could further strengthen the conclusion that bone marrow ECs can produce hematopoietic cells ex vivo, we wish to call attention to the additional critical control in the experiment shown in Figure 2B-D. In this experiment, we co-cultured CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells from Cdh5-CreERT2/ZsGreen mice, rather than ECs, and examined whether these hematopoietic cells could produce ZsGreen<sup>+</sup> cell progeny after 8 weeks of culture under the same conditions used in EC co-cultures (conditions not designed to support hematopoietic cells long-term). Unlike ECs, the CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells did not generate ZsGreen<sup>+</sup> hematopoietic cells at the end of the 8-week culture, indicating that the culture conditions are not permissive for the maintenance, proliferation and differentiation of hematopoietic cells. This provides strong evidence that even if a few hematopoietic cells contaminated the sorted ECs, these hematopoietic cells would not contribute to EC-derived production of hematopoietic cells at the 8-week time-point. We have revised the Results text describing Figure 2B-D.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed as the chimerism kinetics can allow us to guess what repopulation they were (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      The original manuscript reported survival and engraftment up to 12 weeks post transplantation. The recipient mice have now been monitored for up to 10 months post transplantation. These extended survival and engraftment data are now included in the revised Figure 2I and J replacing the previous 10-week analyses.

      We agree with the Reviewer that the time-course kinetics of donor cell repopulation would help define adult endothelial-to-hematopoietic transition (EHT) and the hematopoietic cell types produced by adult EHT. We did not perform serial time-course sampling of peripheral blood beyond the 10-week and the 10-month time-points. Given that the recipient mice were lethally irradiated, with increased susceptibility to infection, we sought to minimize repeated interventions that could compromise animal health and survival. We therefore prioritized long-term survival and endpoint analysis over repeated longitudinal sampling. Nonetheless, the long-term survival (10 months) and multilineage hematopoietic cell reconstitution after lethal irradiation provide functional evidence that adult EHT produced at least some LT-HSCs.

      We acknowledge that phenotypic assessment of bone marrow LT-HSC chimerism and/or secondary transplantation would further strengthen the manuscript. We have clarified these limitations in the revised manuscript under “Limitations of the study”.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      We agree with the Reviewer that, in most cases, a larger number of experimental data points is helpful to strengthen the conclusions, and that having additional mice transplanted with ZsGreen<sup>+</sup>-enriched LSKs would be desirable. However, we do not believe that additional mice transplanted with ZsGreen<sup>+</sup> LSKs would strengthen the conclusions drawn from the experimental results shown in Figure 4D, in which we used 6 mice transplanted with ZsGreen-depleted (ZsGreen<sup>-</sup>) LSKs and 2 mice transplanted with ZsGreen<sup>+</sup>-enriched (ZsGreen<sup>+</sup>) LSKs. The independence of adult EHT from “pre-existing hematopoietic cell progenitors” rests on the following experimental results and the conclusions drawn from them.

      First, ZsGreen<sup>-</sup> LSKs (purity 99%) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 6). These ZsGreen<sup>-</sup> LSKs robustly reconstituted hematopoiesis, demonstrating successful engraftment. Importantly, tamoxifen administration to the recipients of ZsGreen<sup>-</sup> LSKs produced no detectable ZsGreen<sup>+</sup> cells in the blood for up to 6 months post transplantation (Figure 4D, blue line encompassing the results of the 6 mice). This result demonstrates that the transplanted ZsGreen<sup>-</sup> hematopoietic progenitors and their progeny do not acquire ZsGreen labeling in vivo following tamoxifen treatment, indicating that they lack the Cre-recombinase. This result is consistent with the endothelial specificity of Cdh5 expression.

      Second, ZsGreen<sup>+</sup> LSKs (accounting for ~50% of the LSKs) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 2). This arm of the experiment was performed in part as a technical control to confirm successful engraftment and detection of ZsGreen<sup>+</sup> hematopoietic cells in the transplant setting. Importantly, tamoxifen administration to the two recipients of ZsGreen<sup>+</sup> LSKs (Figure 4D, two green lines reflecting these two mice) shows that the level of ZsGreen<sup>+</sup> blood cells stabilized in each of the mice between weeks 10 and 24, showing equilibrium between the proportion of ZsGreen<sup>+</sup> and ZsGreen<sup>-</sup> cells in the blood. This indicates that pre-existing ZsGreen<sup>+</sup> LSKs are not responsible for tamoxifen-induced increases in ZsGreen<sup>+</sup> hematopoietic cells in blood.

      Together, the results from this experiment demonstrate that, in the setting of transplantation, tamoxifen does not induce ZsGreen labeling of ZsGreen<sup>-</sup> hematopoietic progenitors or their progeny. This result strongly supports the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise independently of pre-existing or inducible hematopoietic progenitors. We have revised the text to clarify these experiments and to present the results in a simplified manner.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of the genetically - and phenotypically - defined endothelial cells from several reporter mouse systems. The polylox barcoding method to trace the adult bone marrow endothelial cell contribution to hematopoiesis is a strong insight to estimate the lineage contribution.

      Weaknesses:

      It is unclear what the biological significance of the blood cells de novo generated from the adult bone marrow endothelial cells is. Moreover, since the frequency is very rare (<1% bone marrow and peripheral blood CD45+), more data regarding its identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination/mosaicism of the reporter mice system used.

      We agree that the biological significance and functional roles of hematopoietic cells generated de novo from adult bone marrow ECs remain important open questions. We also agree that the output of hematopoietic cells from adult EHT is low, but rare events can be important, particularly as they pertain to stem/progenitor cell biology. Both points are described under “Limitations of the study”. The primary goal of the present study was to address whether adult bone marrow ECs can undergo EHT. We believe that the combination of various mouse transgenic lines, different Cre-ER lines, different reporters (ZsGreen and mTmG), including the single-cell barcoding reporter (PolyloxExpress), and different approaches to evaluate hematopoiesis in vivo and ex vivo makes it rather unlikely that our conclusions are driven by an artifact related to a specific leaky reporter, contamination, or problems with one of the Cre lines. The experiment in which we find no tamoxifen-induced labeling of transplanted ZsGreen<sup>-</sup> LSKs derived from the Cdh5-CreERT2/ZsGreen mice is strongly supportive of the existence of adult EHT, virtually excluding a contribution of contaminant hematopoietic cells.

      Reviewer 2 Recommendations for the authors:

      (1) There is a discrepancy in the proportion of peripheral blood composition between different reporters (mTmG and ZsGreen) (Figure 1G and Figure S1K), especially the contrasting B cell proportion between both models. The additional comments on this data should be mentioned.

      In the revised Results section, we now note that the mTmG and ZsGreen reporters show slightly different efficiencies or kinetics of labeling. These differences have previously been reported[5] and have been attributed to relative reporter leakiness, sensitivity to tamoxifen, or different kinetics of Cre recombination. As suggested, these comments have been added to the text following the description of Figure S2A.

      (2) Experimental methods concerning cell transplantation/transfer need more information, such as: a) using or not using rescue cells and how many cells are they if using, b) single or split dose of irradiation, c) when were cells transplanted following irradiation, etc. Otherwise, the data are uninterpretable.

      We have ensured that the Materials and Methods section under “Bone marrow ablation and transplantation” contains all the information requested by the Reviewer.

      (3) Some of the grouped data haven't been statistically analyzed.

      We have reviewed all data and performed appropriate statistical analyses where comparisons were made. In the revised figures and legends, all grouped datasets now include statistical tests and p-values are indicated (added to Fig. 3H and I; Figure 4G).

      (4) Some flowcytometry plot has the quantitative number, others do not. The quantitative information is absolutely needed in all flow cytometry plots.

      We have updated the flow cytometry figures to include quantitative values (percentages or absolute counts) in all relevant plots (2B (new figure, bottom left); 2C; S1G, S1H).

      (5) It is more relevant to present the Emcn/VE-Cadherin plot from gated CD45+/ZsGreen+, not the CD45-/ZsGreen+ fraction (Figure 2C), as the latter were not the EHT-derived offspring, but rather the common phenotypic endothelial cells

      As requested, we have added the suggested flow cytometry plot. The revised Figure 2C now includes an Emcn vs. VE-Cadherin plot from the gated CD45<sup>+</sup>ZsGreen<sup>+</sup> population. Together with the existing panel, it shows that the CD45<sup>-</sup>ZsGreen<sup>+</sup> cells retain endothelial cell markers after culture, whereas the CD45<sup>+</sup>ZsGreen<sup>+</sup> cells do not express endothelial markers. The figure legend has been updated to explain the new panel. We agree that this plot more directly highlights the phenotype of the presumed EHT-derived cells.

      (6) To show the effect of the ex vivo culture, the authors should present the absolute number of CD45+ZsGreen+ cells in the pre-/post-culture; otherwise, the data are uninterpretable (Figure 2D).

      Our interpretation of the Reviewer’s comment above (relative to the experiment shown in Figure 2B-D) is that the Reviewer would like us to provide the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells introduced into the co-culture (supplemented with unsorted BM cells, ZsGreen<sup>+</sup> hematopoietic cells or ZsGreen<sup>+</sup> ECs) and the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. Currently, the results in Figure 2D show the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. The input of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells was 2.93 × 10<sup>6</sup> on average for unsorted BM cells and 1.68 × 10<sup>6</sup> on average for ZsGreen<sup>+</sup> hematopoietic cells, and was estimated at up to 100 for sorted ZsGreen<sup>+</sup> ECs.

      (7) It is confusing to see Figures 2F and 2G, which apparently show the data from the middle of the experimental procedure (Figure 2E). Those data should be labelled clearly regarding which procedures of the whole experiment protocol.

      As correctly noted by the Reviewer, Figures 2F and 2G provide data that relate to the middle of the graphical representation of the experiment shown in Figure 2E. We see how this may be confusing.

      Therefore, we have updated both the figure labeling and legend to explicitly indicate that Figure 2F and 2G provide the FACS sorting results for the cells used for transplantation. The revised legend now reads: “Representative flow cytometry plots of the non-adherent cell fraction after 8 weeks of co-culture (cells used for transplantation).”

      References

      (1) Kucinski, I., Campos, J., Barile, M., Severi, F., Bohin, N., Moreira, P.N., Allen, L., Lawson, H., Haltalli, M.L.R., Kinston, S.J., et al. (2024). A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244-259.e10. https://doi.org/10.1016/j.stem.2023.12.001.

      (2) Identification of a clonally expanding haematopoietic compartment in bone marrow. The EMBO Journal. https://link.springer.com/article/10.1038/emboj.2012.308.

      (3) Pei, W., Shang, F., Wang, X., Fanti, A.-K., Greco, A., Busch, K., Klapproth, K., Zhang, Q., Quedenau, C., Sauer, S., et al. (2020). Resolving Fates and Single-Cell Transcriptomes of Hematopoietic Stem Cell Clones by PolyloxExpress Barcoding. Cell Stem Cell 27, 383-395.e8. https://doi.org/10.1016/j.stem.2020.07.018.

      (4) Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460. https://doi.org/10.1038/nature23653.

      (5) Álvarez-Aznar, A., Martínez-Corral, I., Daubel, N., Betsholtz, C., Mäkinen, T., and Gaengel, K. (2020). Tamoxifen-independent recombination of reporter genes limits lineage tracing and mosaic analysis using CreERT2 lines. Transgenic Res 29, 53–68. https://doi.org/10.1007/s11248-019-00177-8.

    1. 13.6. Design Analysis: Mental Health. We want to provide you, the reader, a chance to explore mental health more. We want you to be considering potential benefits and harms to the mental health of different people (benefits like reducing stress, feeling part of a community, finding purpose, etc. and harms like unnecessary anxiety or depression, opportunities and encouragement of self-bullying, etc.). As you do this you might consider personality differences (such as introverts and extroverts), and neurodiversity, the ways people’s brains work and process information differently (e.g., ADHD, Autism, Dyslexia, Face blindness, depression, anxiety). But be careful generalizing about different neurotypes (such as Autism), especially if you don’t know them well. Instead try to focus on specific traits (that may or may not be part of a specific group) and the impacts on them (e.g., someone easily distracted by motion might…., or someone sensitive to loud sounds might…, or someone already feeling anxious might…). We will be doing a modified version of the five-step CIDER method (Critique, Imagine, Design, Expand, Repeat). While the CIDER method normally assumes that making a tool accessible to more people is morally good, if that tool is potentially harmful to people (e.g., give people unnecessary anxiety), then making the tool accessible to more people might be morally bad. So instead of just looking at the assumptions made about people and groups using a social media site, we will be also looking at potential harms to different people and groups using a social media site. So open a social media site on your device. Then do the following (preferably on paper or in a blank computer document):

      This section asks readers to think about how social media affects mental health in both good and bad ways. It suggests considering different types of people, like introverts, extroverts, and people with ADHD or anxiety, instead of making broad generalizations. The goal is to look at specific traits and how social media might help or harm people with those traits, especially when using the CIDER method to evaluate design choices.

    1. You aren’t likely to end up in a situation as dramatic as this. If you find yourself making a stand for ethical tech work, it would probably look more like arguing about what restrictions to put on a name field (e.g., minimum length), prioritizing accessibility, or arguing that a small piece of data about users is not really needed and shouldn’t be tracked. But regardless, if you end up in a position to have an influence in tech, we want you to be able to think through the ethical implications of what you are asked to do and how you choose to respond.

      Although in this case the engineer was able to successfully stand up against the unethical aspects of what they were doing, I think in other circumstances it may not be so easy. Engineers who don't comply could simply be fired, or they could find other workarounds if everyone isn't on the same page as they were with this case.

    1. Reviewer #1 (Public review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (CRY) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of CRY action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that CRY acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this CRY action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      While the authors have made improvements in this resubmitted manuscript, there are still multiple concerns about the paper. I think the authors provide enough evidence suggesting that CRY plays a role in sleep under short photoperiod. The data also supports that CRY acts in GABAergic neurons. However, there are still major issues with the quality of the confocal images presented throughout the paper. In many cases it appears that the images are oversaturated with poor resolution, making it hard to understand what is going on. In addition, none of the drivers used in this study are specific to the neurons the authors aim to manipulate. Therefore, the identity of the GABAergic neurons involved in this CRY dependent sleep mechanism remains unclear. Similarly, whether l-LNvs are the target of this GABA mediated sleep regulation under short photoperiod is not fully demonstrated. The data presented suggests that but does not prove it.

      Major concerns:

      (1) While the authors provided sleep parameters like consolidation or waking activity for some experiments, these measurements are still not shown for several others (for example Figures 2E, 3, 4, 5, and 6). These data are essential; these metrics must be reported for all sleep experiments.

      (2) Line 144 "We fed flies with agonists of GABA-A (THIP) and GABA-B receptor (SKF-97541) (Ki and Lim, 2019; Matsuda et al., 1996; Mezler et al., 2001). Both drugs enhance sleep in WT," The proper citation is needed here, Dissel et al., 2015 PMID:25913403. Both THIP and SKF-97541 were used in that paper.

      (3) Figure 2C and 2F: it appears that the control data is the same in both panels. That is not acceptable.

      (4) Figure 4A: With the quality of the images, it is impossible to assess whether GABA levels are increased at the l-LNvs soma.

      (5) Fig 4 S1A shows colabeling of l-LNvs and Gad1-Gal4 expressing neurons. They are almost 100% overlapping signals. This would indicate that the l-LNvs are GABAergic themselves, or that there is a problem with this experiment.

      (6) Fig 4 S1B: Again, I can see colabelling of the GFP and PDF staining, suggesting that Gad1-Gal4 expresses in l-LNvs.

      (7) Line 184: "Consistently, knocking down Rdl in the l-LNvs rescues the long sleep phenotype of cry mutants (Figure 4-figure supplement 1D)." This statement is incorrect as the driver used for this experiment, 78G01-GAL4 is not specific to the l-LNvs, so it is possible that the phenotypes observed are not coming from these neurons.

      (8) Figure 4G-K: None of these manipulations are specific to the l-LNvs. The authors describe 10H10-GAL4 and 78G01-GAL4 as l-LNvs specific tools, but this is not the case. Why not use the SS00681 Split-GAL4 line described in Liang et al., 2017 PMID: 28552314? It is possible that some of the effects reported in this manuscript are not caused by manipulating the l-LNvs.

      (9) Similarly, for the manipulation of s-LNvs, the authors cannot rule out effects that are coming from other cells, as R6-GAL4 is not specific to s-LNvs.

      (10) The staining presented in Fig 5 S1 is not very convincing. Difficult to see whether Gad1-GAL4 only expresses in the s-LNvs.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the authors' efforts in addressing the concerns raised, particularly including a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is below, and we include a brief list of comments that we think the authors could address in the text:

      (1) Justify metric selection - Could you please include in the text an explanation for why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify this in the Abstract and main text that it applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted to clarify that consistent individuality can be measured more strongly in some behavioral traits than in others, both in the Abstract and throughout the main text.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
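
      The arithmetic behind these two examples can be checked with a minimal Python sketch. The function name is illustrative and not taken from the manuscript; the two r values are the ones quoted in the paragraph above.

```python
def variance_explained(r: float) -> float:
    """Return the coefficient of determination (r squared) for a Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r


# r values quoted in the review: Figure 4C (top left) and vector strength.
for r in (0.263, -0.197):
    r2 = variance_explained(r)
    print(f"r = {r:+.3f}  R^2 = {r2:.3f}  unexplained = {1 - r2:.1%}")
```

      The sign of r does not matter here: squaring discards it, so a weak negative correlation leaves as much variance unexplained as a weak positive one of the same magnitude.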

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
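
      One way to read the rank suggestion is sketched below in Python, assuming each fly has one value per context. Values are ranked within each context and the ranks are then correlated, which is equivalent to Spearman's rho. All function names and the example numbers are hypothetical and do not come from the manuscript.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def in_group_rank_correlation(context_a, context_b):
    """Correlate within-context ranks (equivalent to Spearman's rho)."""
    return pearson(average_ranks(context_a), average_ranks(context_b))


# Hypothetical data: same fly order in both contexts, with a large mean shift
# and a nonlinear rescaling in context B.
context_a = [10.0, 20.0, 30.0, 40.0, 50.0]
context_b = [55.0, 60.0, 70.0, 90.0, 99.0]
print(pearson(context_a, context_b))
print(in_group_rank_correlation(context_a, context_b))
```

      Because ranks are computed separately within each context, a shift or monotonic rescaling of the group as a whole does not affect the result; only changes in the ordering of individuals do.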

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
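
      To illustrate the quantity being reported, the following Python sketch computes a per-trait ICC as between-fly variance divided by between-fly plus residual variance. It is not the authors' Matlab code: it assumes a balanced design (every fly measured once in every context, no missing values) and uses a method-of-moments (ANOVA) estimator, which under those assumptions should agree closely with a random-intercept model that treats context as a fixed effect. The function name and the example data are hypothetical.

```python
def icc_consistency(data):
    """ICC for one trait; data[i][j] is the value of fly i in context j.

    Context means are removed (context treated as a fixed effect); the
    remaining variance is split into between-fly and residual parts.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    fly_means = [sum(row) / k for row in data]
    ctx_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_fly = k * sum((m - grand) ** 2 for m in fly_means)
    ss_err = sum(
        (data[i][j] - fly_means[i] - ctx_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_fly = ss_fly / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    var_fly = max((ms_fly - ms_err) / k, 0.0)  # between-fly variance, floored at 0
    total = var_fly + ms_err
    return var_fly / total if total > 0 else float("nan")


# Hypothetical trait values: 4 flies (rows) x 3 contexts (columns).
walked = [
    [0.62, 0.70, 0.41],
    [0.35, 0.48, 0.20],
    [0.80, 0.83, 0.66],
    [0.50, 0.52, 0.39],
]
print(f"ICC = {icc_consistency(walked):.2f}")
```

      Running this once per trait mirrors the univariate framing described above: each metric gets its own ICC, and nothing in the calculation links one trait to another.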

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
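
      As an illustration of the multiple-testing point only, a Holm step-down adjustment for a family of correlation p-values can be sketched in Python as follows. The choice of Holm's method is an assumption made for this example; neither the review nor the manuscript names a specific correction, and the p-values shown are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for day 1 vs 2, day 1 vs 3, and day 2 vs 3.
raw = [0.01, 0.04, 0.03]
for p, q in zip(raw, holm_adjust(raw)):
    print(f"raw p = {p:.3f}  Holm-adjusted p = {q:.3f}")
```

      A single repeatability estimate per trait, as the reviewer proposes, would sidestep much of this by reducing the number of tests in the first place.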

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jives with my own experience measuring individual behavior in that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5⁺-0⁻-1⁻-2⁺; Sequence #2: 3⁺-0⁻-1⁻-4⁺) - forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) - indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we can outline a learning-stage framework consistent with both prior work and the comparative analyses included here and include it in the discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thank you for the suggestion. We have removed this supplemental figure.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thank you for the suggestion. We have included the related figure as suggested.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate, the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief discussion of this in this section of the results in the manuscript.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.
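      As a rough check on the first point, the per-session volumes can be worked out directly. This is only a sketch that takes the figures quoted above at face value (upper bounds per session); the variable names are hypothetical, and the number of task sessions is left out because it is given only as "several dozen".

```python
UL_PER_ML = 1000

# Self-administration phase: at most 60 presses of 50 µl each, for 14 sessions.
self_admin_per_session_ml = 50 * 60 / UL_PER_ML
self_admin_total_ml = self_admin_per_session_ml * 14

# Odor task: 100 µl x 160 trials per session, as quoted in the response.
task_per_session_ml = 100 * 160 / UL_PER_ML

# How many times larger a task session is than a self-administration session.
ratio = task_per_session_ml / self_admin_per_session_ml

print(self_admin_per_session_ml, self_admin_total_ml, task_per_session_ml, round(ratio, 2))
```

      On these numbers, a single task session delivers roughly five times the sucrose of a maximal self-administration session.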

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have tried to expand the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%- can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3- is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences and will note this in the figure caption. At P3, cocaine-experienced rats were indeed more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats, the number of subjects, and to note that alternative explanations—such as differences in motivational state or attention—could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state of the art analyses. Two points that came up were the very small n (6 total and 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement- how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. What children do not see in their books also teaches them about who matters and who doesn’t in our society.

      This quote stood out to me because it shows how important representation is in children’s books. When certain groups of people are not shown in books, children may think those people are not important or do not belong. Books should include different cultures, families, and experiences so that all students can see themselves and others represented. As future teachers, we need to choose books carefully so our classroom libraries reflect the diversity of the real world.

    1. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). This new data (Fig. S7) precisely pinpoints the onset of the Leydig cell differentiation defect in dhh<sup>-/-</sup> mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh<sup>-/-</sup> by ptch2<sup>-/-</sup>) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature as additional experiments are required to satisfy their claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp11c1 cells are missing and the addition of ptch2-/- rescues the phenotype indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: Most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the discussion section, the ptch1 animals fail to survive limiting the ability to validate both ptch1 and ptch2 roles. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: A couple of key references are missing; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS. PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun. PMID: 32497091

      - Park SY, Tong M, Jameson JL. Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10. PMID: 17495005

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other unpredicted Gli-binding motifs within the 5-kb sf1 promoter region and the practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.
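      The concern about additional, unpredicted Gli-binding motifs in a long promoter can be illustrated with a simple exact-match scan on both strands. This is a minimal sketch rather than the method used in the study: the promoter string is a toy sequence built for the example, the function names are hypothetical, and real motif prediction would normally tolerate degenerate positions.

```python
MOTIF = "GACCACCCA"   # canonical Gli-binding motif quoted in the response
MUTANT = "TTAATTAAA"  # mutated control probe quoted in the response

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_motif(seq, motif):
    """Return sorted (position, strand) exact matches; positions are 0-based
    offsets on the forward strand, '-' means the motif lies on the reverse strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)

# Toy sequence (not the sf1 promoter): one forward-strand and one reverse-strand site.
toy_promoter = "ATGC" + MOTIF + "TTTT" + reverse_complement(MOTIF) + "CG"

print(find_motif(toy_promoter, MOTIF))
print(find_motif(toy_promoter, MUTANT))
```

      Running the same scan over an actual promoter sequence would list every exact copy of the consensus motif, which is one way to see how many candidate sites a mutagenesis approach would need to cover.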

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data, which include the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation, we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1<sup>-/-</sup>, TSL-gli2<sup>-/-</sup>, and TSL-gli3<sup>-/-</sup> cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. The results show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's considerations. Now, we have performed the additional key experiments, which have been incorporated into the revised manuscript. We believe these new data have fully addressed the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section in lines 369-373: “For each biological replicate (n = 5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. All positive cells number of Vasa, Sycp3 and Cyp11c1 was quantified by Image J Pro 1.51 software using default parameters.”

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section in lines 398-399: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please clarify the results presented in Fig 5E: how was this plot generated? It is interpreted that all three cell lines were combined and compared to the WT line. It is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section in lines 445-447: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      Added relevant information in the "Results" section in line 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”
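      One way to read the "core set" described above is as the intersection of the upregulated gene lists from the three conditions. The sketch below assumes that reading; the gene names are placeholders, not genes from the study, and the actual thresholds used to call a gene upregulated are not specified here.

```python
# Placeholder gene identifiers; the real lists would come from the
# differential-expression results of each condition versus TSL-WT.
up_ondhh = {"geneA", "geneB", "geneC", "geneD"}   # TSL-OnDhh vs WT
up_ongli1 = {"geneB", "geneC", "geneD", "geneE"}  # TSL-OnGli1 vs WT
up_sag = {"geneC", "geneD", "geneF"}              # TSL+SAG vs WT

# Genes upregulated in all three pathway-activation conditions.
core = up_ondhh & up_ongli1 & up_sag

print(sorted(core))
```

      Under this reading, each condition is compared to WT separately and only the overlap is reported, rather than the three cell lines being pooled before comparison.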

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: methods indicate that a Student's t-test was used, but ANOVAs are also used, which is appropriate. There are data presented that should be reevaluated via an ANOVA: Figure 4D, 4N-R; Figure 5G (no stats indicated in figure legend).

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed all data using the ANOVA-based methods, as specifically suggested. We confirm that these changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”
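      For readers unfamiliar with why an ANOVA is preferred over repeated t-tests when more than two groups are compared, the omnibus F statistic can be sketched as follows. This is an illustrative implementation with invented numbers, not the analysis pipeline used for the manuscript, and it omits the Tukey post hoc step, which additionally requires the studentized range distribution.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented measurements for three hypothetical groups.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

      A single F test evaluates all groups at once, and the post hoc test then identifies which pairs differ while controlling the family-wise error rate.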

      In lines 719-721 we added “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).” in line 745-747.

      Reviewer #1 (Significance):

      The data presented in this manuscript provides important context towards the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al. investigated the role of the Dhh signaling pathway in the proliferation and differentiation of Leydig lineage cells in the testes of Nile tilapia, an economically important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previously established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players under Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in Leydig cell lineage development.

      How Dhh secreted from Sertoli cells affects the Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      Experimental design should be optimized, and more convincing data should be provided to reach solid conclusions.

      (1) The SLCs (stem Leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is an SLC line only because the cells express a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. I suggest performing more experiments, including flow cytometry or single-cell RNA sequencing analysis, to further characterize this line and to demonstrate that it consists of genuine SLCs equivalent to the SLCs in 3-month testes of tilapia. In the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-tfii), which are usually considered as SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production”, ensuring that readers can fully appreciate its well-founded identity as a SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development is not clearly investigated. In the current manuscript, the characterization of the dhh mutant is insufficient and lacks in-depth investigation. The authors primarily looked at testes at 90 dph, when the Leydig cell lineage was well developed. In my opinion, this time point is too late. To investigate the earlier events that are affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation stages of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we now identified 30 dah as the critical window for Leydig cell differentiation onset, which was also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah now clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90-dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data that generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. Our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph is too late. To investigate the role of dhh in the Leydig cell lineage, I suggest that the authors focus on earlier developmental stages, when sex differentiation and maturation of Leydig cells occur. This work is essentially a developmental biology study that investigates how dhh loss in Sertoli cells affects the development of Leydig cells. Careful characterization of the earliest testicular phenotypes of the dhh mutant is very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Given the critical role of 11-KT in testis development and spermatogenesis, it is not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development and maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSL engrafted equivalently but failed to express Cyp11c1. This result was strange, which raised a question about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support the statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1 positive after transplantation. Can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine the TSL lineage development using the dhh mutant, and then investigate how loss of dhh disrupts the crosstalk between Sertoli cells and Leydig cells. Why bother transplanting TSLs? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as a SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLCs differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggest that loss of dhh does not affect the specification of SLCs, but affects their differentiation into mature LCs. The differentiation of Cyp11c1-positive cells should occur later than 20 dah. So when is the earliest time point for formation of Cyp11c1-positive cells, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines and performed a luciferase assay. Based on the results, the authors concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was based only on the luciferase assay using TSLs. Whether this is the case in testes at the animal level is not clear. Clearly, more genetic experiments, using ptch mutants, should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for the TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, and serum 11-KT levels? This contradiction needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provides compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.
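      The genetic logic above can be illustrated with a deliberately simplified Boolean sketch. It assumes a single inhibitory receptor (Ptch2) and ignores Ptch1, signal strength, and timing; the function name `pathway_active` is hypothetical and is not part of the authors' analysis.

```python
def pathway_active(dhh_present: bool, ptch2_present: bool) -> bool:
    """Toy model: Ptch2 represses the pathway unless Dhh relieves it.

    Assumption: Ptch2 is the only relevant inhibitory receptor.
    """
    ptch2_repressing = ptch2_present and not dhh_present
    return not ptch2_repressing


# Genotypes expressed as (Dhh ligand present, Ptch2 receptor present).
genotypes = {
    "WT": (True, True),
    "dhh-/-": (False, True),
    "ptch2-/-": (True, False),
    "dhh-/-;ptch2-/-": (False, False),
}

for name, (dhh, ptch2) in genotypes.items():
    state = "active" if pathway_active(dhh, ptch2) else "suppressed"
    print(f"{name}: pathway {state}")
```

      In this toy model only the dhh single mutant has a suppressed pathway, which matches the pattern described: a severe ligand-mutant phenotype, a normal receptor mutant, and rescue in the double mutant.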

      Figure 5. The authors generated gli1/2/3 mutant TSL lines and performed a luciferase assay. Based on the results, the authors concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. The conclusion is based only on the luciferase assay using TSLs. Whether this is the case in testes at the animal level is not clear. Clearly, more genetic experiments should be performed to substantiate this, using the gli mutant fish.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSLWT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+SAG) TSL cells. While these experiments can be used to identify dhh target genes, it would be better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutant fish to identify/confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that the identification of Gli1 as the key transcriptional effector is based primarily on our in vitro evidence using the TSL cell line. We have revised the manuscript accordingly to ensure this is stated precisely, avoiding overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by compensatory mechanisms among the Gli family members, potentially masking the specific transcriptional signature of pathway activation we sought to capture.

      To clarify, we generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNAseq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identification of these genes? Some of the genes were down-regulated or not different in the OnGli condition and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section in lines 452-454: “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(foldchange)| ≥ 1.5).” We also added relevant information in the Results section in lines 198-202: “we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”

      We have also updated Fig. S8B to include a clear value and to better visualize the FPKM value levels of these 33 genes across the conditions.
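      One plausible reading of this selection procedure is sketched below: apply the stated thresholds (FDR < 0.05, log2 fold change ≥ 1.5 for upregulation) to each condition separately, then intersect the resulting gene sets. The gene names and values are hypothetical placeholders, the function names are illustrative, and this is not the authors' actual pipeline (which used edgeR).

```python
FDR_MAX = 0.05      # threshold quoted in the Methods
LOG2FC_MIN = 1.5    # threshold quoted in the Methods


def upregulated(results):
    """Return genes with FDR < 0.05 and log2 fold change >= 1.5.

    `results` maps gene name -> (log2 fold change vs. control, FDR).
    """
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if fdr < FDR_MAX and log2fc >= LOG2FC_MIN
    }


def core_upregulated(per_condition):
    """Intersect the upregulated gene sets of all conditions."""
    sets = [upregulated(results) for results in per_condition.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical example data: (log2FC, FDR) per gene and condition.
per_condition = {
    "TSL-OnDhh": {"geneA": (2.1, 0.001), "geneB": (1.8, 0.01), "geneC": (-2.0, 0.001)},
    "TSL-OnGli1": {"geneA": (3.0, 0.0005), "geneB": (1.2, 0.02), "geneC": (-1.9, 0.003)},
    "TSL+SAG": {"geneA": (1.6, 0.04), "geneB": (2.5, 0.001), "geneC": (1.7, 0.2)},
}

print(core_upregulated(per_condition))
```

      Under this reading, a gene that falls below either threshold in any single condition is excluded from the core set, which is the property the reviewer asked the authors to make explicit.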

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see New Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity

      We have added this information to the Introduction in lines 91-92: early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females).

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section in lines 151-152: “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted the TSL labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify in lines 472-476: “A two-tailed independent Student’s t-test was used to determine the differences between the two groups. One-way ANOVA, followed by Tukey multiple comparison, was used to determine the significance of differences in more than two groups. P < 0.05 was used as a threshold for statistically significant differences.”
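      The decision rule in this statement (t-test for two groups, one-way ANOVA for more than two) can be sketched with the standard library alone. The sketch below computes only the test statistics and degrees of freedom; P values and the Tukey post hoc comparison need the t, F, and studentized range distributions, which in practice come from a statistics package (for example SciPy's `scipy.stats.ttest_ind`, `f_oneway`, and `tukey_hsd`). Function names and data are illustrative assumptions, not the authors' code.

```python
from math import sqrt
from statistics import mean

ALPHA = 0.05  # significance threshold stated in the Methods


def choose_test(n_groups: int) -> str:
    """Pick the test family according to the number of groups."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    return "Student's t-test" if n_groups == 2 else "one-way ANOVA + Tukey"


def student_t(a, b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def one_way_anova_f(groups):
    """F statistic with between- and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical measurements for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(choose_test(len(groups)), one_way_anova_f(groups))
```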

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use: "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%), Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%) or Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".
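      As a minimal sketch of what such a label implies, the percentage is the number of marker-positive cells divided by the number of DAPI-positive nuclei in the same field. The function name and the counts are hypothetical, and the sketch assumes every marker-positive cell is also counted as DAPI-positive.

```python
def percent_positive(marker_positive: int, dapi_positive: int) -> float:
    """Marker+ cells as a percentage of DAPI+ nuclei in the same field."""
    if dapi_positive <= 0:
        raise ValueError("DAPI+ count must be greater than zero")
    if not 0 <= marker_positive <= dapi_positive:
        raise ValueError("marker+ count must lie between 0 and the DAPI+ count")
    return 100.0 * marker_positive / dapi_positive


# Hypothetical counts from one section: 39 Cyp11c1+ cells among 50 nuclei.
print(f"Cyp11c1+ / DAPI+ = {percent_positive(39, 50):.1f}%")
```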

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful to have the WT images from S7 moved to fig 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. Because Ptch2 inhibits the pathway in the absence of ligand, genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please include exactly what each construct name stands for in the figure legend

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please include what the values in the table are (eg are these the significance values?)

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”
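      A per-row color gradient means each gene's values are rescaled within its own row before being mapped to the blue-to-red scale. The sketch below uses min-max scaling as one possible choice; the caption does not say which scaling was used (row-wise z-scores are another common option), so the function name, the method, and the FPKM values are all assumptions.

```python
def scale_row(fpkm_values):
    """Min-max scale one gene's FPKM values to [0, 1] (0 = blue, 1 = red)."""
    lo, hi = min(fpkm_values), max(fpkm_values)
    if hi == lo:
        # A flat row carries no contrast; place it at the midpoint.
        return [0.5 for _ in fpkm_values]
    return [(v - lo) / (hi - lo) for v in fpkm_values]


# Hypothetical FPKM values for one gene across four samples.
print(scale_row([10.0, 20.0, 30.0, 30.0]))
```

      With per-row scaling, colors are comparable across samples within a gene but not across genes, which is why printing the FPKM values in the squares is a useful complement.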

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point, we have either removed or replaced all references to "above" or "below" with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so we have worked to address both.

      First, the comment could have been that the distribution of loss functions on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since the former have different values for each GC, we can only plot many inferred curves together on the latter. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
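To illustrate what combining many inferred curves into a central (medoid) curve can look like, here is a minimal sketch. It assumes each curve is sampled on the same grid of affinity values and uses a summed absolute difference as the distance between curves. Both the distance and the function names are hypothetical choices for illustration; the response does not state which distance the authors use.

```python
def curve_distance(a, b):
    """Summed absolute difference between two curves on a shared grid (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid_index(curves):
    """Index of the curve with the smallest total distance to all other curves."""
    totals = [sum(curve_distance(c, other) for other in curves) for c in curves]
    return min(range(len(curves)), key=totals.__getitem__)


# Three toy "inferred curves", each evaluated at the same two affinity values.
curves = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
central = curves[medoid_index(curves)]
```

Unlike a pointwise mean, the medoid is always one of the actual inferred curves, so it keeps a shape the model can really produce.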

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
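The caption's definition can be made concrete with a small numerical sketch. It assumes both curves are sampled on a shared, increasing grid of affinity values, that the true curve is non-negative, and that areas are approximated with the trapezoid rule. The function names are hypothetical and this is not the authors' implementation.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs, ys)."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def curve_loss(xs, true_ys, inferred_ys):
    """Area between the two curves divided by the area under the true curve."""
    gap = [abs(t - p) for t, p in zip(true_ys, inferred_ys)]
    return trapezoid_area(xs, gap) / trapezoid_area(xs, true_ys)


# A flat true curve at 2 and a flat inferred curve at 1: the gap covers half
# of the area under the true curve.
example = curve_loss([0.0, 1.0, 2.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
```

Under this reading, a loss of 1.0 means the area between the curves equals the whole area under the true curve. The trapezoid approximation of the gap is only approximate where the two curves cross between grid points.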

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig. S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70; Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
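The per-GC time selection described in this passage amounts to a nearest-value lookup. The sketch below assumes each simulated GC provides a list of time points and the mean affinity of living cells at each one. The function name and the toy numbers are hypothetical and are not taken from the manuscript.

```python
def closest_timepoint(times, mean_affinities, target):
    """Return the time whose mean affinity is closest to the target value."""
    pairs = zip(times, mean_affinities)
    return min(pairs, key=lambda pair: abs(pair[1] - target))[0]


# Toy trajectory for one simulated GC (made-up values, for illustration only).
times = [5, 10, 15, 20]
mean_affinities = [0.0, 0.12, 0.6, 1.1]
chosen = [closest_timepoint(times, mean_affinities, t) for t in (0.1, 1.0, 2.0)]
```

In the toy trajectory, the target of 2.0 is never reached, so the lookup falls back to the last time point. This mirrors the passage's caveat that the third target lies beyond the affinity range seen in the data.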

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question. (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model. (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”
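The link between a fitness ceiling and the loss of selective pressure can be seen with a generic four-parameter sigmoid. The parameterization below (`xscale`, `xshift`, `yscale`, `yshift`) and its default values are assumptions chosen for illustration and may not match the response function used in the manuscript.

```python
import math


def response(x, xscale=1.0, xshift=0.0, yscale=4.0, yshift=0.5):
    """Generic sigmoid; its upper asymptote is yscale + yshift (assumed form)."""
    return yscale / (1.0 + math.exp(-xscale * (x - xshift))) + yshift


def fitness_gain(x, step=1.0, **params):
    """Fitness advantage of a cell at affinity x + step over a cell at affinity x."""
    return response(x + step, **params) - response(x, **params)


gain_mid = fitness_gain(0.0)   # on the steep part of the curve
gain_high = fitness_gain(5.0)  # well into the upper asymptote
```

Well into the upper asymptote, the same affinity improvement yields only a small fitness advantage, which is the sense in which cells there are no longer subject to selective pressure.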

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different to those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes good point, we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
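The scaling argument in this passage can be illustrated with a sketch of one simulation step in the style of a first-reaction scheme: draw an exponential waiting time for every (cell, event type) pair and keep the shortest. The data layout, event names, and function name are hypothetical, and the authors' software may organize this step differently.

```python
import random


def next_event(cells, rng):
    """Return (waiting_time, cell_index, event) for the earliest event, or None.

    `cells` is a list of dicts mapping an event name to its rate per unit time.
    """
    best = None
    for cell_index, rates in enumerate(cells):
        for event, rate in rates.items():
            if rate <= 0:
                continue  # an event with zero rate never fires
            wait = rng.expovariate(rate)  # exponential waiting time with this rate
            if best is None or wait < best[0]:
                best = (wait, cell_index, event)
    return best


rng = random.Random(0)
cells = [
    {"birth": 1.0, "death": 0.2},
    {"birth": 2.0, "death": 0.2, "mutation": 0.0},
]
step = next_event(cells, rng)
```

Each step touches every cell and every event type, so the cost per step grows with the total population. This is why simulating many competing GCs at once becomes expensive.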

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters can be plotted rather than loosely described (e.g., y_h+y_c, the upper asymptotes, can be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text in the caption linking to the figures that are, in the figure, intended to be in schematic form only.

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point, we've moved the affinity definition to the previous section, and removed the duplicate range information.

      (6): Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rates).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of events selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, a mention of “variety of response functions” seems out of place if only one choice is used throughout the paper. eq. (2): that's m^{-1} from eq. (1). Having the two equations using the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      From the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement - it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear, thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind their different evaluation choices. Briefly, we can't display true vs inferred response functions on the training samples since the curves for each GC are different -- the plot would be entirely filled in with very different response function shapes. This is why we do actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary stats (like Fig 5) for the training sample are in Fig 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly missed updating it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds in which we have a lot of points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica paper to understand the tree encoding. It would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
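The traversal described in this passage can be sketched for a binary tree as follows. The sketch rests on several assumptions that are not taken from the manuscript or from Voznica et al. (2022): the ladderization rule puts the subtree containing the deepest leaf on the left, the root has zero branch length, the first leaf is measured from the root, and the two rows are returned as plain lists rather than a padded matrix. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    branch_length: float            # distance to the parent (0 for the root)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def max_leaf_depth(node: Node) -> float:
    """Distance from this node's parent to the deepest leaf below this node."""
    if node.is_leaf():
        return node.branch_length
    return node.branch_length + max(max_leaf_depth(node.left), max_leaf_depth(node.right))


def ladderize(node: Node) -> None:
    """Rotate subtrees in place so the deeper one is on the left (assumed rule)."""
    if node.is_leaf():
        return
    ladderize(node.left)
    ladderize(node.right)
    if max_leaf_depth(node.right) > max_leaf_depth(node.left):
        node.left, node.right = node.right, node.left


def encode_tree(root: Node) -> Tuple[List[float], List[float]]:
    """Inorder traversal returning (leaf_row, internal_row) of distances."""
    ladderize(root)
    leaf_row: List[float] = []
    internal_row: List[float] = []
    last_internal_depth = 0.0  # the first leaf is measured from the root

    def visit(node: Node, parent_depth: float) -> None:
        nonlocal last_internal_depth
        depth = parent_depth + node.branch_length
        if node.is_leaf():
            # Leaf: distance to the most recently visited internal node.
            leaf_row.append(depth - last_internal_depth)
            return
        visit(node.left, depth)
        internal_row.append(depth)  # internal node: distance to the root
        last_internal_depth = depth
        visit(node.right, depth)

    visit(root, 0.0)
    return leaf_row, internal_row
```

In an inorder traversal of a binary tree, the internal node visited just before a leaf is always an ancestor of that leaf, so the leaf distance is a simple difference of depths. A tree with n leaves gives n leaf distances and n - 1 internal distances, so a matrix form would need padding for the shorter row.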

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel."

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”

    1. Why do social media platforms make decisions that harm users? And why do social media platforms sometimes go down paths of self-destruction and alienating their users? Sometimes these questions can be answered by looking at the economic forces that drive decision-making on social media platforms, in particular with capitalism. So let’s start by defining capitalism.

       19.1.1. Definition of Capitalism

       Capitalism is: “an economic system characterized by private or corporate ownership of capital goods, by investments that are determined by private decision, and by prices, production, and the distribution of goods that are determined mainly by competition in a free market” (Merriam-Webster Dictionary)

       In other words, capitalism is a system where:

       - Individuals or corporations own businesses.
       - These business owners make what they want and set their own prices. They compete with other businesses to convince customers to buy their products.
       - These business owners then hire wage laborers at predetermined rates for their work, while the owners get the excess business profits or losses.

       Related Terms

       Here are a few more terms that are relevant to capitalism that we need to understand in order to get to the details of decision-making and strategies employed by social media companies.

       Shares / Stocks

       - Shares or stocks are ownership of a percentage of a business, normally coming with getting a percentage of the profits and a percentage of power in making business decisions.
       - Companies then have a board of directors who represent these shareholders. The board is in charge of choosing who runs the company (the CEO).
         - They have the power to hire and fire CEOs.
         - For example: in 1985, the board of directors for Apple Computers denied Steve Jobs (co-founded Apple) the position of CEO and then they fired him completely.
       - CEOs of companies (like Mark Zuckerberg of Meta) are often both wage-laborers (they get a salary, Zuckerberg gets a tiny symbolic $1/year) and shareholders (they get a share of the profits, Zuckerberg owns 16.8%).

       Free Market

       - Businesses set their own prices and customers decide what they are willing to pay, so prices go up or down as each side decides what they are willing to charge/spend (no government intervention).
         - See supply and demand.
       - What gets made is theoretically determined by what customers want to spend their money on, with businesses competing for customers by offering better products and better prices.
         - Especially the people with the most money, both business owners and customers.

       Monopoly

       - “a situation where a specific person or enterprise is the only supplier of a particular thing”
       - Monopolies are considered anti-competitive (though not necessarily anti-capitalist).
       - Businesses can lower quality and raise prices, and customers will have to accept those prices since there are no alternatives.
       - Cornering a market is being close enough to a monopoly to mostly set the rules (e.g., Amazon and online shopping).

       19.1.2. Socialism

       Let’s contrast capitalism with socialism. Socialism, in contrast, is a system where:

       - A government owns the businesses (sometimes called “government services”).
       - A government decides what to make and what the price is.
         - The price might be free, like with public schools, public streets and highways, public playgrounds, etc.
       - A government then may hire wage laborers at predetermined rates for their work, and the excess business profits or losses are handled by the government.
         - For example, losses are covered by taxes, and excess may pay for other government services or go directly to the people (e.g., Alaska uses its oil profits to pay people to live there).

       As an example, there is one Seattle City Sewer system, which is run by the Seattle government. Having many competing sewer systems could actually make a big mess of the underground pipe system.

       19.1.3. Accountability in Capitalism and other systems

       Let’s look at who the leaders of businesses (or services) are accountable for in capitalism and other systems.

       Democratic Socialism (i.e., “Socialists”)

       With socialism in a representative democracy (i.e., “democratic socialism”), the government leaders are chosen by the people through voting. And so, while the governmental leaders are in charge of what gets made, how much it costs, and who gets it, those leaders are accountable to the voters. So, in a democratic socialist government, theoretically, every voter has an equal say in business (or government service) decisions. Note that there are limitations to the government leaders being accountable to the people their decisions affect, such as government leaders ignoring voters’ wishes, or people who can’t vote (e.g., the young, non-citizens, oppressed minorities) and therefore don’t get a say.

      I thought this assignment was interesting because it connected programming with a real-world scenario. It helped me understand how the way we design an algorithm can affect fairness and outcomes for different people. I also liked that it made us think not only about writing correct code, but also about the social impact of algorithms.

    1. As a social media user, we hope you are informed about things like: how social media works, how they influence your emotions and mental state, how your data gets used or abused, strategies in how people use social media, and how harassment and spam bots operate. We hope with this you can be a more informed user of social media, better able to participate, protect yourself, and make it a valuable experience for you and others you interact with. For example, you can hopefully recognize when someone is intentionally posting something bad or offensive (like the bad cooking videos we mentioned in the Virality chapter, or an intentionally offensive statement) in an attempt to get people to respond and spread their content. Then you can decide how you want to engage (if at all) given how they are trying to spread their content.

      I genuinely think this class overall will help me with how I engage with social media in the future. I notice faster when I am doomscrolling, and notice more if the content I am watching is trying to get a response out of me. While I don't think I can fully quit social media (namely Instagram and Twitter), I do think I can be more cognizant. However, I may go into the settings for both apps now and go through it very deeply to make sure I am not being tracked as much as usual, and turn off things like targeted ads.

    1. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.
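The legend values quoted by the reviewer can be checked for internal consistency, which also makes the sign convention explicit. The sketch below assumes the reported statistics are paired t-tests on the per-participant differences and that n = 26, as implied by the reported t(25). The function name `paired_t` is only illustrative.

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """t statistic for a paired difference: mean / (SD / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))

N = 26  # assumption: implied by the reported degrees of freedom, t(25)

# (label, mean difference, SD of differences), as quoted from the legend
reported = [
    ("accuracy, auditory- minus auditory+", 7.2, 7.5),
    ("accuracy, visual- minus visual+", -7.6, 10.80),
    ("RT, auditory- minus auditory+", -20.64, 57.6),
    ("RT, visual- minus visual+", 60.1, 58.52),
]

t_values = {label: paired_t(m, sd, N) for label, m, sd in reported}
for label, t in t_values.items():
    print(f"{label}: t({N - 1}) = {t:.2f}")
```

Under these assumptions the recomputed t values agree with the legend up to rounding. A negative "visual- minus visual+" accuracy difference means accuracy was higher when the distractor was present, which is the reading the reviewer describes.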

      To clarify: In the EEG experiment, we see a significant distractor cost in accuracy for auditory distractors (see Suppl. Fig. 1A). We also see faster reaction times with auditory distractors, which may reflect intersensory facilitation. As we used the same distractors for both experiments, it can be assumed that they were distracting in both experiments.

      In our follow-up MEG-experiment, as the reviewer stated, performance in block 2 was higher than in block 1, even though there were distractors present. In this experiment, distractor cost and learning effects are difficult to disentangle. It is possible that participants improved over time for the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). Here it can be seen that in Block 2, there is a more or less stable performance over time with a variation < 10 %. In Block 1, both for visual as well as auditory trials, an improvement over time can be seen. This is especially strong for visual trials, which span a difference of > 20%. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2. 
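The binning described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the participant count, trial count, learning curve, and function names are all hypothetical.

```python
import numpy as np

def binned_accuracy(correct, bin_size=10):
    """Mean accuracy in consecutive, non-overlapping bins of trials.

    correct: 1-D sequence of 0/1 outcomes in presentation order.
    Trailing trials that do not fill a whole bin are dropped.
    """
    correct = np.asarray(correct, dtype=float)
    n_bins = correct.size // bin_size
    return correct[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)

def mean_and_sem(per_participant):
    """Mean and standard error across participants (rows) for each bin (columns)."""
    per_participant = np.asarray(per_participant, dtype=float)
    mean = per_participant.mean(axis=0)
    sem = per_participant.std(axis=0, ddof=1) / np.sqrt(per_participant.shape[0])
    return mean, sem

# Synthetic data only: a hypothetical learning curve over 90 trials
rng = np.random.default_rng(0)
n_participants, n_trials = 27, 90
p_correct = np.linspace(0.6, 0.85, n_trials)
outcomes = rng.random((n_participants, n_trials)) < p_correct

curves = np.array([binned_accuracy(o) for o in outcomes])
mean, sem = mean_and_sem(curves)
```

Plotting `mean` with `sem` error bars per block and condition would give a figure of the kind described, where a rising curve indicates improvement over time and a flat one indicates stable performance.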

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the here-used auditory stimuli in blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R & Mazaheri, A 2017, 'Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation', European Journal of Neuroscience, vol. 45, no. 11, pp. 1431-1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).


      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disprove the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serve to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions and we improved our manuscript accordingly.

      (1) Authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we will address separately.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
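The distinction between frequency-specific peaks and a broadband increase can be illustrated with a small synthetic example. The sketch below is not the authors' pipeline: the sampling rate, amplitudes, and noise level are assumptions, and 44 Hz is used only as an untagged comparison frequency, mirroring the control mentioned in the response.

```python
import numpy as np

fs = 1000.0                       # sampling rate in Hz (hypothetical)
n_samples = 2000                  # 2 s of data, i.e. 0.5 Hz frequency resolution
t = np.arange(n_samples) / fs

# Synthetic signal: alpha (10 Hz) plus two tagging frequencies and white noise
rng = np.random.default_rng(1)
signal = (
    1.0 * np.sin(2 * np.pi * 10 * t)
    + 0.5 * np.sin(2 * np.pi * 36 * t)
    + 0.5 * np.sin(2 * np.pi * 40 * t)
    + 0.2 * rng.standard_normal(n_samples)
)

# Hann-windowed power spectrum
window = np.hanning(n_samples)
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
power = np.abs(np.fft.rfft(signal * window)) ** 2

def power_at(freq_hz):
    """Power in the frequency bin closest to freq_hz."""
    return power[np.argmin(np.abs(freqs - freq_hz))]

for f in (10, 36, 40, 44):
    print(f"{f} Hz: {power_at(f):.1f}")
```

In this toy signal, power at 10, 36, and 40 Hz stands well above the 44 Hz bin. A purely broadband increase would instead raise all bins, including 44 Hz, by a similar factor.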

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion, to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillation can facilitate signal transmission. This mechanism allows to increase transmission of relevant information and to block transmission of distractors.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations; that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute a strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation appeared on both 1) anticipating-auditory and anticipating-visual trials, 2) the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data is more parsimonious with a correlation than a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we want to address.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then it can be expected that this relationship remains across conditions and that higher alpha activity is accompanied by higher frequency-tagged activity, both over trials and over conditions. However, it is possible that when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio is affected, which may make it harder to detect a correlation in the data when using non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears both for auditory as well as visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG-analysis, mostly 36 Hz correlated with alpha activity and only in one condition 40 Hz showed a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz as well as alpha activity and 40 Hz.  

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where removal of non-timelocked activity through averaging (which we did when we tested for condition differences in Fig. 4 and 9) is not possible, there is uncertainty in the data. Baseline-correction can alleviate this issue, but it cannot offset the possibility of non-specific effects. We therefore decided to repeat the analysis with power calculated via a fast Fourier transform instead of the Hilbert power, in favour of a higher and stricter frequency resolution, as we averaged over a time period and thus the time domain was not relevant for this analysis. In this more conservative analysis, we can see that only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.
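A trial-by-trial correlation between FFT-based alpha power and tagged-frequency power could look roughly like the sketch below. It uses synthetic trials in which the 36 Hz amplitude is deliberately coupled to the alpha amplitude, so the positive correlation is built in by construction. All parameters and the helper `band_power` are hypothetical and do not reproduce the authors' analysis.

```python
import numpy as np

fs = 500.0
n_samples = 500                   # 1 s per trial, i.e. 1 Hz frequency resolution
n_trials = 200
t = np.arange(n_samples) / fs
rng = np.random.default_rng(2)

# Synthetic trials: 36 Hz amplitude is coupled to alpha amplitude by construction
alpha_amp = rng.uniform(0.5, 1.5, n_trials)
tag_amp = 0.3 * alpha_amp + 0.05 * rng.standard_normal(n_trials)
trials = (
    alpha_amp[:, None] * np.sin(2 * np.pi * 10 * t)
    + tag_amp[:, None] * np.sin(2 * np.pi * 36 * t)
    + 0.2 * rng.standard_normal((n_trials, n_samples))
)

def band_power(data, f_lo, f_hi):
    """Mean Hann-windowed FFT power per trial within [f_lo, f_hi] Hz."""
    n = data.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    power = np.abs(np.fft.rfft(data * np.hanning(n), axis=-1)) ** 2
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return power[:, mask].mean(axis=-1)

alpha_power = band_power(trials, 8, 12)
tag_power = band_power(trials, 35.5, 36.5)        # selects only the 36 Hz bin
control_power = band_power(trials, 43.5, 44.5)    # untagged comparison bin

r_tag = np.corrcoef(alpha_power, tag_power)[0, 1]
r_control = np.corrcoef(alpha_power, control_power)[0, 1]
print(f"alpha vs 36 Hz: r = {r_tag:.2f}; alpha vs 44 Hz: r = {r_control:.2f}")
```

Restricting the tagged-frequency estimate to a single narrow bin is what makes the FFT variant stricter in frequency than a band-passed Hilbert envelope, at the cost of discarding the time course within the averaging window.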

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster which showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity could be found (apart from a small correlation with 40 Hz which could not be confirmed by a median split; see SUPPL Fig. 14 C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020) and a Bayes factor of below 1 also indicated that the alternative hypothesis is unlikely.

      Nonetheless, a correlation with the auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked together (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript extends beyond the scope of this study.

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, but that 36 Hz activity in other areas does, provides strong evidence that alpha activity affects down-stream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, while at the same time adding further analyses to strengthen our empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner T., Marques J., Scheeringa R. & Bonnefond M (2024). Feature specific neuronal oscillations in cortical layers BioRxiv :2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement?. Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer on readers benefiting from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formula placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.
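One of the undefined quantities the reviewer mentions, the squared coefficient of variation, has a standard definition: the variance of a random variable divided by the square of its mean. The sketch below computes it for samples of inter-arrival times A and service times S. It does not reproduce the equations of Choi et al.; the sample values and the reading of service time as infection duration are assumptions made only for illustration.

```python
import statistics

def squared_cv(samples):
    """Squared coefficient of variation: variance divided by squared mean."""
    mean = statistics.fmean(samples)
    return statistics.pvariance(samples, mu=mean) / mean ** 2

# Made-up values in days, for illustration only
inter_arrival_times = [30.0, 45.0, 60.0, 25.0, 40.0]   # A: time between new infections
service_times = [120.0, 200.0, 150.0, 180.0]           # S: assumed here to be infection duration

c2_A = squared_cv(inter_arrival_times)
c2_S = squared_cv(service_times)
arrival_rate = 1.0 / statistics.fmean(inter_arrival_times)

print(f"c2_A = {c2_A:.3f}, c2_S = {c2_S:.3f}, arrival rate = {arrival_rate:.4f} per day")
```

A two-moment approximation, as the name suggests, summarizes each distribution by its mean and its squared coefficient of variation rather than its full shape. An exponential distribution has a squared coefficient of variation of 1 and a constant has 0, which gives a rough scale for interpreting the values.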

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. proportion of simulations where the truth is captured in the 95% CI or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
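
      The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy MOI values, and the yearly infection total are all hypothetical. It pools one MOI estimate per sampled host into a population-level distribution and computes the per-host FOI by dividing a yearly total by the number of hosts.

```python
from collections import Counter

def empirical_moi_distribution(moi_estimates, capacity):
    """Pool one MOI estimate per host into P(MOI = k) for k = 0..capacity."""
    if not moi_estimates:
        raise ValueError("need at least one MOI estimate")
    if max(moi_estimates) > capacity:
        raise ValueError("an MOI estimate exceeds the assumed capacity")
    counts = Counter(moi_estimates)
    n = len(moi_estimates)
    return [counts.get(k, 0) / n for k in range(capacity + 1)]

def foi_per_host_per_year(total_infections_per_year, n_hosts):
    """Per-host FOI: total infections acquired by all hosts divided by host count."""
    return total_infections_per_year / n_hosts

# Hypothetical MOI estimates, one per sampled host.
moi = [1, 2, 2, 4, 5, 5, 5, 8]
dist = empirical_moi_distribution(moi, capacity=30)
```

      Because each host contributes a single observation, the pooled distribution stands in for the individual-level steady-state distribution and does not represent heterogeneity among hosts.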

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods perform similarly to how they perform in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
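
      The two summary measures mentioned above could be computed along the following lines. This is a sketch under assumptions: the function name and input layout (one true FOI value and one list of bootstrap estimates per simulation) are hypothetical, and "captured" is taken to mean that the truth lies within the range of the entire bootstrap distribution.

```python
import statistics

def coverage_and_relative_deviation(true_foi, bootstrap_runs):
    """Summarize estimator performance across simulations.

    true_foi: list with one true FOI value per simulation.
    bootstrap_runs: list with one list of bootstrap FOI estimates per simulation.
    Returns (proportion of simulations whose bootstrap range contains the truth,
             list of relative deviations of the bootstrap median from the truth).
    """
    covered = 0
    rel_devs = []
    for truth, boot in zip(true_foi, bootstrap_runs):
        if min(boot) <= truth <= max(boot):
            covered += 1
        rel_devs.append((statistics.median(boot) - truth) / truth)
    return covered / len(true_foi), rel_devs

# Toy inputs: the first simulation captures the truth, the second underestimates it.
coverage, deviations = coverage_and_relative_deviation(
    [5.0, 4.0],
    [[4.0, 5.0, 6.0], [1.0, 2.0, 3.0]],
)
```

      A negative relative deviation indicates underestimation, which is the pattern the reviewer asked to have quantified.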

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them; this is an essential component of the method and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
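
      A grid search of the kind described in this response can be sketched as below. This is an illustration only: the candidate distributions are supplied as plain probability lists keyed by a (mean, variance) pair, and the toy values are invented. The two-moment approximation that would generate these distributions in practice is not reproduced here.

```python
import math

def log_likelihood(moi_estimates, pmf):
    """Log-likelihood of independent MOI estimates under one candidate distribution."""
    total = 0.0
    for m in moi_estimates:
        p = pmf[m] if m < len(pmf) else 0.0
        if p <= 0.0:
            return float("-inf")  # an observed MOI is impossible under this candidate
        total += math.log(p)
    return total

def grid_mle(moi_estimates, candidate_pmfs):
    """Score every grid point and return the best one with all scores.

    candidate_pmfs: dict mapping (mean, variance) of inter-arrival times to a
    list where entry k is P(queue length = k). Hypothetical input format.
    """
    scores = {params: log_likelihood(moi_estimates, pmf)
              for params, pmf in candidate_pmfs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy grid with two points and made-up distributions over MOI = 0..3.
candidates = {
    (0.2, 0.04): [0.1, 0.2, 0.4, 0.3],
    (0.5, 0.25): [0.4, 0.3, 0.2, 0.1],
}
best, scores = grid_mle([2, 2, 3, 1], candidates)
```

      Keeping the full dictionary of scores, rather than only the maximum, is what allows the shape of the likelihood surface to be plotted and the effect of a finer grid to be checked.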

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population. 

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize, in future data collection and sampling efforts, the subpopulation class that has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c therefore has a small impact on the part of the grid resulting in low flows of customers/infections, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15 and 20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
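
      The qualitative argument about the capacity c can be illustrated with a simpler stand-in model. The sketch below is not the two-moment approximation used in the manuscript. It assumes Poisson arrivals, independently cleared infections, and arrivals lost when the host is at capacity (an Erlang loss system), for which the steady-state distribution is a truncated Poisson in the offered load (arrival rate times mean duration). The load values are hypothetical.

```python
import math

def truncated_poisson_pmf(offered_load, capacity):
    """Steady-state P(queue length = k), k = 0..capacity, for an Erlang loss system."""
    # Work in log space so large loads and capacities do not overflow.
    log_w = [k * math.log(offered_load) - math.lgamma(k + 1)
             for k in range(capacity + 1)]
    shift = max(log_w)
    weights = [math.exp(x - shift) for x in log_w]
    norm = sum(weights)
    return [w / norm for w in weights]

def blocking_probability(offered_load, capacity):
    """Probability that an arriving infection finds the host at capacity."""
    return truncated_poisson_pmf(offered_load, capacity)[capacity]

# Low flow: changing c from 25 to 30 barely alters the distribution.
low_25 = truncated_poisson_pmf(5.0, 25)
low_30 = truncated_poisson_pmf(5.0, 30)
max_diff = max(abs(a - b) for a, b in zip(low_25, low_30))

# High flow: the system overflows often, so c matters.
high_block = blocking_probability(40.0, 25)
```

      In this stand-in, the choice of c is immaterial at low offered load and only shapes the distribution when the load is high enough to fill the host, which mirrors the reasoning given above for why larger values of c are not expected to move the MLE.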

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden on MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will, however, require either cohort studies or more densely sampled cross-sectional surveys, due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. If those 1-5-year-olds had not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded in the estimation.
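
      The reviewer's point about deflated variability can be seen in a small sketch. It is a simplified illustration, not the authors' bootstrap procedure: the function name, the MOI values, and the group sizes are hypothetical, and the statistic tracked is the mean MOI rather than the FOI estimate. Imputed values for treated hosts are drawn from the untreated pool, so imputation enlarges each bootstrap replicate without adding information.

```python
import random
import statistics

def bootstrap_means(untreated_moi, n_treated, impute, n_boot=1000, seed=0):
    """Bootstrap the mean MOI with treated hosts either imputed or excluded."""
    rng = random.Random(seed)
    # With imputation, treated hosts are replaced by draws from the untreated
    # pool, so every value in the replicate comes from the same distribution.
    size = len(untreated_moi) + (n_treated if impute else 0)
    means = []
    for _ in range(n_boot):
        replicate = [rng.choice(untreated_moi) for _ in range(size)]
        means.append(statistics.fmean(replicate))
    return means

# Hypothetical data: 50 untreated hosts and 50 treated hosts.
untreated = [1, 2, 2, 3, 4, 4, 5, 5, 6, 8] * 5
sd_imputed = statistics.stdev(bootstrap_means(untreated, 50, impute=True))
sd_excluded = statistics.stdev(bootstrap_means(untreated, 50, impute=False))
```

      Under these assumptions the spread of the bootstrap means is narrower with imputation than with exclusion, purely because the replicate is larger, which is why reporting estimates with the treated hosts excluded gives a more honest picture of uncertainty.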

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding on what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      It is not clear to us in what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ but not substantially across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (or the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider potentially adding some formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree, however, with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of unseen alleles by the host across all var genes expressed during that infection (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
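
      The per-gene rules described above can be illustrated with a minimal sketch. It is not the authors' ABM code: the function names, the 60-gene repertoire, and the use of a fixed immunity count per gene are assumptions made for illustration. Only the seven-day exponential mean, the halving under immunity to one allele, and the immediate switch under immunity to both alleles come from the text.

```python
import random

MEAN_EXPRESSION_DAYS = 7.0  # mean per-gene expression time in a naive host (from the text)


def expression_duration(n_immune_alleles, rng, mean_days=MEAN_EXPRESSION_DAYS):
    """Days one var gene is expressed, given host immunity to 0, 1 or 2 of its alleles."""
    if n_immune_alleles >= 2:
        # Immunity to both alleles: immediate switch to another gene.
        return 0.0
    # Immunity to one allele halves the mean expression duration.
    mean = mean_days / 2 if n_immune_alleles == 1 else mean_days
    # Switching time is exponentially distributed with the chosen mean.
    return rng.expovariate(1.0 / mean)


def infection_duration(immune_alleles_per_gene, rng):
    """Total infection duration: sum of expression times over all genes expressed."""
    return sum(expression_duration(n, rng) for n in immune_alleles_per_gene)


rng = random.Random(0)
naive_host = [0] * 60  # hypothetical repertoire of 60 genes, host naive to every allele
durations = [infection_duration(naive_host, rng) for _ in range(2000)]
mean_naive = sum(durations) / len(durations)
print(f"mean duration in naive hosts: {mean_naive:.0f} days")
```

      Under these assumptions a fully naive host has an expected duration of 60 × 7 = 420 days, and a host immune to every allele clears the infection immediately, so total duration scales with the number of unseen alleles.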

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, applied in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection duration can differ by on the order of several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that extends above and below, and encompasses, scenarios comparable to the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We have provided example supplementary figures illustrating that the distributions of infection duration from the simulated outputs overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of termination or clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene-whether non-final or final-is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
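
      The early-clearance rule can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation; the function names are hypothetical, and only the immunity levels (0, 0.5, 1) and the constant factor of 0.025 are taken from the text.

```python
import random

CLEARANCE_FACTOR = 0.025  # constant factor stated in the text

# Immunity level against a gene, indexed by the number of its two alleles
# the host is already immune to (values stated in the text).
IMMUNITY_LEVEL = {0: 0.0, 1: 0.5, 2: 1.0}


def clearance_probability(n_immune_alleles):
    """Success probability of the Bernoulli draw for early clearance."""
    return IMMUNITY_LEVEL[n_immune_alleles] * CLEARANCE_FACTOR


def clears_early(n_immune_alleles, rng):
    """One Bernoulli draw: True if the infection ends while this gene is expressed."""
    return rng.random() < clearance_probability(n_immune_alleles)


rng = random.Random(0)
for n in (0, 1, 2):
    print(n, clearance_probability(n), clears_early(n, rng))
```

      Because the probability is zero when the host has no immunity to either epitope, a gene that is entirely new to the host never triggers early clearance under this rule.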

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the clinical historical data. Because the queueing-theory methods rely on only the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.
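
      To make concrete why the mean infection duration matters for FOI estimation, the sketch below applies Little's law, a standard queueing-theory relation stating that the mean number of items in a system equals the arrival rate multiplied by the mean time spent in the system. Treating infections as arrivals, mean MOI plays the role of the number in the system and FOI the arrival rate. This is an assumption-laden illustration: the function name and the input values are hypothetical, and the authors' actual estimators (which also use the variance of infection duration) may differ.

```python
def foi_from_littles_law(moi_values, mean_duration_days):
    """Estimate FOI (infections per host per year) as mean MOI / mean duration.

    Little's law: mean number in system = arrival rate * mean time in system,
    so arrival rate (FOI) = mean MOI / mean infection duration.
    """
    if not moi_values:
        raise ValueError("moi_values must not be empty")
    if mean_duration_days <= 0:
        raise ValueError("mean_duration_days must be positive")
    mean_moi = sum(moi_values) / len(moi_values)
    return mean_moi / mean_duration_days * 365.0


# Hypothetical MOI values for six hosts (0 = uninfected) and a hypothetical
# mean infection duration; neither number comes from the study.
example_moi = [0, 1, 1, 2, 3, 5]
example_foi = foi_from_littles_law(example_moi, mean_duration_days=200.0)
print(f"estimated FOI: {example_foi:.2f} infections per host per year")
```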

      Finally, we emphasize that applying the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remains one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generates distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queueing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American journal of epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American journal of tropical medicine and hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface.1220150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. Author response:

      We thank the three reviewers for their critical and in-depth assessment of our study. Below you find our comments to the public reviews and our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation. Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

      We thank reviewer#1 for this positive assessment of our work. The deviation of the AF prediction from the experimental structure is, in our view, not puzzling. AF does not take the physical properties of proteins into account, but predicts structures based on strong sequence alignments. It therefore does not have “knowledge” about the general flexibility of domains such as the CCD, which is also observed in the corresponding MmpL5 structures, nor does it have knowledge about preferred conformational states. Rather than “failing” to confirm the AF predictions, our cryo-EM structure revealed an unexpected tilted conformation of the CCD. As we outline in comments below, the physiological relevance of the tilted CCD is unclear. Its flexibility might be required to interact with (still elusive) outer membrane protein components to form the fully assembled efflux machinery.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an AlphaFold2 model of the hetero-hexamer. Although AlphaFold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using Alphafold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      We thank reviewer#2 for this positive assessment of our work and agree that it is interesting that the experimental structures do not fully agree with the AF predictions (see also comment to reviewer#1).

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      We thank the reviewer for this valuable comment. We will add a new figure with the model of the MmpL4 transport cycle based on our new data and discuss the proposed molecular transport mechanism in more detail in the main text.

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      In the revised version, we will include additional data to assess the functional consequences of cross-linking.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      We state clearly in the discussion that the channel through the CCD seems too narrow to let large molecules like mycobactin and bedaquiline pass:

      Line 318ff: “The channel radius of the MmpL4 CCD is very narrow with a minimum of 1.1 Å according to the AlphaFold3 prediction (Fig. 5). This is much smaller than the smallest axis of a mycobactin molecule of ?? nm, as determined from a model of iron-free mycobactin. In addition, the cryo-EM structure of MSMEG_1382 revealed a constriction in the CCD channel [21]. Even though the methionine side chains lining the channel wall are considered to be flexible (Aledo, 2019), large conformational changes of the α-helical hairpins relative to each other would be required to allow passage of molecules as large as mycobactin and bedaquiline. The AcrAB-TolC efflux machinery provides an example of such large conformational changes, which enable transport of large molecules by iris-like opening and closing movements of the outer membrane channel-tunnel TolC [33]. Similar helical twisting may widen the channel of the CCD. Alternatively, it is conceivable that the substrates of MmpL4/MmpL5 are transported along the CCD surface, potentially requiring further protein partners. It is interesting to note that siderophore secretion and drug efflux by MmpL4/MmpL5 systems involve at least two additional proteins, namely the periplasmic protein Rv0455, which was shown to be essential for mycobactin efflux [34], and an outer membrane channel, whose identity remains elusive. A complete molecular understanding of the transport mechanism through the MmpL4/MmpL5 systems hence requires the identification of the missing components and structural information about their interactions.”

      The channel radius of the MmpL4 CCD is very narrow (minimum of 1.1 Å) according to the AlphaFold3 prediction (Fig. 5), and the cryo-EM structure of MSMEG_1382 revealed a further constriction in the CCD channel [21]. We therefore consider direct substrate transport through the CCD central channel to be physically implausible for molecules of the size of mycobactin and bedaquiline. Even accounting for the flexibility of the methionine side chains lining the channel wall, large conformational changes of the α-helical hairpins relative to each other would be required to accommodate such large substrates. While iris-like opening movements have been described for TolC in the AcrAB-TolC system [33], those movements widen an already substantially larger channel, and even such dramatic conformational changes would be insufficient to open a channel as narrow as that of the MmpL4 CCD to a diameter permissive for substrate passage. We instead favor a model in which substrates are transported along the outer surface of the CCD, potentially with the assistance of additional protein partners. This is consistent with the observation that MmpL4/MmpL5-mediated siderophore secretion and drug efflux involve at least two further proteins: the periplasmic protein Rv0455, shown to be essential for mycobactin efflux [34], and an as-yet-unidentified outer membrane channel. In this context, the overall flexibility of the CCD - illustrated here by the tilted, incompletely formed conformation - may reflect the conformational dynamics required for interaction with these partner proteins, rather than being directly involved in forming a transport conduit. A complete mechanistic understanding will require identification of the missing components and structural characterization of the fully assembled efflux machinery.

      We do not think that the incompletely formed CCD represents a conformation that is relevant for transport. But it is a demonstration of the overall flexibility of the CCD, which may be required to further open the channel in case the substrates are transported within the CCD tube. Further in-depth experiments will be needed to clarify this interesting question, which is beyond the scope of this paper.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

      This is a great suggestion. We will include a 3D variability analysis in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)<sub>3</sub>-(MmpL4)<sub>3</sub> complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      We thank reviewer#3 for these useful comments, which we will address during the revision of the manuscript. In particular, we plan to include a scheme of an updated transport model.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

      We believe that there was a misunderstanding about our interpretation of the tilted CCD. As a matter of fact, it must be a stable intermediate, otherwise no density would have been observed for it in the cryo-EM maps. Despite being a stable intermediate, it is indeed unlikely that it represents a conformational state that is relevant/required for transport. Firstly, only the upright, complete CCD can bridge the periplasm. Secondly, the structure was determined in detergent and lacks additional protein binding partners, which might stabilize the upright conformation of the CCD. It is also conceivable, as the reviewer pointed out, that disulfide cross-linking may have caused the tilt. However, as we wrote in the manuscript, we do not think that cross-linking caused this striking asymmetry of the CCD, because the three MmpL4 and MmpS4 chains are basically symmetrical in the C1-processed data (see also Figure 2E):

      Line 182 ff: “To assess whether there are asymmetries in other parts of the structure, we superimposed the individual protomers of the (MmpS4)3-(MmpL4)3 complex analyzed using C1 symmetry (Fig. 2E). Apart from the two resolved α-helical hairpins, the MmpL4 core domains and the resolved parts of MmpS4 differ by an RMSD of less than 0.6 Å and are therefore structurally identical considering the map resolution of around 3 Å. The fact that the core domains of MmpS4 and MmpL4 do not deviate between the protomers argues against the possibility that the cross-links established between them cause the (asymmetric) tilt of the CCD.”

      Regarding the DDM binding site, we will indeed include an updated transport model. That said, we wish to be cautious, because we lack experimental proof that MmpL4 can in fact transport DDM.

    1. Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits are regulating behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting and in fact more fundamental than showing if it is serotonin that does it or not.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      Comments on the latest version:

      The changes to the manuscript sufficiently addressed my few comments. I do not have anything else substantial to add to my review and I am comfortable with my initial assessment.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem (which states that the autocorrelation function of a wide-sense stationary process has a spectral decomposition given by its power spectrum).
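
      For readers who want the relation spelled out, the following is a standard textbook form of the Wiener-Khinchin relation, specialized to a zero-mean AR(1) process. It is an illustration only: the symbols $x_t$, $\varepsilon_t$, $R(k)$ and $S(f)$ are assumed notation, and the authors' model may be parametrized differently (for example, with an explicit reversion term).

```latex
% AR(1) process with |\phi| < 1 (wide-sense stationary), f in cycles per day
x_t = \phi\, x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)

% Autocovariance at lag k
R(k) = \mathbb{E}[x_t\, x_{t+k}] = \frac{\sigma^2\, \phi^{|k|}}{1 - \phi^2}

% Wiener-Khinchin: the power spectrum is the Fourier transform of R(k)
S(f) = \sum_{k=-\infty}^{\infty} R(k)\, e^{-2\pi i f k}
     = \frac{\sigma^2}{1 - 2\phi \cos(2\pi f) + \phi^2}, \qquad |f| \le \tfrac{1}{2}
```

      With $\phi = 0$ the spectrum is flat (no temporal structure), while values of $\phi$ approaching 1 concentrate power at low frequencies, which corresponds to slow fluctuations in bias.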

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. Accounting for batch effects (Fig 1G, S1G) we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provides more equivocal evidence for the effects of trh<sup>n</sup> on drift.
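
      As a rough illustration of the question raised here (whether drift differs more between genotypes than between replicate batches of the same genotype), the sketch below compares the two sources of variance descriptively. It is not the hierarchical model the authors fitted; the function name, the genotype labels, and the per-batch drift estimates are all hypothetical.

```python
from statistics import mean, pvariance


def variance_components(drift_by_genotype):
    """Crude descriptive split of variance in per-batch drift estimates.

    drift_by_genotype maps a genotype label to a list of drift estimates,
    one per replicate batch (vial). Returns (between_genotype, within_genotype).
    Note: the between-genotype term is computed from batch means, so it
    still contains some batch noise; a fitted hierarchical model separates
    the two sources properly.
    """
    genotype_means = [mean(values) for values in drift_by_genotype.values()]
    between_genotype = pvariance(genotype_means)
    within_genotype = mean([pvariance(values) for values in drift_by_genotype.values()])
    return between_genotype, within_genotype


# Hypothetical drift estimates for two genotypes, three batches each.
example = {
    "genotype_A": [0.10, 0.12, 0.11],
    "genotype_B": [0.30, 0.28, 0.32],
}
between, within = variance_components(example)
print("between genotypes:", between)
print("within genotypes (between batches):", within)
```

      If the within-genotype term were comparable to or larger than the between-genotype term, differences in developmental environment could not be excluded as the source of the apparent genotype effect.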

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplemental clarifying all of the parameters of modeling for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers discussed as particularly important the relative contribution of genotype and variable environment. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors do present two behavior paradigms that can serve the purpose of comparing the relative extent of drift between genotypes across the two paradigms that they already have. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free walking analysis to the Ymaze and then the model1 to the model2 was not straightforward.

      We have added text to make the theoretical connection between the free-walking analysis, the Y-maze analysis, and the model more explicit. We have also added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can be easily confused with genetic drift and cause confusion when reading.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods if the 2-hour assays were always done at the same time of the fly's subjective day and when (e.g. how many hours after lights on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias, which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L 223 one probably cannot see a circadian peak at 24h if the data were filtered at 24h, did they look with another low pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243 the spread in standard deviation is said to be consistent with drifting bias, however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, a drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, datapoints day to day are correlated as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
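
      The distinction at issue in points (2) and (3), namely that a non-zero spread in bias is necessary but not sufficient for drift, can be illustrated with a minimal simulation. The sketch below is not the authors' code; the parameter values (ϕ = 0.8, σ = 1, 20,000 days) and function names are assumptions chosen only for illustration. It compares an AR(1) series with independent draws that have the same stationary standard deviation.

```python
import math
import random


def simulate_ar1(phi, sigma, n_days, seed=0):
    """Simulate x[t] = phi * x[t-1] + noise, with noise ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_days - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def lag_autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    num = sum(a * b for a, b in zip(d[:-lag], d[lag:]))
    return num / sum(a * a for a in d)


PHI, SIGMA, N_DAYS = 0.8, 1.0, 20000  # hypothetical values, for illustration only

drifting = simulate_ar1(PHI, SIGMA, N_DAYS, seed=1)
# Independent draws (phi = 0), with noise scaled to the same stationary SD.
matched_sd = SIGMA / math.sqrt(1.0 - PHI ** 2)
independent = simulate_ar1(0.0, matched_sd, N_DAYS, seed=2)

for lag in (1, 2, 5):
    print(lag, round(lag_autocorr(drifting, lag), 2), round(lag_autocorr(independent, lag), 2))
```

      Both series show a similar spread, but only the AR(1) series has autocorrelation that decays gradually with lag (approximately ϕ raised to the lag), so correlation extends beyond adjacent days even though the model is first order.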

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well - the authors should discuss what is expected or not, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trh<sup>n</sup> experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer, but all data in Figure 1 were the continuous 24h data.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(phi-et: et, σ) or in general N(x: m, s): does this mean that the variable x has normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s)

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(et-phi: et, σ)?

      Et is the environment at the time, not the mean of the environment (which is 50). We have added more detail in supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, none of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the author make an effort to relate these models/quantifications one to another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added further explanation of the connection between the power spectrum and the two methods (phi and the bounding envelope) of establishing stationarity.

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and also the analysis of the DGRP lines assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still a thing, or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D) but at day 100 there is not something like a steady state, how is 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a Figure as 2D).

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.
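
      The statement that a log change corresponds to a growth rate can be made concrete with a short sketch. This is an illustration under simplifying assumptions (purely multiplicative growth, no carrying capacity); the function names and the per-day growth factors are hypothetical and are not taken from the authors' model.

```python
import math


def total_log_growth(daily_factors):
    """Return log(N_T / N_0) for a sequence of per-day multiplicative growth factors."""
    return sum(math.log(w) for w in daily_factors)


def mean_log_growth_rate(daily_factors):
    """Average per-day log growth; positive means growth, negative means decline."""
    return total_log_growth(daily_factors) / len(daily_factors)


# Hypothetical per-day growth factors over 100 days.
steady = [1.02] * 100
# Arithmetic mean is 1.05, but each two-day cycle multiplies the population by 0.9.
volatile = [1.5, 0.6] * 50

print("steady:", mean_log_growth_rate(steady))
print("volatile:", mean_log_growth_rate(volatile))
```

      Because the total log change scales with the number of days when the per-day factors keep the same statistics, dividing by the horizon gives a rate whose sign does not depend on whether 100 days or a longer window is simulated.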

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time; we have standardized terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging->drift we see at lower-frequency environmental fluctuations.

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new section and its accompanying methods part, we mainly edited the language to clarify methods and assumptions according to the Reviewer suggestions.

      The main concern of Reviewer 2 was our use of the liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate this into our manuscript because even though this is an important subject it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. Unfortunately, for the orangutan we have only one individual and one sample origin and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with an appreciable overlap between species was a limiting factor for our analyses. In order to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, we control for intra-species variation in our analyses as much as possible. For the analysis of cell type specificity and conservation, the comparison is relative across the different degrees of specificity (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we deliberately omitted an explicit comparison of day 8 and day 16 EBs because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate it require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well-integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is worse than for cardiac fibroblasts (Supplementary Figure 9). The rank-biased overlap scores that we use for marker gene conservation, calculated per cell type, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset, we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
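      To make the rank-biased overlap (RBO) idea in panel B more concrete, the following Python sketch computes a truncated RBO between two ranked marker gene lists. It is only an illustration under stated assumptions: the persistence parameter `p`, the truncation at the shorter list length, and the example gene rankings are hypothetical, and the exact RBO variant and settings used for the figure are not specified in this response.

```python
def truncated_rbo(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists (no extrapolation).

    At each depth d the agreement is the size of the overlap between the two
    top-d prefixes divided by d; agreements are weighted by (1 - p) * p**(d - 1).
    p = 0.9 is an assumed, illustrative persistence value.
    Items are assumed to be unique within each list.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += (1 - p) * p ** (d - 1) * agreement
    return score


# Hypothetical marker rankings for one cell type in two species.
human_markers = ["SOX2", "PAX6", "NES", "VIM"]
macaque_markers = ["PAX6", "SOX2", "VIM", "NES"]

print(truncated_rbo(human_markers, macaque_markers))
print(truncated_rbo(human_markers, human_markers))
```

      Note that this truncated form does not reach 1 even for identical lists of finite length (its maximum is 1 - p^k for lists of length k), so scores from lists of different lengths are not directly comparable without normalisation or extrapolation.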

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotations of non-human primates often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we also used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE-genes and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the many additional DE-genes are artifacts due to mismatched annotation in human and cynomolgus macaques. We illustrate this for the case of the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to underestimation of the expression of SALL4 in macaques and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC=1.34, p-adj<2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC=0.33, p-adj=0.20).

      Author response image 4.

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used for counting; on the y-axis, we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) The number of DE-genes between human and cynomolgus iPSCs detected with DESeq2. In Liftoff, we counted human samples using Gencode v32 and compared them to the Liftoff annotation of the same human gene models to macFas6. In Ensembl, we use Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons we subset the genes to only contain one-to-one orthologs as annotated in biomart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. Here, in addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in the space of macFas6 and read coverage for iPSC Prime-seq bulk RNA-sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from analysis in both species because they present a convolution of sequence-annotation differences and expression. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
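      The filtering criteria quoted by the reviewer can be sketched as a simple predicate. This is a minimal illustration, not the authors' pipeline: the `LiftedTranscript` record, its field names, and the choice to compute the length ratio symmetrically (longer over shorter) are assumptions made for the example, and the transcript values are invented.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    """Hypothetical summary of one transcript after annotation transfer."""
    transcript_id: str
    coverage: float           # fraction of the human transcript that mapped (0-1)
    sequence_identity: float  # fraction of identical bases in the mapped part (0-1)
    human_length: int         # transcript length in the human annotation (bp)
    lifted_length: int        # transcript length after transfer (bp)


def passes_filter(t: LiftedTranscript) -> bool:
    """Return True if the transcript survives the three exclusion criteria."""
    # Exclude partial mappings and low sequence identity (< 50 % each).
    if t.coverage < 0.5 or t.sequence_identity < 0.5:
        return False
    # Exclude excessive length differences: > 100 bp AND > 2x length ratio.
    shorter, longer = sorted((t.human_length, t.lifted_length))
    length_diff = longer - shorter
    length_ratio = longer / shorter if shorter > 0 else float("inf")
    if length_diff > 100 and length_ratio > 2:
        return False
    return True


# Invented example transcripts.
examples = [
    LiftedTranscript("tx_ok", 0.98, 0.95, 2000, 2050),
    LiftedTranscript("tx_partial", 0.40, 0.95, 2000, 800),
    LiftedTranscript("tx_long", 0.90, 0.90, 1000, 2500),
]
kept = [t.transcript_id for t in examples if passes_filter(t)]
print(kept)
```

      Applying the same predicate to the gene models of every species keeps the compared transcript sets matched, which is the stated purpose of the exclusion.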

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony integrated data for visualisation in Figure 1 and the rough germ-layer check up in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree, this unfortunately also reflects the findings from the scRNA-seq analysis in that we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal to noise ratio, which is a property specific to each dataset. Here we just wanted to state that our approach seems to work better for our developmental data, but we didn’t test out other data and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this is a bit misleading. The colors on the left represent the species identity; this was to illustrate the mixing of the species for each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      Agree, this would be a good idea. We are on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize the power differences due to cell numbers and thus make the results more comparable across species. We added this explanation to the methods section for Marker gene detection.
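      The effect of such a cap can be sketched as follows. The response only states that the maximum number of cells per cluster was set to 250; random sampling without replacement, the function name, and the fixed seed are assumptions made for this illustration.

```python
import random


def cap_cells_per_cluster(cells_by_cluster, max_cells=250, seed=0):
    """Downsample each cluster to at most max_cells cells.

    Clusters at or below the cap are kept unchanged, so large and small
    clusters enter pairwise comparisons with more similar statistical power.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    capped = {}
    for cluster, cells in cells_by_cluster.items():
        if len(cells) > max_cells:
            capped[cluster] = rng.sample(cells, max_cells)
        else:
            capped[cluster] = list(cells)
    return capped


# Hypothetical cell barcodes for two clusters of very different size.
cells_by_cluster = {
    "neurons": [f"cell_{i}" for i in range(1000)],
    "iPSCs": [f"cell_{i}" for i in range(1000, 1120)],
}
capped = cap_cells_per_cluster(cells_by_cluster)
print({cluster: len(cells) for cluster, cells in capped.items()})
```

      Cells above the cap are simply not used in that particular comparison; they remain in the dataset for all other analyses.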

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was only used for the initial rough check of the germ layers as reported in Figure 1 and Supplementary Figure S3. We chose this resolution because it yielded roughly the same number of clusters as the number of cell types that we got from classification with the Rhodes et al. data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially the main observations, with some limitations in describing the statistical methods and a lack of support for two separate cerebello-thalamic pathways, which is incomplete in supporting the overall claim.

      We completed the MS by adding a double retrograde labelling study showing that the two pathways have limited overlap and by addressing the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness of description. The description of the Methods has thus been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that comparisons between Deming regressions would be fragile due to the weakness of these regressions in the treatment groups (while they are quite robust for the control groups), and they are not included in the MS, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now more clearly explained in the Methods, the comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
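      For readers unfamiliar with the approach, the sketch below fits a Deming regression with the standard closed-form slope and then computes residuals of another group around that fitted line. It is an illustration only: the error-variance ratio `delta = 1` (orthogonal regression), the use of vertical residuals, and all numbers are assumptions, and the variables actually regressed in the manuscript are not stated in this response.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression.

    delta is the assumed ratio of error variances (errors in y over errors
    in x); delta = 1 corresponds to orthogonal regression.
    Requires a non-zero sample covariance between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals_from_line(x, y, slope, intercept):
    """Vertical residuals of points (x, y) around a given regression line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented control data and a hypothetical treated group.
control_x = [1.0, 2.0, 3.0, 4.0, 5.0]
control_y = [3.1, 4.9, 7.2, 8.8, 11.0]
treated_x = [1.5, 2.5, 3.5, 4.5]
treated_y = [3.0, 4.6, 6.1, 8.0]

slope, intercept = deming_fit(control_x, control_y)
treated_residuals = residuals_from_line(treated_x, treated_y, slope, intercept)
print(slope, intercept)
print(sum(treated_residuals) / len(treated_residuals))
```

      Comparing the residual distributions of treated and control animals against the same control line avoids relying on a weak regression fitted within the treated group.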

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex are functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzer 2018; Bosan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader however than some confusion about citation.

      The references are not cited in the context of collaterals from the DCN but for the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014 we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what “They” refers to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support our finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, retrograde infection strategies appear to differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not reveal major changes in the population firing rate (this is now shown in a new Supplementary Figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range " Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems like it is unlikely uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Since reference to these time windows is repeatedly used in the text we have shifted to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed-speed rotarod, which provides the closest conditions to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed in the fixed-speed rotarod in the CN-VAL group, while there was no measurable effect in the CN-CL group, which actually shows lower performance from the second day of learning; we believe this supports our claim that the CN-CL inhibition impacted the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We clarified the discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support our finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, retrograde infection strategies appear to differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the discussion in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest deficit in learning is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100s on the rotarod; this indicates that mice fall as soon as the accelerating rotarod speed reaches about 16rpm. In the fixed speed rotarod, the inhibition of CN->CL neurons shows not even a trend toward a difference from control mice at 15rpm, and the animals run for 2 minutes without falling at this speed. This makes us confident that the CN->CL pathway interferes more with learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is treated in the discussion (see also replies to reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL, and the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations of neurons, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments will answer the reviewer’s concern.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying the CN inhibition impaired performance to a similar extent as mice in Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speed (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument in favor of a deficit in learning rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the use of the fixed speed rotarod, where no deficit is observed. The difference between the fixed and accelerating rotarod is rather minimal since the acceleration of the rotarod is rather small (0.12rpm/s for speed up to >20 rpm).
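
      The relation between latency to fall and rod speed can be made explicit with a short calculation. This is a minimal sketch, assuming a linear ramp; the 4 rpm starting speed is an assumption (it is not stated in this response) chosen because it reconciles the 0.12 rpm/s acceleration with the figure of about 16 rpm at a 100 s latency quoted in the reply to Reviewer #3.

```python
def rotarod_speed(t_s, start_rpm=4.0, accel_rpm_per_s=0.12):
    """Rod speed in rpm after t_s seconds on a linearly accelerating rotarod.

    start_rpm is an assumed value; accel_rpm_per_s is the acceleration
    quoted in the response.
    """
    return start_rpm + accel_rpm_per_s * t_s


# Speed reached when a mouse falls after about 100 s.
speed_at_fall = rotarod_speed(100)
print(f"speed at 100 s: {speed_at_fall:.1f} rpm")
```

      Under this assumption, a latency plateau near 100 s corresponds to a rod speed close to the 15 rpm used in the fixed speed control.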

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO was a motor deficit, the animals who learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part is due to a crude motor deficit (which would not show in the fixed speed rotarod), and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now clarify that the CN populations are different thanks to retrograde labelling experiments (new Suppl Fig 5). The differences in IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that, despite some overlap of their targets, the profiles of the CL and VAL differ substantially.

      (3) The pattern of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Since the learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from the average performance: a performance on day 3 may be limited by slow learning and perfect consolidation or by good learning and imperfect consolidation. That is why we designed an analysis which takes into account the observed relationships between initial performance, within-session gain of performance and across-session carry-over of this gain of performance (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO groups, despite their weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon J Neurosci 2004 24(8):1987-1995 which is now cited and their description is now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since we had no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when the CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each day correspond to the values of the regression line at the abscissae of trial 1 and trial 7 (green points). This has been added to the figure legend.
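
      To illustrate how these two values are read off the regression line, here is a minimal sketch using an ordinary least-squares fit over the trials of one day. The function names and the example latencies are hypothetical, and the sketch assumes seven trials per day numbered 1 to 7; it is not the analysis code used for the figure.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def daily_start_end(latencies):
    """Regression-line values at the first and last trial of a day."""
    trials = list(range(1, len(latencies) + 1))
    slope, intercept = fit_line(trials, latencies)
    start = intercept + slope * trials[0]
    end = intercept + slope * trials[-1]
    return start, end


# Made-up latencies to fall (s) for the seven trials of one day.
example_day = [48, 63, 69, 82, 88, 101, 109]
start, end = daily_start_end(example_day)
print(f"daily start: {start:.1f} s, daily end: {end:.1f} s")
```

      Using the fitted values rather than the raw first and last trials makes the start and end estimates less sensitive to the noise of any single trial.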

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral anterior lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions but is important for potential mechanisms by which differential learning could occur and necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E or are just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto- and/or cross-correlograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations less than 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.
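
      For readers who wish to apply the same inclusion criteria, the thresholds listed above can be expressed as a simple filter. This is a minimal sketch: the dictionary keys and the example units are hypothetical and do not correspond to the output format of any particular spike-sorting package.

```python
def is_well_isolated(unit):
    """Return True if a unit passes every quality threshold listed above.

    The keys of `unit` are hypothetical names for the metrics.
    """
    return (
        unit["snr"] > 5.0
        and unit["isi_violation_fraction"] < 0.01  # fewer than 1% ISI violations
        and unit["amplitude_cutoff"] < 0.1
        and unit["presence_ratio"] > 0.9
        and unit["firing_rate_hz"] > 0.1
        and unit["n_spikes"] >= 50
    )


# Two made-up units: the second one fails the SNR and ISI criteria.
example_units = [
    {"id": "unit_a", "snr": 7.2, "isi_violation_fraction": 0.002,
     "amplitude_cutoff": 0.03, "presence_ratio": 0.98,
     "firing_rate_hz": 12.4, "n_spikes": 4200},
    {"id": "unit_b", "snr": 3.9, "isi_violation_fraction": 0.04,
     "amplitude_cutoff": 0.05, "presence_ratio": 0.95,
     "firing_rate_hz": 8.1, "n_spikes": 2700},
]
kept = [u["id"] for u in example_units if is_well_isolated(u)]
print(kept)
```

      The temporal stability check described above (autocorrelograms and presence over the recording) is a separate inspection step and is not covered by this filter.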

      Cell numbers are provided with the statistics in the supplementary table for Fig. 1g. The panels show the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for post-training consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Overall, our results indicate that the overnight consolidation of the improvement in performance seems effective only in the early phase (as pointed out in the summary Figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of Figure 1; to make this more visible, we plot in Author response image 1 the difference between trial 7 and trial 1 of the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.
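
      The quantity plotted in this image can be computed as follows. This is a minimal sketch with made-up latencies; the function name is hypothetical, and the sign convention (first trial of the next day minus last trial of the current day, so that a negative value is an overnight drop) is an assumption made for the illustration.

```python
def overnight_changes(sessions):
    """Change in latency between the last trial of one day and the first
    trial of the next day, for each pair of consecutive days.

    `sessions` is a list of per-day lists of latencies to fall (s).
    Negative values indicate an overnight drop in performance.
    """
    return [
        next_day[0] - day[-1]
        for day, next_day in zip(sessions, sessions[1:])
    ]


# Made-up latencies (s) for three consecutive days of seven trials each.
example_sessions = [
    [30, 40, 50, 60, 70, 80, 90],
    [75, 85, 95, 100, 105, 110, 115],
    [112, 114, 116, 118, 119, 120, 121],
]
changes = overnight_changes(example_sessions)
print(changes)
```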

      The decrement on the first trial in the first 3 days is visible for the majority of the mice. The plot asked by the reviewer is represented in the Author response image 2.

      Author response image 2.

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this was referring to a line on fig1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed.

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewer for the thorough and constructive evaluation of our manuscript. We greatly appreciate the recognition of our work's strengths, particularly the integration of experiments and mathematical modeling, the stochastic framework for describing sloughing events, and the insights into pressure-driven detachment dynamics.

      We have carefully considered each point raised and provide detailed responses below. In response to the reviewer's comments, we have revised the Methods section to better clarify our approach to three-dimensional assessment. We believe these revisions have improved the clarity of the manuscript.

      Below, we address each of the specific concerns raised by the reviewer:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The study achieves its primary goal of integrating experiments and modeling to understand the coupling between flow and biofilm growth and detachment in a microfluidic channel, but it should have highlighted the weaknesses of the methods. I list the ones that, in my opinion, are the main ones:

      The study does not consider biofilm porosity, which could significantly affect the flow and forces exerted on the biofilm. Porosity could impact the boundary conditions, such as the no-slip condition, which should be validated experimentally.

      Porosity is indeed a key component of biofilm structures, resulting from the polymeric nature of the EPS matrix, mechanical forces, and biological processes such as cell death or predation. When considering flow-biofilm interactions, this porosity may allow fluid flow through the biofilm, with reported permeability values spanning an extremely broad range from 10⁻¹⁵ to 10⁻⁷ m² (Kurz et al., 2023).

      However, we argue that biofilm permeability is not the primary driver in our system:

      (1) In microscopy visualization, our biofilms form dense structures where flow around the biofilm through narrow channels dominates over flow through the porous biofilm matrix.

      (2) We performed microrheology experiments in these biofilms by imaging the Brownian motion of nanoparticles in the biofilm. Their trajectories indicate that, in our conditions, the viscoelastic flow of the biofilm itself largely dominates over the flow of culture medium through the biofilm matrix.

      (3) We argue that the extreme variability in reported permeability values (spanning several orders of magnitude, Kurz et al., 2023) reflects not only differences in experimental systems, but also fundamental challenges in defining and measuring permeability for viscoelastoplastic biofilms (the biofilm itself is actually flowing). Given this uncertainty, incorporating permeability into our model would introduce parameters that cannot be reliably constrained from literature or independently measured in our setup. Our approach (i.e. treating the biofilm as impermeable and focusing on flow obstruction) avoids this parametrization complexity while successfully capturing the observed dynamics.

      (4) Our model successfully predicts the observed scaling laws (φmax ∝ Q^(1/2), Fig. 7f) and hydraulic resistance dynamics (Fig. 3) without invoking permeability, suggesting that flow obstruction rather than flow penetration is the dominant mechanism.

      Reference: Kurz, D. L.; Secchi, E.; Stocker, R.; Jimenez-Martinez, J. Morphogenesis of biofilms in porous media and control on hydrodynamics. Environ. Sci. Technol. 2023, 57 (14), 5666−5677.

      The research suggests EPS development as a stage in biofilm growth but does not probe it using lectin staining. This makes it impossible to accurately assess the role of EPS in biofilm development and detachment processes.

      We respectfully disagree that lectin staining is necessary to assess the role of EPS in our system, and we argue that our approach using genetic mutants is superior for the following reasons. Lectin staining has significant limitations. While widely used, lectin staining (e.g., concanavalin A) is non-specific (binding not only to EPS polysaccharides but also to bacterial cell surfaces) and is non-quantitative. It can confirm the presence of polysaccharides but cannot establish causal relationships between specific EPS components and mechanical properties or detachment dynamics. We performed preliminary experiments with ConA-rhodamine (data not shown), which showed widespread presence of polysaccharides. However, this provided limited insight beyond confirming EPS production, which is well-established for P. aeruginosa PAO1 biofilms. We employed a more rigorous genetic approach to directly assess the role of EPS composition. We used Δpel and Δpsl mutants (strains lacking key exopolysaccharides that are the primary structural components of the PAO1 matrix). Our results demonstrate that both mutants show significantly reduced maximum clogging compared to wild-type. The Δpsl mutant is particularly affected, with near-complete detachment at certain flow rates. These differences directly link EPS composition to mechanical stability and detachment dynamics. This genetic approach provides causal, quantitative evidence for the role of specific EPS components in biofilm development and detachment, information that lectin staining cannot provide. We believe this addresses the reviewer's concern more rigorously than lectin staining would.

      While the force and flow are three-dimensional, the images are taken in two dimensions. The paper does not clearly explain how the 2D images are extrapolated to make 3D assessments, which could lead to inaccuracies.

      We thank the reviewer for this important observation. We would like to clarify our methodological approach. Our primary three-dimensional measurement is the hydraulic resistance R(t), obtained from pressure drop measurements across the biofilm-containing channel section. This pressure-based measurement inherently captures the three-dimensional flow obstruction caused by the biofilm. We then employ a geometric model (uniform biofilm layer on all channel walls) to convert R(t) into volume fraction φ(t).

      The two-dimensional fluorescence imaging serves to validate this model-based approach rather than being the basis for three-dimensional extrapolation. The uniform layer assumption is supported by three independent lines of evidence: (i) the excellent quantitative agreement between predicted and measured scaling laws (φmax ∝ Q^(1/2), Fig. 7f), obtained without adjustable parameters; (ii) the high reproducibility of φmax values across different flow rates and replicates; and (iii) the strong correlation between model-derived φ(t) from pressure measurements and integrated fluorescence intensity (Fig. 3b-d).

      We have added clarifying text in the Methods section (subsection "Data analysis for the calculation of the hydraulic resistance and volume fraction") to better explain this approach and emphasize that pressure measurements provide the three-dimensional information, with the geometric model serving as the link to volume fraction.
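
      To make the link between hydraulic resistance and volume fraction concrete, the following minimal sketch inverts a uniform-layer geometric model numerically. It rests on assumptions that are not taken from the manuscript: the resistance of a rectangular duct is approximated by the standard shallow-channel formula R ≈ 12 μL / (w h³ (1 − 0.63 h/w)), the biofilm is treated as an impermeable layer of uniform thickness on all four walls, and the channel dimensions in the example are placeholders. The exact expressions used in the Methods may differ.

```python
def duct_resistance(w, h, length=1.0, mu=1.0):
    """Approximate hydraulic resistance of a rectangular duct.

    Uses the shallow-channel approximation, which requires h <= w;
    the two sides are swapped if needed.
    """
    if h > w:
        w, h = h, w
    return 12.0 * mu * length / (w * h**3 * (1.0 - 0.63 * h / w))


def volume_fraction_from_resistance(r_ratio, w, h):
    """Invert R(t)/R0 for a uniform layer thickness by bisection.

    Returns (delta, phi): the layer thickness and the fraction of the
    channel cross-section occupied by the layer.
    """
    if r_ratio < 1.0:
        raise ValueError("R/R0 must be at least 1 for an obstructed channel")
    r0 = duct_resistance(w, h)
    lo, hi = 0.0, 0.5 * min(w, h) * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # Resistance grows monotonically with layer thickness.
        if duct_resistance(w - 2.0 * mid, h - 2.0 * mid) / r0 < r_ratio:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    phi = 1.0 - (w - 2.0 * delta) * (h - 2.0 * delta) / (w * h)
    return delta, phi


# Placeholder geometry in arbitrary units: width 1.0, height 0.1.
ratio = duct_resistance(0.98, 0.08) / duct_resistance(1.0, 0.1)
delta, phi = volume_fraction_from_resistance(ratio, w=1.0, h=0.1)
print(f"R/R0 = {ratio:.2f}, delta = {delta:.4f}, phi = {phi:.3f}")
```

      Because the resistance scales steeply with the open height, a modest volume fraction already produces a large increase in R(t), which is what makes the pressure measurement a sensitive three-dimensional readout.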

      Although the findings are tested using polysaccharide-deficient mutants, the results could have been analyzed in greater detail. A more thorough analysis would help to better understand the role of matrix composition on the stochastic model of detachment.

      We thank the reviewer for this suggestion. Our mutant analysis demonstrates that Δpsl and Δpel strains have significantly reduced φmax and altered detachment dynamics compared to wild-type (Fig. 8), directly linking EPS composition to mechanical stability as predicted by our model. A rigorous quantitative connection between matrix composition and the stochastic parameters (interevent times, jump amplitudes) would require: (i) substantially more sloughing events for statistical power, (ii) independent mechanical characterization of each mutant, and (iii) a mechanistic model linking EPS composition to detachment parameters. We are currently developing microrheology approaches to characterize mutant mechanical properties, which could enable such refinement in future work.

      However, this represents a substantial study beyond the scope of the current manuscript, which establishes the self-sustained sloughing-regrowth cycle and its stochastic nature. The mutant results serve their intended purpose: demonstrating that EPS composition affects detachment, consistent with our model's framework.

      Reviewer #2 (Public review):

      This manuscript develops well-controlled microfluidic experiments and mathematical modelling to resolve how the temporal development of P. aeruginosa biofilms is shaped by ambient flow. The experiment considers a simple rectangular channel on which a constant flow rate is applied and UV LEDs are used to confine the biofilm to a relatively small length of device. While there is often considerable geometrical complexity in confined environments and feedback between biofilm/flow (e.g. in porous media), these simplified conditions are much more amenable to analysis. A non-dimensional mathematical model that considers nutrient transport, biofilm growth and detachment is developed and used to interpret experimental data. Regimes with both gradual detachment and catastrophic sloughing are considered. The concentration of nutrients in the media is altered to resolve the effect of nutrient limitation. In addition, the role of a couple of major polysaccharide EPS components is explored with mutants, which leads to results in line with previous studies.

      There has been a vast amount of experimental and modelling work done on biofilms, but relatively rarely are the two linked together so tightly as in this paper. Predictions on influence of the non-dimensional Damköhler number on the longitudinal distribution of biofilm and functional dependence of flow on the maximum amount of biofilm (𝜙max) are demonstrated. The study reconfirms a number of previous works that showed the gradual detachment rate of biofilms scales with the square root of the shear stress. More challenging are the rapid biofilm detachment events where a large amount of biofilm is detached at once. These events are identified experimentally using an automated analysis pipeline and are fitted with probability distributions. The time between detachment events was fitted with a Gamma distribution and the amplitude of the detachment events was fitted with a log-normal distribution, however, it is not clear how good these fits are. Experimental data was then used as an input for a stochastic differential equation, but the output of this model is compared only qualitatively to that of the experiments. Overall, this paper does an admirable job of developing well-constrained experiments and a tightly integrated mathematical framework through which to interpret them. However, the new insights this provides the underlying physical/biological mechanisms are relatively limited.

      We thank the reviewer for the thorough evaluation of our work and for highlighting the tight integration between experiments and modeling. We appreciate the constructive feedback regarding the goodness-of-fit for the probability distributions.

      To address the concern that "it is not clear how good these fits are," we have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes.

      Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
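
      As an illustration of how the distribution parameters that feed the stochastic model can be estimated, here is a minimal sketch using only the Python standard library. It uses method-of-moments estimates for the Gamma distribution and log-moment estimates for the lognormal distribution; these are simple stand-ins and not necessarily the fitting procedure used in the manuscript. The data are synthetic draws with arbitrary parameters, not experimental measurements.

```python
import math
import random
import statistics


def fit_gamma_moments(samples):
    """Method-of-moments estimates (shape, scale) of a Gamma distribution."""
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)
    return mean * mean / var, var / mean


def fit_lognormal(samples):
    """Estimates (mu, sigma) of a lognormal from the log-transformed data."""
    logs = [math.log(s) for s in samples]
    return statistics.fmean(logs), statistics.stdev(logs)


# Synthetic stand-ins for inter-event times and detachment amplitudes.
rng = random.Random(0)
intervals = [rng.gammavariate(2.0, 3.0) for _ in range(5000)]
amplitudes = [rng.lognormvariate(-1.0, 0.5) for _ in range(5000)]

shape, scale = fit_gamma_moments(intervals)
mu, sigma = fit_lognormal(amplitudes)
print(f"Gamma: shape={shape:.2f}, scale={scale:.2f}")
print(f"Lognormal: mu={mu:.2f}, sigma={sigma:.2f}")
```

      With experimental data, the number of sloughing events per condition is far smaller than in this synthetic example, so the parameter estimates carry correspondingly larger uncertainty.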

      We respectfully disagree that “new insights this provides the underlying physical/biological mechanisms are relatively limited.” Beyond confirming previous findings (e.g., scaling for gradual detachment), we believe our work provides several novel mechanistic insights. First, the Pe/Da criterion enables quantitative prediction of nutrient limitation regimes, allowing systematic decoupling of nutrient effects from other phenomena in biofilm studies. Second, we demonstrate that pressure, not shear, drives sloughing detachment events, a mechanism overlooked in previous studies where the notion of “shear-induced detachment” clearly dominates. Third, we show that sloughing-regrowth cycles occur even in single channels, establishing pressure-driven fluctuations as a signature of confined biofilm growth, independent of geometric complexity. Finally, the stochastic description of sloughing demonstrates that, while instantaneous biofilm states are irreproducible, the underlying randomness is predictable, therefore addressing a fundamental challenge in biofilm research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the abstract, I suggest clarifying the term "bacteria development." It is unclear if it refers to bacterial growth, biofilm formation, or biofilm detachment. The concept is expressed more clearly at the end of the Introduction.

      We have modified the entire abstract to make it clearer. The abstract now explicitly establishes the key processes - growth ('nutrients necessary for growth', 'growing bacteria obstruct flow paths') and detachment ('mechanical stresses that cause detachment', 'flow-induced detachment', 'sloughing') - before using 'bacterial development' as a collective term to refer to these coupled spatiotemporal dynamics. We believe the abstract is now clear as written.

      (2) Findings from Sanfilippo et al. (2019) were slightly questioned by Padron et al. (PNAS, 2023), who discovered that H2O2 transport is responsible for fro operon upregulation.

      Thanks for the clarification, which is indeed significant. The new sentence now reads: Pseudomonas aeruginosa has been found to regulate the fro operon in response to flow-modulated H2O2 concentrations (Sanfilippo et al. 2019, Padron et al. 2023).

      (3) Additionally, Kurz et al. (2022) account for pressure buildup as the mechanism controlling sloughing.

      We respectfully disagree and note that Kurz et al. (2022) identify shear stress, not pressure buildup, as the primary mechanism controlling sloughing. Besides the title, key sentences include “opening was driven by a physical process and specifically by the shear forces associated with flow through the biofilm”, “The opening of the PFPs is driven by flow-induced shear stress, which increases as a PFP becomes narrower due to microbial growth, causing biofilm compression and rupture.” While pressure differences are measured as indicators of system state and do contribute to normal compression stresses, their mechanistic explanation emphasizes that narrowing PFPs experience increased shear rates that eventually exceed the biofilm's yield stress, triggering viscoplastic deformation and detachment. The pressure buildup is a hydraulic consequence of narrowing rather than the direct cause of sloughing. In contrast, our work demonstrates that in confined geometries, pressure differences generate tangential stresses at the biofilm-solid interface that directly drive detachment.

      (4) The flow control strategy represented in Fig. 1 is not explained and should be detailed in the Methods section.

      The Methods section reads as follows. Inoculation and flow experiments: BHI suspensions were adjusted to an optical density of OD640nm = 0.2 (10⁸ CFU/mL) and inoculated inside the microchannels from the outlet, up to approximately ¾ of the channel length in order to keep a clean inlet. The system was left at room temperature (25°C) for 3h under static conditions. Flow experiments were then performed at 0.02, 0.2, 2, 20 and 200 μL/min constant flow rates for 72h in the microchannels at room temperature. For the experiments at 0.2, 2, 20 and 200 μL/min, the fluidic system was based on a sterile culture medium reservoir pressurized by a pressure controller (Fluigent FlowEZ) and connected with a flow rate controller (Fluigent Flow unit). The flow rate was maintained constant by using a controller with a feedback loop adjusting the pressure in the liquid reservoir. The reservoir was connected to the chip using Tygon tubing (Saint Gobain Life Sciences Tygon™ ND 100-80) of 0.52 mm internal diameter and 1.52 mm external diameter, along with PEEK tubing (Cytiva Akta pure) with 0.25 mm inner diameter adapters for the flow rate controller. The waste container was also pressurized by another independent pressure controller to reduce air bubble formation in the inlet part. For the experiments at 0.02 μL/min, we used a Harvard Phd2000 syringe pump for the flow.

      (5) Including images of the actual biofilms formed in a portion of the channel would aid in understanding the analysis presented in Fig. 2.

      Images are introduced later on (e.g., Figure 5). There is also supplementary material showing videos.

      (6) The boundary conditions used to calculate the stress in the developed model should be discussed. The authors should specify why biofilm porosity is neglected.

      We have added a detailed discussion in the supplementary (Section I.2).

      (7) In the first section of the Results, the authors hypothesize that heterogeneity in biofilm development could be due to oxygen limitation. However, given the high oxygen permeability of PDMS, this hypothesis is later denied by their data. It would be prudent to avoid this hypothesis initially to streamline the presentation. Additionally, the authors should specify how oxygen levels at the inlet and outlet are measured.

      We appreciate this comment and agree that streamlining would simplify the presentation. However, after careful consideration, we have chosen to retain the oxygen limitation hypothesis for the following reasons: (1) oxygen limitation is a frequently invoked mechanism in biofilm systems and deserves explicit consideration, (2) it is not immediately obvious that oxygen remains non-limiting in larger microchannels where transverse gradients could develop, and (3) systematically eliminating this plausible alternative hypothesis strengthens our mechanistic conclusion that BHI drives the observed heterogeneity. Regarding oxygen measurements: we did not directly measure dissolved oxygen concentrations. Our approach is only indirect.

      (8) What is the standard deviation of the doubling time measured at different flows (page 9)?

      We have indicated the standard deviation in the text. Note that the graph shows the SEM.

      (9) What is the "zone of interest" in the channel mentioned on page 9?

      We have added the following sentence to clarify: To further understand this effect, let us consider the mass balance of biofilm in the zone of interest -- the zone where biofilm grows in between the two UVC irradiation zones -- in the channel.

      (10) Minor and major detachment events should be classified based on a defined threshold or criteria, and their frequency should be measured.

      We appreciate the reviewer's concern about quantitative rigor. However, we respectfully disagree that imposing arbitrary thresholds to classify 'minor' vs. 'major' events would improve our analysis. Detachment events in our system span a continuum of magnitudes, and any threshold would be artificial and potentially misleading. Our quantitative characterization of detachment dynamics is provided through the statistical analysis of interevent times, which we show follow a gamma distribution. This stochastic framework captures the full spectrum of detachment behavior without requiring arbitrary binning. The terms 'minor' and 'major' in our manuscript are used qualitatively to illustrate the range of observed phenomena, not as formal classifications.

      (11) Have the authors identified a reason for the peaks in the volume fraction in the Δpsl mutants at the highest flow rate?

      The biofilm thickness following these sloughing events is below our detection limit, consistent with a residual layer of cells. However, these cells grow, leading to a time window where the fraction is measurable, before a new detachment event occurs. Our understanding is that the psl mutant forms a weaker matrix with a much lower threshold for sloughing.

      (12) The fit of the probability density function for the relative density function does not match the data well. The authors should comment on this.

      We have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes. Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
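As a rough illustration of the two distribution families mentioned above, the sketch below estimates Gamma parameters for inter-event times and lognormal parameters for detachment amplitudes. It uses method-of-moments and log-moment estimators, which are an assumption made here for simplicity; the response does not state which fitting procedure was used. The function names and example values are hypothetical.

```python
import math
import statistics


def gamma_moments_fit(samples):
    """Method-of-moments Gamma fit (an assumed estimator, not the authors').

    For a Gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so both parameters follow from
    the sample mean and sample variance.
    """
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)  # unbiased sample variance
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale


def lognormal_fit(samples):
    """Lognormal fit from the mean and standard deviation of log(samples)."""
    logs = [math.log(s) for s in samples]
    return statistics.fmean(logs), statistics.stdev(logs)


if __name__ == "__main__":
    # Hypothetical inter-event times (hours) and jump amplitudes.
    inter_event_times = [1.0, 2.0, 3.0, 4.0, 5.0]
    amplitudes = [0.02, 0.05, 0.11, 0.04, 0.30]
    print(gamma_moments_fit(inter_event_times))
    print(lognormal_fit(amplitudes))
```

The fitted parameters could then feed a Q-Q comparison against the theoretical quantiles, which requires an inverse Gamma CDF from a numerical library and is left out of this sketch.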

      (13) Additionally, the simulated fraction appears very flat, with limited detachments compared to experiments. Why?

      The model captures the essential dynamics of growth-detachment cycles, including the characteristic timescales and volume fraction ranges. Some event-to-event variability in the experimental data likely reflects biological stochasticity not captured by our current approach—for example, variations in local biofilm mechanical properties or matrix composition that affect the precise stress at which sloughing occurs. While incorporating such biological variability as a stochastic parameter would improve detailed agreement, it would require extensive additional characterization beyond the scope of this study. The current model successfully reproduces the key qualitative and semi-quantitative features of the system.

      (14) The methods section should include a more detailed explanation of how the model was validated against experimental data.

      Model validation was performed by comparing predicted biofilm volume fraction time series and sloughing event statistics against experimental observations across multiple flow rates. The model reproduces the characteristic growth-sloughing cycles, timescales, and steady-state volume fractions without additional parameter fitting beyond the experimentally measured distributions.

      (15) It would be useful to include information on the reproducibility of the experiments and any variations observed between replicates.

      Experiments were performed in N=3 biological replicates. Individual time series for all replicates are shown in Supplementary Figures, demonstrating consistent behavior across replicates.

      (16) A discussion of the limitations of the study, particularly regarding the assumptions made in the modeling and their potential impact on the results, would strengthen the paper.

      We have added a discussion on why we chose to neglect the porosity of the biofilm, and strengthened parts on the uniform biofilm layer assumption.

      Reviewer #2 (Recommendations For The Authors):

      Page 2: "A vast" —> "The vast"

      Changed.

      The text and line widths on many of the figures are far too small. I printed it out at normal size, but had to look at a PDF and magnify to actually see what the graphs are showing. Fig. 9c is particularly illegible.

      Changed.

      Fig. 1 caption "photonic" —> "optical"?

      Changed

      Can you spell out the actual mathematical definition of 𝜙 on page 5 when it is introduced? Currently it just says the "cross section volume fraction of the biofilm", but that seems potentially ambiguous. It is valid to say that this is "fraction of the cross section occupied by the biofilm"?

      Changed

      Bottom of page 5: can you state the physical interpretation of the assumption that M is bounded between 0 and 1. i.e. that growth is larger than detachment?

      There is a comment on that in the paper. It reads “In assuming that M ∈ ]0, 1] and eliminating cases where M > 1, we have not considered situations of systematic detachment 𝜙_equ = 0 for any value of the concentration, since this is not a situation that we encountered experimentally.” This comes just after presenting the expression for the only non-trivial steady state, as it becomes easier to explain the consequences of the initial choice at this point.

      Currently the choice of detachment initially used in the model is a bit confusing. You say that you are going to assume a (1-𝜙)^(-1) model for simplicity (bottom of page 5), but then later you find that the (1-𝜙)^(3/4) model is more accurate (page 16). Since the latter has already been confirmed in numerous other studies, why not start with that one from the beginning?

      We thank the reviewer for this important question, which highlights an area where our presentation could be clearer. We did not find that the (1-φ)^(-3/4) model is "more accurate." Rather, we deliberately chose the (1-φ)^(-1) scaling because it captures pressure-induced detachment, which we hypothesized would dominate in confined flows where biofilms clog a large portion of the channel. The (1-φ)^(-3/4) scaling, widely used in previous studies, describes shear stress at the biofilm/fluid interface and was developed primarily for reactor systems where pressure effects are negligible. Our analysis on page 16 validates this choice by demonstrating that pressure stress indeed exceeds shear stress when volume fraction is large, which corresponds to late Stage I and all of Stage II, precisely where our model is applied. The excellent quantitative agreement between predicted and measured φmax values across flow rates (Fig. 7f, Table 1) further supports the (1-φ)^(-1) scaling. We recognize that our initial presentation may have suggested the (1-φ)^(-1) choice was merely for "simplicity." We have revised this section to emphasize that this scaling was chosen specifically to capture pressure-driven detachment in confined geometries, with the physical justification provided by the stress analysis that follows. We have also clarified our ideas on page 16 to express clearly that (1-φ)^(-3/4) is never used. We could alternatively use a multi-modal detachment function combining both scalings, but the data do not require this additional complexity.

      In general, the models you derived in this study could be better contrasted with that from previous works. e.g. can you compare your Eqn (4) with the steady-state solutions obtained by other previous studies? Is this consistent with previous works or different? (aside from framing the biofilm thickness in terms of 𝜙)

      We are currently working on a paper dedicated to modeling biofilm development in confined flows, which will do a better job at comparing approaches.

      Top of page 6 - you assume K* = 0.1 - Does this assume that cells grow at half the rate in 0.1X BHI as they do in 1X BHI? Has this been confirmed experimentally or is this just a guess?

      This was estimated rather than measured directly. Model predictions were much more sensitive to the Damköhler number than to the value of K.

      "radial" is used widely in this paper, but you are using a square geometry. Is "transverse" a better choice?

      Yes it clearly is. It’s been changed.

      Fig 3. Are panels (a) and (b) showing different bioreps of the same condition? If so, please spell that out in the caption.

      There was an error here in the caption of fig a. This has been changed. The correspondence is between a and c, and these are exactly the same, not bioreps.

      In multiple places it noted that the change in hydraulic resistance is correlated with the "change in biofilm colonization." Why not demonstrate this directly using a cross correlation analysis? How is the latter connected to the 𝜙 parameter? (e.g. is this d(𝜙)/dt?)

      We thank the reviewer for this suggestion. To clarify: φ(t) represents the volume fraction of biofilm in the channel. We measure this in two independent ways: (1) φ(t) from hydraulic resistance (black line in Fig. 3), i.e. calculated from pressure measurements using φ = 1 - √(R₀/R(t)), assuming uniform layer growth (see Methods section "Data analysis for the calculation of hydraulic resistance and volume fraction"), and (2) φ(t) from fluorescence (green squares in Fig. 3), i.e. estimated from integrated GFP intensity or image segmentation of the glass/liquid interface. The reviewer is correct that we should quantify this relationship directly. We have now added correlation analysis between these two independent measurements of φ (new Supplementary Figure S21). The analysis shows strong positive correlation, with r-values ranging from 0.68 to 0.77 across all flow rates. This validates two key aspects of our approach: (1) the uniform layer assumption used to convert R(t) to φ(t) is reasonable, and (2) the pressure-based measurements accurately capture the dynamics visible in fluorescence imaging, including both growth phases and sloughing events. The strong agreement is particularly notable given that these measurements probe different aspects of the biofilm: hydraulic resistance is sensitive to the three-dimensional obstruction of flow, while fluorescence captures primarily the biofilm attached to the glass surface within our focal plane. Their correlation supports the model assumptions. We have revised the manuscript to clarify this relationship and present the correlation analysis.
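The conversion and comparison described above can be sketched in a few lines. The formula φ = 1 - √(R₀/R(t)) is taken from the response; everything else (function names, the example values, and the use of a plain Pearson coefficient with no lag or detrending) is an assumption made for illustration and not necessarily how Supplementary Figure S21 was produced.

```python
import math


def phi_from_resistance(r0, r_t):
    """Volume fraction from hydraulic resistance: phi = 1 - sqrt(R0 / R(t)).

    r0 is the resistance of the clean channel, r_t the resistance at time t.
    """
    if r0 <= 0 or r_t <= 0:
        raise ValueError("resistances must be positive")
    return 1.0 - math.sqrt(r0 / r_t)


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical resistance readings relative to a clean channel (R0 = 1).
    resistance = [1.0, 1.8, 4.0, 9.0, 2.5]
    phi_hydraulic = [phi_from_resistance(1.0, r) for r in resistance]
    # Hypothetical fluorescence-based estimates at the same time points.
    phi_fluorescence = [0.02, 0.20, 0.45, 0.60, 0.30]
    print(pearson_r(phi_hydraulic, phi_fluorescence))
```

Because the two series are compared point by point, they must be sampled at the same time points; any resampling needed to align pressure and imaging data is outside this sketch.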

      Top of page 9 - a doubling time of 110 mins is reported in liquid culture - is this in shaken or static conditions? Can you provide some data on how this was calculated? (e.g. on a plate reader?) Do you think your measurements in the microfluidics could be affected by attachment/detachment of cells, rather than being solely driven by division? It is curious that your apparent growth rate varies by a factor of two across the different flow rates and there is not a monotonic dependency. Both attachment and detachment would depend on the flow rate (with some non-trivial dependencies), e.g. https://www.pnas.org/doi/10.1073/pnas.2307718120 https://doi.org/10.1016/j.bpj.2010.11.078

      Given that your doubling time in the microfluidics is solely based on changes in cell number (rather than directly tracking cell divisions) it seems possible your results here are measuring the combined effect of growth, attachment and detachment, rather than just growth.

      We agree with those comments regarding the doubling time measurement. We have added a description of how we performed the doubling time measurement in the Methods section.

      Page 9 - you discuss the role of EPS here, but the effect of EPS is not demonstrated here and this is muddled with a discussion about the non-linearity of the putative dependency. Maybe this would be on a firmer footing if you save the discussion of EPS for the section on the Psl and Pel mutants?

      Changed.

      Middle of page 9: Please define what "smooth detachment" means and contrast it with catastrophic sloughing. Also, please define what you mean by "flow, seeding, and erosion" detachment are and how these three things differ from one another.

      We have clearly defined each term in the revised version.

      The results from wavelet scalograms seem to be underutilised and not well described. Can you clearly say in the caption what time series this analysis has been calculated on (e.g. hydraulic resistance)? Other than simply pointing out the "blue stripes", what can be gained from this analysis that could not be obtained with another method? It would be great if the basic features of this plot could be more fully discussed (e.g. is the curved envelope at the bottom caused by edge effects?)

      We have improved the text, captions and method section following the reviewer’s comment.

      Fig. 5 a and b - please list the time at which each of these images were taken. Do these have the same dt between the two sets of images?

      Yes the dt is the same (30 minutes). It’s been indicated in the caption.

      Fig. 6: you have significant 2D variation in the biofilm width along the length of the channel. The relative contribution of pressure and shear based detachment will be different at different positions along the length. However, this variation is ignored in your model. Can you please comment on this in your manuscript and how it might affect the interpretation of your results? e.g. would the longitudinally averaged description yield the same result as one that takes the geometry into account (on average)?

      Our model indeed assumes longitudinally averaged properties. A more detailed spatially resolved model would be valuable for capturing heterogeneities and will be explored in future work.

      Bottom of page 11: you say standard deviations are in the range of 10-3. How does this jibe with the error bars on the middle flow rate in Fig. 7e?

      This extremely low standard deviation only applies to the maximum value of 𝜙 and is a completely different measurement from the box-and-whisker plots presented in Fig. 7e.

      Fig. 7: You are calculating the "Fraction" here. Is this "𝜙"? If so, can you put that on the y-axis instead? You calculate the volume fraction two different ways e.g. with hydraulic resistance and with imaging. Is only one of these shown in (e)? Is the same powerlaw dependence shown in (f) conserved when the other measurement of the "fraction" is used? Can you include both in Fig. 7e?

      We have modified the axis and indicated 𝜙.

      (e) is calculated only from hydraulic resistance. This is the most precise measurement to evaluate 𝜙 quantitatively.

      Related to the previous comment: Some of the estimates of 𝜙max in Table 1 are obtained by fitting the model to integrated fluorescence data (Fig. 2b), while others are estimated from measurements of the hydraulic resistance. The former yields non-unique sets of parameters. Can the biofilm fraction instead actually be estimated directly from fluorescent imaging by segmenting biofilm and directly calculating how much of the cross section is occupied by cells on average across the length? This seems like a more direct measure of this quantity. Given there are multiple ways of estimating the same parameter, it would be better consistency checking to make sure that different methods actually yield the same result.

      We have now added in Fig S21 a direct comparison of these two measurement methods. These are strongly correlated. Microscopy is more direct but only provides 2D pictures. Hydraulic resistance provides a 3D measurement, but relies on a model of biofilm distribution. Both are imperfect, but correlate well. In particular, we see that the 2D measurement does capture sloughing.

      You cite a large number of supplemental figures (e.g. Fig. S21 on page 12), but the figures in your SI only go up to 11.

      We have revised references to supplementary figures.

      Bottom of page 11: Your data from liquid culture suggests that your psl mutant grows at half the rate of WT cells. Is that consistent with your microfluidic data (e.g. Fig. 8)? If not, might this be a sign that your growth rate analyses from the microfluidics might be affected by attachment/detachment? (see comment above) Psl cells should detach much more easily.

      The approach taken to measure doubling times in the microfluidic system does not rely on the macroscopic measurements presented in figure 8, but rather on the approach presented in fig 4. These measurements require specific imaging (different magnification and time stepping) and we did not perform such experiments for the mutants.

      In analyses of sloughing, you fit the times between the jumps and the relative amplitude. Are these two random variables correlated with one another? Might that influence your results? Your methods say that "jumps were identified through the selection of local maxima" of the derivative. Do you mean to say "minima" here? Did you keep all local maxima/minima or did you have a threshold?

      These two random variables are treated as uncorrelated with one another. This is an assumption, and it would be interesting to analyze whether they are in fact correlated. To perform this analysis, we believe that we would first need to acquire more data and more replicates to improve the statistics.

      Yes, it was minima (in the code we make everything positive, hence the confusion).

      Yes, there is a threshold on the value of the jump itself. This value is extremely low and essentially filters out noise.
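The detection procedure described in this exchange (local minima of the derivative, with a small threshold on the jump size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the forward-difference derivative, the default threshold value and the example series are all hypothetical.

```python
def detect_sloughing_events(phi, dt, min_jump=1e-3):
    """Return (time, drop) pairs for sudden decreases in a volume-fraction series.

    phi      : list of volume-fraction samples at a constant time step
    dt       : time step between samples
    min_jump : assumed noise threshold on the size of the drop
    """
    # Forward differences stand in for the derivative (constant dt).
    diffs = [phi[i + 1] - phi[i] for i in range(len(phi) - 1)]
    events = []
    for i, d in enumerate(diffs):
        left = diffs[i - 1] if i > 0 else float("inf")
        right = diffs[i + 1] if i + 1 < len(diffs) else float("inf")
        is_local_min = d < left and d <= right
        # Keep only local minima that are real drops larger than the threshold.
        if is_local_min and -d >= min_jump:
            events.append(((i + 1) * dt, -d))
    return events


def inter_event_times(events):
    """Waiting times between consecutive detected events."""
    times = [t for t, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Hypothetical series: growth interrupted by two sloughing events.
    phi = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.4, 0.15, 0.2]
    events = detect_sloughing_events(phi, dt=0.5)
    print(events)
    print(inter_event_times(events))
```

The returned drops are absolute; a relative amplitude, as fitted in the analysis, would divide each drop by the value of φ just before the event.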

      Fig. 9 - can you make it clearer in the caption what timeseries you are analysing here? I understand from the methods that this is the "volume fraction." The data/fits are difficult to see in Fig. 9 b and impossible to see in Fig. 9c because the green bars get in the way of the other two data sets. Can this visualisation be improved? It is not clear to me how good of a job the Gamma and log-normal fits are actually doing.

      We have clarified that histograms are calculated from all experiments/replicates.

      We have slightly modified the graph to make it clearer. This comparison is intrinsically hard, partly because it compares discrete data with continuous PDFs.

      The results from the stochastic sloughing model are described as 'strikingly similar to experimental data', which seems to be based on a qualitative analysis of the lines in Fig. 7 d, e, and f. However, the experimental data are not plotted in the same graph, nor are the experimental data that we should be comparing this to cited in the text/caption.

      We have added a note in the caption to indicate which figure it can be compared to.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2025-03242R

      Corresponding author(s): Shinya Kuroda

      1. General Statements

      We thank the reviewers for their critical review of the manuscript and their valuable comments. We have carefully considered the reviewers' comments and have revised our manuscript accordingly.

      The reviewers' comments in this letter are in Bold and Italics.

      2. Point-by-point description of the revisions

      Response to Reviewer #1's Comments

      Evidence, reproducibility and clarity:

      Major comments

      1. This study leaves out lipid metabolism as a major energy metabolism pathway relevant to AD. The authors themselves cite the significance of acylcarnitines and CPT1A in AD (pg. 3, lines 32-33, pg. 4, lines 1-2). Lipid metabolism and homeostasis are known to be disrupted in AD¹. Fatty acid oxidation is a known energy source in the prefrontal cortex² and will also generate acetyl-CoA, which this study reveals is a significantly decreased metabolite in AD. Furthermore, sphingomyelin emerges as one of the major decreased DEMs as well. Thus, lipid metabolism should be highlighted in Figure 3 and discussed throughout the manuscript; otherwise its omission should be clearly stated and justified.

      We appreciate the reviewer's insightful comment regarding the critical role of lipid metabolism in AD. We recognize that lipid metabolism is a metabolic pathway deeply involved in AD pathology (Baloni et al., 2022, 2020; Varma et al., 2021). Accordingly, we have revised the Limitations section to more strongly emphasize its role as a vital energy source (pg. 13, lines 15-17). Regarding the visualization of lipid metabolism, we extracted lipid-related pathways from the trans-omic network but found that the regulatory relationships among DEPs and DEMs were excessively complex and interconnected. Interpreting this regulatory network was therefore more challenging than interpreting the other energy production pathways presented in our manuscript, and we have concluded that the pathway analysis in our trans-omic network may not be suitable for deeply elucidating the lipid dysregulation in AD. We have added a statement acknowledging this as a limitation of our current methodology in the revised manuscript (pg. 13, lines 13-22).

      The covariates used for differential analysis should be discussed and justified. Notably, age is used as a covariate for transcriptomic analysis but not proteomic and metabolomic analysis, with no justification. Additionally, given the known importance of lipid metabolism in AD and the putative role of APOE in lipid homeostasis³, APOE genetic status should be considered as a covariate, or its omission should be justified.

      We appreciate the reviewer's comment regarding the included covariates in differential analyses of our study. The reason we did not include other variables, such as age at death and RIN, is that these data were not available for each sample. Thus, we referred to the original research articles from which proteomic or metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, and years of education. Regarding the proteomic dataset, in the original article (Johnson et al., 2020), age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig.2 in (Johnson et al., 2020)).

      The authors make a conclusion statement that suggests intervention: "Collectively, our data suggests that preserving or improving the ability to produce ATP and early intervention in the process of nitrogen metabolism are candidates for the prevention and treatment of dementia" (pg. 12, lines 12-14). This claim is not well-supported by the evidence provided in the study. There are a few limitations: (a) This was an observational, not interventional study; (b) The study did not establish whether the metabolic disruptions are causes or effects in AD; and (c) ATP or other bioenergetic indicators were not directly measured. Therefore, any statements about potential interventions should be removed or qualified as highly speculative.

      We agree with the reviewer that the statement regarding potential interventions was not sufficiently supported by our analyses. Accordingly, we have removed the sentence regarding prevention and treatment from the revised manuscript (e.g., we have deleted the final paragraph of the previous manuscript).

      In conjunction with the last point, the main conclusion of the study is that energy production is down in AD. The data presented in Figure 3 are consistent with this conclusion, but it is far from definitive due to limitations stated above in comments 3a and 3b. The authors should offer additional support for this conclusion: experimental follow-up, flux modeling, analysis of alternative datasets with ATP measurement, causal inference.

      We sincerely thank the reviewer for this valuable and constructive suggestion. Regarding flux modeling, we agree that metabolic flux analysis could provide important mechanistic insight. Indeed, previous studies have applied flux modeling in the context of lipid metabolism in Alzheimer's disease (Baloni et al., 2022). We also attempted to perform flux modeling focusing on energy metabolism. However, we found it difficult to obtain biologically meaningful and robust results and therefore decided not to include these analyses in the current manuscript.

      With respect to ATP measurements, we fully agree that direct evidence of altered ATP levels would further strengthen our conclusion. However, to the best of our knowledge, there are currently no publicly available large-scale datasets that directly measure ATP levels in human postmortem brain tissues. This limitation makes it challenging to incorporate validation in the present study.

      Regarding experimental follow-up, we agree that functional validation is essential to confirm the mechanistic implications of our findings. We are actively considering follow-up experimental studies. However, we consider the present work to be a multi-omic integrative analysis aimed at identifying key molecular alterations and generating biologically important hypotheses. We have revised the Limitations section to more clearly position this manuscript as an observational systems-level analysis (pg. 13, lines 20-22).

      The validation analysis did not sufficiently show the generalizability of this study's results. The authors demonstrated a correlation of 0.53 to the MSBB transcriptomics data and 0.60 to the AMP-AD DiverseCohorts proteomics data. Beyond these correlation coefficients, no meaningful comparison between the datasets is offered. How concordant are the differentially expressed features (or pathways) between the datasets? How robust would the trans-omic network be if incorporating the alternate datasets? Is the main conclusion (energy metabolism is down in AD) supported by the validation datasets? We think this analysis should be expanded and described in the main text. Although the results for external metabolomics datasets are reported in Fig S2C, correlation coefficients with the external data are not reported. The authors state, "Note that each study used different definitions for AD and CT groups, had variations in measurement methods and brain regions analyzed." We appreciate these limitations. However, the external data should be re-analyzed using the same definitions of AD and CT, if possible. The limitations and results (which DEMs are shared between datasets) should be discussed in the main text.

      We thank the reviewer for this important comment regarding the generalizability of our findings. In the revised manuscript, we have expanded the validation analyses and summarized the results in Figure S2. First, at the transcriptomic level, Figure S2B and S2C show the overlap between up- and downregulated genes in AD identified in our ROSMAP-derived analyses and those reported in a previously published large-scale meta-analysis of 2,114 postmortem samples across seven brain regions (Wan et al., 2020). A substantial proportion of DEGs were shared, supporting cross-cohort and cross-region robustness to some extent. At the proteomic level, Figure S2E shows a comparison between the ROSMAP and the AMP-AD DiverseCohorts datasets. We highlighted the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3 and calculated a separate correlation coefficient for this subset (Pearson coefficient = 0.86, p-value = 1.5e-7), further supporting our main conclusion. In addition, to assess the concordance between the two datasets in a threshold-independent manner, we performed Rank-Rank Hypergeometric Overlap (RRHO) analysis (Figure S2E). RRHO analysis (Cahill et al., 2018; Plaisier et al., 2010) enables the comparison of ranked protein lists without relying on arbitrary differential expression cutoffs and has been used for cross-dataset comparison in several previous studies (Fröhlich et al., 2024; Maitra et al., 2023). The RRHO heatmaps demonstrated significant enrichment in the concordant quadrants, confirming systematic agreement between datasets beyond simple correlation coefficients. For metabolomics, Figure S2G shows RRHO analyses comparing the ROSMAP metabolomic data with other datasets measured by the same UPLC-MS/MS platform (Batra et al., 2024; Novotny et al., 2023), demonstrating significant concordance in ranked metabolite changes in AD.
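To make the idea behind the rank-rank comparison concrete, the sketch below computes a hypergeometric tail probability for the overlap between the top i items of one ranked list and the top j items of another, over a grid of (i, j) cutoffs. It is a deliberately simplified illustration: published RRHO implementations additionally handle signed rankings, the direction of enrichment and multiple-testing correction, and nothing here reproduces the analysis behind Figure S2. The function names and the toy protein labels are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / total


def rrho_grid(ranked_a, ranked_b, step=1):
    """Over-representation p-value of the top-i / top-j overlap for each cutoff pair.

    Both lists must rank the same set of items (e.g. proteins ordered by
    signed differential-abundance statistic in two cohorts).
    """
    if set(ranked_a) != set(ranked_b):
        raise ValueError("both lists must contain the same items")
    n = len(ranked_a)
    grid = {}
    for i in range(step, n + 1, step):
        top_a = set(ranked_a[:i])
        for j in range(step, n + 1, step):
            overlap = len(top_a.intersection(ranked_b[:j]))
            grid[(i, j)] = hypergeom_sf(overlap, n, i, j)
    return grid


if __name__ == "__main__":
    cohort_1 = ["P1", "P2", "P3", "P4", "P5", "P6"]  # toy ranking
    cohort_2 = ["P2", "P1", "P3", "P5", "P4", "P6"]  # similar toy ranking
    grid = rrho_grid(cohort_1, cohort_2)
    print(grid[(3, 3)])
```

Small p-values across many cutoff pairs indicate that items ranked highly in one list also tend to rank highly in the other, which is what a heatmap of such a grid visualizes.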

      The glycolysis analysis and discussion needs more development. Glycolysis and gluconeogenesis share many of the same enzymes, but they are not the same pathway and should not be discussed as such. To make a claim about the overall influence of enzyme and metabolite levels on glycolysis, the authors should focus on the energetically committing steps of glycolysis (hexokinase, phosphofructokinase, pyruvate kinase) in Figure 3A, and include the full/current version of the figure in the supplement. Gluconeogenesis-specific enzymes (pyruvate carboxylase, PEPCK) are not mentioned at all - are they among the DEPs/DEGs?

      We appreciate the reviewer's comment regarding the distinction between the glycolysis and gluconeogenesis pathways. Among the gluconeogenesis-specific enzymes, G6PC1, FBP1, PC, and PCK2 were measured in our dataset, but none of them were identified as DEPs. In addition, gluconeogenesis occurs primarily in the liver and kidney rather than the brain. Given this biological context and the lack of significant changes in the relevant enzymes, we have revised the terminology throughout the manuscript, replacing "glycolysis/gluconeogenesis pathway" with "glycolysis pathway" in the revised version.

      Given that there wasn't good concordance between the DEGs and DEPs, did including the mRNA and transcription factor layers in the network really add anything useful? It seems like the main conclusions of the manuscript were driven by the protein and metabolite layers only. How many of the DE metabolic enzymes were coregulated at the transcript and protein level? It would be useful to include the 5-layer trans-omic network in the supplement to display these results. Given your network, at what level does it appear that energy metabolism is regulated?

      It is true that our primary conclusion regarding the regulation of energy metabolism is driven by the changes in protein and metabolite abundance. However, we consider the low concordance between mRNA and protein expression itself to be an important feature of AD pathology, as also reported in previous studies (Johnson et al., 2022; Tasaki et al., 2022). Although we did not analyze this discordance further, we believe that including the TF and mRNA layers in the metabolic trans-omic network strengthens a system-wide view of metabolic dysregulation in AD.

      Regarding the mRNA changes corresponding to the DEP enzymes, please refer to Figure S7A.

      Comment further on the results from Figure 2D. What can be learned from identifying metabolites with the greatest degree centrality? What pathways other than energy metabolism are highlighted by the trans-omic network?

      We consider that several energetic indicators, including AMP and acetyl-CoA, and several nitrogen metabolism-related metabolites, namely Glu, 2-oxoglutarate, and urea, are potential key regulators of dysregulated metabolism in AD.
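
      For readers unfamiliar with the term, degree centrality simply counts how many edges touch a node, normalized by the maximum possible. The following is a minimal sketch on a hypothetical toy network; the edge list and reaction names are invented for illustration, and the authors' trans-omic network may treat edge direction and layers differently.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected network.

    Each node's score is (number of distinct neighbours) / (n - 1),
    where n is the number of nodes that appear in the edge list.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        raise ValueError("need at least two nodes")
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical metabolite-to-reaction regulatory edges (not from the manuscript).
toy_edges = [
    ("AMP", "reaction_1"),
    ("AMP", "reaction_2"),
    ("AMP", "reaction_3"),
    ("urea", "reaction_3"),
]
scores = degree_centrality(toy_edges)
top_node = max(scores, key=scores.get)
```

      A metabolite that regulates many reactions, like AMP in this toy example, ends up with the highest score, which is the sense in which high-centrality metabolites are candidate key regulators.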

      (Suggestion) We suggest the authors leverage their trans-omic network in additional ways beyond giving a snapshot of a few energy metabolism pathways. The analysis of top DEMs could go further. What pathways are impacted beyond energy metabolism? Among the metabolic reactions allosterically regulated by top DEMs, what metabolic pathways are enriched?

      We identified the enriched metabolic pathways that were allosterically regulated by DEMs in AD using Fisher's exact test. Alanine, aspartate, and glutamate metabolism pathways were significantly enriched in 2-oxoglutarate, glutarate, alanine, and glutamate-regulating metabolic reactions. Arginine and proline metabolism pathway was enriched in N-methyl-L-arginine and putrescine-regulating metabolic reactions. Arginine biosynthesis pathway was enriched in arginine-regulating metabolic reactions. Glycerophospholipid metabolism pathway was enriched in CDP-ethanolamine-regulating metabolic reactions. Glycine, serine, and threonine metabolism pathway was enriched in serine-regulating metabolic reactions. Purine metabolism pathway was enriched in AMP-regulating metabolic reactions. Pyrimidine metabolism pathway was enriched in deoxyuridine and thymidine-regulating metabolic reactions. Sphingolipid metabolism pathway was enriched in sphingosine-regulating metabolic reactions. However, this analysis did not yield sufficiently valuable insights into the regulatory relationships among biomolecules in AD. Thus, we did not include these results in the revised manuscript.

      (Suggestion) Figure 3 shows that most differential signal in AD points to lower energy production due to the combination of differentially expressed metabolites and enzymes, but we are not given much context about the strength of these among all the differential signals. We would suggest including volcano plots where the features of interest, i.e. DE enzymes and metabolites, are colored differently (or a similar figure).

      We thank the reviewer for this constructive suggestion. To provide better context regarding the importance of the differential signals, we have added volcano plots for mRNAs, proteins, and metabolites in Figure S4A, B, and C.

      (Suggestion) The PPI network could be better leveraged to understand metabolic changes in AD. If nodes are grouped into subnetworks (e.g. by Louvain / Leiden clustering) and tested for pathway enrichment, could you find functional subnetworks of coordinately up- and down- regulated metabolic enzymes? This could yield some pathways of interest beyond the energy metabolism pathways already highlighted.

      We appreciate the reviewer's suggestion to utilize the PPI network for subnetwork analysis. However, it is important to note that the proteomic dataset analyzed in this study is derived from the original work of Johnson et al. (2020). In that paper, the authors already performed a Weighted Gene Co-expression Network Analysis (WGCNA) across several datasets to identify co-expressed modules and functional pathways.

      Given this, we assumed that applying additional clustering methods to the same dataset would be unlikely to yield significant biological insights beyond the established findings.

      Minor comments

      12. "All genes" and "all metabolites" should not be the background for the proteomic and metabolic pathway enrichment analysis by Metascape and MetaboAnalyst. The background should be limited to the proteins and metabolites that were measured.

      We fully agree with the reviewer that using "all genes" or "all metabolites" as a background is not suitable for enrichment analyses. As suggested, we have revised the enrichment analyses using the measured proteins and metabolites as a background in both Metascape and MetaboAnalyst (Fig. S4D).
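
      The effect of the background choice can be seen in a small over-representation calculation. The following is a minimal sketch using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); it is not how Metascape or MetaboAnalyst are implemented internally, and all identifiers and feature names are hypothetical.

```python
from math import comb


def enrichment_p(hits, pathway, background):
    """One-sided over-representation p-value, restricted to the background.

    Features outside the background (for example, unmeasured proteins) are
    dropped from both the hit list and the pathway before counting.
    """
    background = set(background)
    hits = set(hits) & background
    pathway = set(pathway) & background
    N, K, n = len(background), len(pathway), len(hits)
    k = len(hits & pathway)
    favourable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return favourable / comb(N, n)


# Hypothetical example: 6 measured features, pathway has 2 unmeasured members.
measured = {"A", "B", "C", "D", "E", "F"}
pathway = {"A", "B", "C", "X", "Y"}
hits = {"A", "B"}
genome = measured | {"X", "Y"} | {f"g{i}" for i in range(12)}  # 20 features

p_measured_background = enrichment_p(hits, pathway, measured)
p_genome_background = enrichment_p(hits, pathway, genome)
```

      With the genome-wide background the same two hits look more significant than with the measured-only background, which is the inflation the reviewer's comment is guarding against.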

      Highlight the metabolic enzymes in Fig S2B. Calculate a separate correlation coefficient for the enzymes extracted in the energy metabolism analysis from Fig 3.

      We appreciate the reviewer's suggestion to refine the correlation analysis. As requested, we have revised Fig. S2D to explicitly highlight the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3. We calculated a separate correlation coefficient for the subset (Pearson coefficient = 0.86, p-value = 1.5e-7).
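
      As an illustration of computing a correlation coefficient for a highlighted subset, here is a minimal sketch with invented log2 fold-change values. The protein names, the values, and the `pearson_r` helper are hypothetical; the sketch does not reproduce the reported coefficient and omits the p-value calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical log2FC pairs: (discovery cohort, validation cohort).
log2fc = {
    "P1": (0.5, 0.4),
    "P2": (-0.3, -0.2),
    "P3": (0.1, -0.4),
    "P4": (-0.6, -0.5),
    "P5": (0.2, 0.6),
}
energy_enzymes = ["P1", "P2", "P4"]  # hypothetical highlighted subset

r_all = pearson_r(*zip(*log2fc.values()))
r_subset = pearson_r(*zip(*(log2fc[p] for p in energy_enzymes)))
```

      Reporting the subset coefficient next to the all-protein coefficient shows whether the features that drive the main conclusion replicate at least as well as the dataset as a whole.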

      Use a multiple hypothesis adjusted p-value or q-value in Figure S3.

      We agree with the reviewer regarding the necessity of correcting for multiple comparisons. Accordingly, we have revised Fig. S4D using q-values.
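
      The response does not state which correction produced the q-values. As a minimal sketch, assuming the common Benjamini-Hochberg procedure, adjusted values can be computed as follows; the function name `bh_qvalues` and the input p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical enrichment p-values for four pathways.
q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

      The backward pass enforces monotonicity, so a pathway with a smaller raw p-value never receives a larger adjusted value than one with a larger raw p-value.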

      Describe the methods used to calculate the logFC values from the validation dataset.

      We have revised the Methods to include a detailed description of the procedure used to calculate the log2FC values for the validation datasets (pg. 21, lines 13-15).

      It is difficult to read Figure 3. We would recommend really emphasizing to the reader to refer to Fig S7B as a "key" to this figure. The description of the red/blue arrows and nodes in the methods section (pg. 24, lines 21-36, pg 25, lines 1-4) were also helpful, but very lengthy. We recommend putting an abridged version of this description into the Fig S7 figure legend.

      We appreciate the feedback regarding the readability of Fig. 3. As recommended, we have revised the manuscript to explicitly direct readers to Fig. S8B as an essential "key" for interpreting the network visualization (pg. 8, line 28). Furthermore, we have added an abridged description of the network elements to the legend of Fig. S8B.

      The S7 figure legend should refer to panels A and B, not E and F.

      We apologize for this oversight. We have corrected the legend of Fig. S8.

      (Suggestion) Are any of the differentially expressed metabolites allosteric regulators of the DE transcription factors? This could be interesting to discuss.

      We appreciate the reviewer's insightful suggestion about the potential allosteric regulation of the DETFs by DEMs. We conducted an extensive literature search to identify any reports related to this perspective. However, to the best of our knowledge, no such direct interactions have been reported to date.

      Significance:

      The study's strength lies in leveraging three omics modalities across large patient cohorts (n ~ 150-240) to identify coherent signals between transcriptomics, proteomics, and metabolomics in postmortem DLPFC tissue. It was encouraging to see that the main result, showing downregulation for TCA, oxidative phosphorylation, and ketone body metabolism, emerged from consistent signals across both proteomics and metabolomics. This result was consistent with previous findings in other models cited by the authors [4,5] and other studies [6,7] demonstrating deficiency in energy-producing pathways in AD. Another strength of the study is the application of thoughtful methodology to connect differentially expressed proteins and metabolites via an intermediate data layer of metabolic reactions. The authors leverage the KEGG and BRENDA databases and apply sound logic to estimate the effects of enzyme level and metabolite level on pathway activity, with metabolites serving as substrate, product, or allosteric regulator for reactions. This trans-omic network methodology was developed in previous studies cited by the authors [8,9]. However, as written, this study is limited in its contribution of new knowledge to the AD research field. The main conclusion (energy production is down in AD, due to regulatory disruption of energy metabolism) is not strongly supported (see comments 1, 3, and 4 for elaboration). The evidence could be improved by orthogonal approaches: further experimentation, further integration of external datasets, causal modeling, or flux modeling. Alternatively, even in the absence of new experimental and computational approaches, the story could be made more complete by further leveraging the trans-omic network to provide insights into (a) the regulation of energy metabolism; and (b) the impacts of key disrupted metabolites (see comments 7-9). The study is also limited in demonstrating the power of these methodologies to provide integrative insights. As mentioned above, the integration of enzyme levels and metabolite levels is clearly useful (Figure 3). In contrast, the utility of the mRNA and transcription factor layers was not evident. The study did not appear to improve or expand upon trans-omic network methodology described in the previous works. Finally, the various analyses (analyzing the trans-omic network for nodes with the highest degree centrality, the PPI analysis, and viewing the energy metabolism pathways in the network) provided disparate results that were only tenuously connected in the discussion section.


      Response to Reviewer #2's Comments

      Evidence, reproducibility and clarity: Summary

      This manuscript integrates public transcriptomic, proteomic, and metabolomic datasets from ROSMAP DLPFC samples to construct a multi-layer metabolic trans-omic network in Alzheimer's disease. By linking transcription factors, enzyme mRNAs, proteins, metabolic reactions, and metabolites, the authors report coordinated downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, along with mixed regulatory signals in glycolysis/gluconeogenesis. They interpret these patterns as indicative of broad energetic dysfunction and alterations in amino-acid/nitrogen metabolism in AD. While the framework is conceptually appealing, much of the analysis remains descriptive, and several biological interpretations extend beyond what the data can robustly support. The reliance on bulk tissue without accounting for cell-type composition, limited covariate adjustment, and the absence of validation or sensitivity analyses reduce confidence in the mechanistic conclusions. Overall, the study provides a preliminary systems-level overview, but additional rigor is needed before the proposed trans-omic regulatory insights can be considered convincing.

      Major Comments

      1. Interpretation requires more cautious phrasing, and validation is essential. The manuscript frequently asserts that specific pathways are "inhibited" or that energetic deficits are "compensated," but these conclusions extend beyond what the descriptive, bulk-level data can support. Because no metabolic flux, causality, or direct functional measurements are included, the results should be framed as putative regulatory shifts, not confirmed impairments. Critically, key claims about pathway inhibition would require flux modeling, perturbation analyses, or experimental validation to be convincing. Without such validation, the mechanistic interpretations remain speculative.

      We thank the reviewer for this crucial comment. We fully agree that, given the descriptive and bulk-level nature of our analysis, mechanistic interpretations must be made with caution. In the absence of direct metabolic flux measurements or experimental validation, our findings should be interpreted as putative regulatory shifts rather than confirmed functional impairments. Accordingly, we have revised the manuscript to temper mechanistic claims. We have replaced definitive statements with more speculative phrasing (e.g., "Our analysis revealed a putative coordinated downregulation ..." instead of "Our analysis revealed a coordinated downregulation ..." in Abstract section; "we demonstrate the systems-level view of the potential dysregulated energy production ..." instead of "we demonstrate the systems-level view of the dysregulated energy production ..." in pg. 10, lines 25-26).

      Although the authors acknowledge this in the limitations, bulk-level differences may primarily reflect altered proportions of neurons, astrocytes, microglia, and oligodendrocytes rather than true within-cell-type regulation. Incorporating a cell-type deconvolution or performing a sensitivity analysis would substantially improve interpretability. This issue also impacts the trans-omic network: if the molecules included originate from different cell types, the inferred regulatory relationships may not reflect true intracellular processes.

      We appreciate the reviewer's point that bulk-level differences can reflect altered proportions of different brain cell types, which would in turn affect the inferred trans-omic network. To assess changes in the cell type proportions of the samples used in our study, we additionally used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons in the AD group was smaller than in the CT group, and that the proportion of oligodendrocytes in the AD group was larger than in the CT group. In the proteomic data, we did not observe any statistically significant changes in the cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons in the AD group was smaller than in the CT group (pg. 6, lines 8-11).

      Differential analysis covariates. For the differential expression analyses, only gender and PMI were included as covariates. Additional variables, such as age at death, RIN, neuropathological measures, and comorbidities, can strongly influence molecular profiles and should be considered to ensure that the observed differences reflect AD-related biology rather than confounding pathological or technical factors.

      We appreciate the reviewer's comment regarding the covariates included in the differential analyses of our study. We did not include other variables, such as age at death and RIN, because these data were not available for each sample. We therefore referred to the original research articles from which the proteomic and metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, or education. Regarding the proteomic dataset, in the original article, age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig. 2 in Johnson et al., 2020).

      Network stability and sample non-overlap. Proteomic, transcriptomic, and metabolomic data come from partially overlapping individuals. The authors should test whether the reconstructed network is robust to: different significance thresholds, restricting analyses to overlapping samples and alternative definitions of AD vs control.

      We appreciate the reviewer's comment on the stability of the trans-omic network. In our study, the number of individuals for whom all omic modalities were measured was relatively small (n=25 in CT and n=35 in AD). This limited overlap reduces statistical power and can affect the downstream network construction. We have acknowledged this limitation in the revised manuscript and clarified that the reconstructed networks should be interpreted with caution regarding reproducibility and generalizability (pg. 13, lines 13-23).

      Minor Comments

      1. Some TF enrichment and regulatory inferences lack explicit mention of multiple-testing correction.

      We apologize for the lack of clarity in our original description. The TF inference was corrected for multiple testing, and we have revised the Methods section to explicitly describe the correction method used and the threshold applied (pg. 23, lines 23-24).

      The limitations section is strong but should explicitly discuss the influence of postmortem interval on metabolite levels.

      We appreciate the reviewer's comment about the effect of postmortem interval on changes in metabolite levels. Accordingly, we have added the description of this perspective in our revised manuscript (pg. 13, lines 1-5).

      Reviewer #2 (Significance (Required)):

      Significance

      The study extends a trans-omic integration framework, originally applied to metabolic disease, into the context of Alzheimer's pathology. Although the biological findings largely confirm known alterations in mitochondrial and energy metabolism, the network-based approach offers a structured way to view cross-layer regulatory changes. Its main advance is conceptual rather than biological, providing a unified framework rather than uncovering fundamentally new mechanisms. This work will primarily interest researchers in neurodegeneration and systems biology, as well as computational groups developing multi-omics integration methods.

      Response to Reviewer #3's Comments


      Evidence, reproducibility and clarity

      This study leverages existing transcriptomic, metabolomic and proteomic datasets from prefrontal cortex (PFC) to assess metabolic dysregulation in Alzheimer's disease (AD). They found a downregulation of multiple metabolic pathways, including TCA cycle, oxidative phosphorylation, and ketone metabolism, that may explain bioenergetic alterations in AD. The study used matching ROSMAP omics datasets from the DLPFC that have allowed more robust data integration. However, the datasets are all generated using bulk tissue, which makes data interpretation difficult. For example, the AD changes they observed may be due to shifts in cell type proportion with disease (e.g. cell death, neuron inflammation). Did the authors account for any potential shifts in cell type proportion in their analysis?

      If the assumption is that the changes in AD are cell intrinsic, which cell types are likely to be impacted? Can the authors integrate any existing single-cell analysis to infer which cell types may be driving the signals they detect, and whether this accounts for some of the antagonistic regulatory effects that were detected?

      We thank the reviewer for their insightful comments. We agree that the use of bulk tissue datasets cannot account for cell-type heterogeneity. As noted in our Limitations section (pg. 12, lines 24-27), we recognize that previous studies have found that the Braak stage is correlated positively with microglia and astrocyte proportions and negatively with oligodendrocyte proportion (Hannon et al., 2024; Shireby et al., 2022). Regarding the integration of single-cell analysis, we have referenced recent snRNA-seq findings (Mathys et al., 2024) in our Limitations section (pg. 12, lines 28-32) to deconvolve our bulk signatures.

      Furthermore, in our revised manuscript, we used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons in the AD group was smaller than in the CT group, and that the proportion of oligodendrocytes in the AD group was larger than in the CT group. In the proteomic data, we did not observe any statistically significant changes in the cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons in the AD group was smaller than in the CT group (pg. 6, lines 8-11).

      Significance

      The manuscript provides multimodal insight into metabolic dysregulation in AD in the PFC. Given that metabolic dysfunction is likely to play a major role in disease pathogenesis, this is a study of importance. However, the findings lack granularity at the cell type level, which limits the impact of the study.

      References

      1. Baloni, P., Arnold, M., Buitrago, L., Nho, K., Moreno, H., Huynh, K., Brauner, B., Louie, G., Kueider-Paisley, A., Suhre, K., Saykin, A. J., Ekroos, K., Meikle, P. J., Hood, L., Price, N. D., Alzheimer's Disease Metabolomics Consortium, Doraiswamy, P. M., Funk, C. C., Hernández, A. I., ... Kaddurah-Daouk, R. (2022). Multi-Omic analyses characterize the ceramide/sphingomyelin pathway as a therapeutic target in Alzheimer's disease. Communications Biology, 5(1), 1074.
      2. Baloni, P., Funk, C. C., Yan, J., Yurkovich, J. T., Kueider-Paisley, A., Nho, K., Heinken, A., Jia, W., Mahmoudiandehkordi, S., Louie, G., Saykin, A. J., Arnold, M., Kastenmüller, G., Griffiths, W. J., Thiele, I., Alzheimer's Disease Metabolomics Consortium, Kaddurah-Daouk, R., & Price, N. D. (2020). Metabolic Network Analysis Reveals Altered Bile Acid Synthesis and Metabolism in Alzheimer's Disease. Cell Reports. Medicine, 1(8), 100138.
      3. Batra, R., Arnold, M., Wörheide, M. A., Allen, M., Wang, X., Blach, C., Levey, A. I., Seyfried, N. T., Ertekin-Taner, N., Bennett, D. A., Kastenmüller, G., Kaddurah-Daouk, R. F., Krumsiek, J., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2023). The landscape of metabolic brain alterations in Alzheimer's disease. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(3), 980-998.
      4. Batra, R., Krumsiek, J., Wang, X., Allen, M., Blach, C., Kastenmüller, G., Arnold, M., Ertekin-Taner, N., Kaddurah-Daouk, R., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2024). Comparative brain metabolomics reveals shared and distinct metabolic alterations in Alzheimer's disease and progressive supranuclear palsy. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 20(12), 8294-8307.
      5. Cahill, K. M., Huo, Z., Tseng, G. C., Logan, R. W., & Seney, M. L. (2018). Improved identification of concordant and discordant gene expression signatures using an updated rank-rank hypergeometric overlap approach. Scientific Reports, 8(1), 9588.
      6. Fröhlich, A. S., Gerstner, N., Gagliardi, M., Ködel, M., Yusupov, N., Matosin, N., Czamara, D., Sauer, S., Roeh, S., Murek, V., Chatzinakos, C., Daskalakis, N. P., Knauer-Arloth, J., Ziller, M. J., & Binder, E. B. (2024). Single-nucleus transcriptomic profiling of human orbitofrontal cortex reveals convergent effects of aging and psychiatric disease. Nature Neuroscience, 27(10), 2021-2032.
      7. Green, G. S., Fujita, M., Yang, H.-S., Taga, M., Cain, A., McCabe, C., Comandante-Lou, N., White, C. C., Schmidtner, A. K., Zeng, L., Sigalov, A., Wang, Y., Regev, A., Klein, H.-U., Menon, V., Bennett, D. A., Habib, N., & De Jager, P. L. (2024). Cellular communities reveal trajectories of brain ageing and Alzheimer's disease. Nature, 633(8030), 634-645.
      8. Hannon, E., Dempster, E. L., Davies, J. P., Chioza, B., Blake, G. E. T., Burrage, J., Policicchio, S., Franklin, A., Walker, E. M., Bamford, R. A., Schalkwyk, L. C., & Mill, J. (2024). Quantifying the proportion of different cell types in the human cortex using DNA methylation profiles. BMC Biology, 22(1), 17.
      9. Johnson, E. C. B., Carter, E. K., Dammer, E. B., Duong, D. M., Gerasimov, E. S., Liu, Y., Liu, J., Betarbet, R., Ping, L., Yin, L., Serrano, G. E., Beach, T. G., Peng, J., De Jager, P. L., Haroutunian, V., Zhang, B., Gaiteri, C., Bennett, D. A., Gearing, M., ... Seyfried, N. T. (2022). Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nature Neuroscience, 25(2), 213-225.
      10. Johnson, E. C. B., Dammer, E. B., Duong, D. M., Ping, L., Zhou, M., Yin, L., Higginbotham, L. A., Guajardo, A., White, B., Troncoso, J. C., Thambisetty, M., Montine, T. J., Lee, E. B., Trojanowski, J. Q., Beach, T. G., Reiman, E. M., Haroutunian, V., Wang, M., Schadt, E., ... Seyfried, N. T. (2020). Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nature Medicine, 26(5), 769-780.
      11. Maitra, M., Mitsuhashi, H., Rahimian, R., Chawla, A., Yang, J., Fiori, L. M., Davoli, M. A., Perlman, K., Aouabed, Z., Mash, D. C., Suderman, M., Mechawar, N., Turecki, G., & Nagy, C. (2023). Cell type specific transcriptomic differences in depression show similar patterns between males and females but implicate distinct cell types and genes. Nature Communications, 14(1), 2912.
      12. Mathys, H., Boix, C. A., Akay, L. A., Xia, Z., Davila-Velderrain, J., Ng, A. P., Jiang, X., Abdelhady, G., Galani, K., Mantero, J., Band, N., James, B. T., Babu, S., Galiana-Melendez, F., Louderback, K., Prokopenko, D., Tanzi, R. E., Bennett, D. A., Tsai, L.-H., & Kellis, M. (2024). Single-cell multiregion dissection of Alzheimer's disease. Nature, 632(8026), 858-868.
      13. Novotny, B. C., Fernandez, M. V., Wang, C., Budde, J. P., Bergmann, K., Eteleeb, A. M., Bradley, J., Webster, C., Ebl, C., Norton, J., Gentsch, J., Dube, U., Wang, F., Morris, J. C., Bateman, R. J., Perrin, R. J., McDade, E., Xiong, C., Chhatwal, J., ... Harari, O. (2023). Metabolomic and lipidomic signatures in autosomal dominant and late-onset Alzheimer's disease brains. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(5), 1785-1799.
      14. Plaisier, S. B., Taschereau, R., Wong, J. A., & Graeber, T. G. (2010). Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures. Nucleic Acids Research, 38(17), e169.
      15. Shireby, G., Dempster, E. L., Policicchio, S., Smith, R. G., Pishva, E., Chioza, B., Davies, J. P., Burrage, J., Lunnon, K., Seiler Vellame, D., Love, S., Thomas, A., Brookes, K., Morgan, K., Francis, P., Hannon, E., & Mill, J. (2022). DNA methylation signatures of Alzheimer's disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nature Communications, 13(1), 5620.
      16. Tasaki, S., Xu, J., Avey, D. R., Johnson, L., Petyuk, V. A., Dawe, R. J., Bennett, D. A., Wang, Y., & Gaiteri, C. (2022). Inferring protein expression changes from mRNA in Alzheimer's dementia using deep neural networks. Nature Communications, 13(1), 655.
      17. Varma, V. R., Wang, Y., An, Y., Varma, S., Bilgel, M., Doshi, J., Legido-Quigley, C., Delgado, J. C., Oommen, A. M., Roberts, J. A., Wong, D. F., Davatzikos, C., Resnick, S. M., Troncoso, J. C., Pletnikova, O., O'Brien, R., Hak, E., Baak, B. N., Pfeiffer, R., ... Thambisetty, M. (2021). Bile acid synthesis, modulation, and dementia: A metabolomic, transcriptomic, and pharmacoepidemiologic study. PLoS Medicine, 18(5), e1003615.
      18. Wan, Y.-W., Al-Ouran, R., Mangleburg, C. G., Perumal, T. M., Lee, T. V., Allison, K., Swarup, V., Funk, C. C., Gaiteri, C., Allen, M., Wang, M., Neuner, S. M., Kaczorowski, C. C., Philip, V. M., Howell, G. R., Martini-Stoica, H., Zheng, H., Mei, H., Zhong, X., ... Logsdon, B. A. (2020). Meta-Analysis of the Alzheimer's Disease Human Brain Transcriptome and Functional Dissection in Mouse Models. Cell Reports, 32(2), 107908.
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
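
      To make the "single" versus "multiple" categories concrete, the following is a minimal sketch of how an amino acid substitution could be classified by the fewest nucleotide changes needed to reach it from a given parent codon. It assumes the standard genetic code; the function names `min_nt_changes` and `classify_substitution` are hypothetical and are not taken from the manuscript's code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
# (third base varies fastest). "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nt_changes(parent_codon: str, target_aa: str) -> int:
    """Fewest nucleotide differences between parent_codon and any codon for target_aa."""
    return min(
        sum(a != b for a, b in zip(parent_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def classify_substitution(parent_codon: str, target_aa: str) -> str:
    """Label a substitution as 'synonymous', 'single', or 'multiple' (hypothetical labels)."""
    n = min_nt_changes(parent_codon, target_aa)
    if n == 0:
        return "synonymous"
    return "single" if n == 1 else "multiple"


# Serine codon AGC: Asn (AAC) is one change away, Trp (TGG) is two.
print(classify_substitution("AGC", "N"))  # single
print(classify_substitution("AGC", "W"))  # multiple
```

      Grouping model scores by such a label is one way to check whether a model penalizes multi-nucleotide substitutions more strongly than the experimental data do.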

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758) and the results were middling despite our best efforts. Regarding multihit specifically, its effects are minor compared to the codon table effects, and correcting for the latter requires the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predicting binding patterns for a specific target requires information about that target to be specified in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM model does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder” in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWK and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. We now state that, other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion on shortcomings of existing antibody language models like AbLang2 that seems to associate all of the performance gap with the inability to separate non-synonymous mutations separated by 1 or 2+ substitutions.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is for human rather than mouse antibodies. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not functional data. Evaluation is then done on functional data without fine-tuning. Because these evaluation data are categorically different from the training data, data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. In a similar way, other self-supervised models such as MLMs do not exclude seeing an antibody in the training data when they are doing functional prediction.

      We have clarified this in the manuscript with

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, as according to this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could have identifiability problems as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated to

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
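
      As a small illustration of this definition, the sketch below computes perplexity from a list of per-site probabilities assigned to the observed amino acids. The function name `perplexity` and the input values are illustrative assumptions, not output from any of the models discussed.

```python
import math


def perplexity(observed_probs):
    """Across-site geometric mean of 1/p, where p is the model's
    probability of the observed amino acid at each site."""
    if not observed_probs:
        raise ValueError("need at least one site")
    # Work in log space: the geometric mean of 1/p is exp(-mean(log p)).
    mean_log_p = sum(math.log(p) for p in observed_probs) / len(observed_probs)
    return math.exp(-mean_log_p)


# Two sites with inverse probabilities 1 and 4: geometric mean is 2.
print(perplexity([1.0, 0.25]))
# A model that is uniform over 20 amino acids has perplexity close to 20.
print(perplexity([0.05] * 10))
```

      Lower perplexity means the model finds the sequence more plausible; a value of 1 would mean every observed amino acid was predicted with certainty.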

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, we're curious how you managed to get 2 million PCPs from the data? Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent these phylogenetic inferences were run in parallel.”

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we weren't sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use number ranges we would cite a dozen or more, but we hesitate to add many of them in the eLife format, which has parenthetical citations. If there are others that you consider essential, don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture -- thanks for the catch:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.”

      The architecture is very “stock” - just the default torch TransformerEncoder, so we don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This sits in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASM with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas of some simpler approach that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. After doing this, we realized for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes $p(a \mid w_{\setminus i})$ (the probability of amino acid a at position i when position i of w is masked) for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving $\log p(x_j \mid w_{\setminus j}) - \log p(w_j \mid w_{\setminus j}) = -n\,[\log \mathrm{PPL}(x) - \log \mathrm{PPL}(w)]$, where n is the sequence length. Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
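      The difference between the two scoring schemes can be illustrated with a minimal sketch. The function `toy_masked_probs` below is a hypothetical stand-in for a masked language model such as AbLang2, and the four-letter alphabet and the example sequences are invented for illustration only. The sketch assumes that log pseudo-perplexity is the negative mean log-probability over positions, which matches the single-mutation identity described above.

```python
import math

ALPHABET = "ACDE"  # toy alphabet, purely illustrative


def toy_masked_probs(seq, i):
    """Hypothetical stand-in for a masked language model.

    Returns p(a | seq with position i masked) for every a in ALPHABET.
    A residue gets extra weight when it matches a neighbour, so the
    prediction depends on the surrounding context.
    """
    left = seq[i - 1] if i > 0 else None
    right = seq[i + 1] if i + 1 < len(seq) else None
    weights = {a: 1.0 + (a == left) + (a == right) for a in ALPHABET}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def log_ppl_per_sequence(x):
    """Log pseudo-perplexity using the variant's own context."""
    n = len(x)
    return -sum(math.log(toy_masked_probs(x, i)[x[i]]) for i in range(n)) / n


def log_ppl_masked_marginals(x, w):
    """Log pseudo-perplexity of x using only wild-type (w) context."""
    n = len(w)
    return -sum(math.log(toy_masked_probs(w, i)[x[i]]) for i in range(n)) / n


def single_mutant_log_odds(x, w, j):
    """Log ratio of variant to wild-type residue probability at site j."""
    probs = toy_masked_probs(w, j)
    return math.log(probs[x[j]]) - math.log(probs[w[j]])


wild_type = "ACDEA"
variant = "ACCEA"  # differs from wild_type only at position 2
n = len(wild_type)

# Under masked-marginals, all positions except the mutated one cancel.
difference = n * (log_ppl_masked_marginals(wild_type, wild_type)
                  - log_ppl_masked_marginals(variant, wild_type))
print(difference, single_mutant_log_odds(variant, wild_type, 2))
```

      In this toy setting the per-sequence score of the variant differs from its masked-marginals score, because the mutated residue changes the context seen by its neighbours.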

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we have addressed each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we have included summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we now report both the empirical selection statistics and the corresponding converted p-values in either the main text or supplement, and both outputs are also provided in the full summary files. This dual approach will allow readers to fully interpret the results under both perspectives.

      Expanded discussion of admixture timing and population structure: We have carefully considered the reviewers' suggestions to incorporate additional descriptions of population structure or demographic analyses, and have done so in our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so we have now made the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed in our Data Availability statement.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S6 and S7. To better highlight this result, we now also include the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      We agree that more fine-scale demographic analyses would be informative. We now additionally provide an estimation of the admixture date in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes and discussion using the DATES software, which is optimized for ancient genomes.

      We have encountered problems with using different standard date estimation software, including DATES, which give very inconsistent and unstable results. As we note in our text, we suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands, low LD differentiation between the source populations, or multiple pulses of admixture, which may be breaking one or more of the assumptions of these methods. Assessing the limitations of these methods is beyond the scope of this current manuscript; however, we will continue working on this problem for future studies, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In revision, we clarify in the Main Text - Results - HLA-B Allele Frequencies and Discussion that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions were not originally specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S6 and S7. As stated in our response to Reviewer #1, in our revisions, we now more clearly state the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e., I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1Mb. Although we group all ROHs longer than 1Mb into one category in Main Text Figure 2, we now additionally provide the distribution in ROH lengths across all individuals for each cohort in a new Supplemental Figure S3. As this plot shows, there certainly are ROHs longer than 1Mb in the Faroese cohort, and on average there is a higher proportion of long ROH particularly in the 5-15 Mb range in the Faroese cohort relative to the other cohorts. As the reviewer points out, these longer ROHs are possibly indicative of a more recent or stronger bottleneck in the Faroes relative to the comparison cohorts. We highlight this result in Main Text - Results - Population Structure and Relatedness.
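      The time-to-length reasoning in this exchange can be made concrete with a small calculation. The sketch below assumes 25 years per generation (which reproduces the reviewer's conversion of 1,000-1,500 years to 40-60 generations), the common approximation that an autozygous segment tracing back to an ancestor g generations ago has an expected length of 100/(2g) cM, and the rough 1 cM per Mb conversion mentioned above. The function names are illustrative, and none of this is the authors' analysis code.

```python
def generations_since(years, years_per_generation=25):
    """Convert calendar years to generations (assumed generation time)."""
    return years / years_per_generation


def expected_roh_length_cm(generations):
    """Expected length (cM) of an autozygous segment whose two copies
    coalesce in an ancestor `generations` ago, i.e. after 2 * generations
    meioses, under the simple approximation 100 / (2g)."""
    return 100.0 / (2.0 * generations)


def cm_to_mb(length_cm, cm_per_mb=1.0):
    """Convert cM to Mb using an assumed genome-wide average rate."""
    return length_cm / cm_per_mb


for years in (1000, 1500):
    g = generations_since(years)
    length_cm = expected_roh_length_cm(g)
    print(f"{years} years ~ {g:.0f} generations, "
          f"expected ROH ~ {length_cm:.2f} cM (~{cm_to_mb(length_cm):.2f} Mb)")
```

      Under these assumptions, segments of roughly 1 Mb are on the scale expected for events around the time of settlement, while segments of several Mb point to more recent shared ancestry.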

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North / West European sources as well, dating to at least as recently as the Medieval & Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale or even earlier, based on the DATES estimates. We see low variance in ancestry proportions estimated by HaploNet, both from the historical Faroese individuals (dated to 260 years BP) and the modern samples. This indicates admixture predating the settlement of the Faroe Islands, where recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. In the original manuscript, we mentioned this as a likely possibility in the Main Text - Discussion: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” In our revisions, we further included the DATES estimations of the timing of admixture in the modern and historical Faroese samples, which pre-date the timing of settlement in both cases. 
We highlight these points in the Discussion. And, as with the case of the British population, the closely-related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split (Discussion and Results - Signals of Positive Selection). iHS is a commonly calculated statistic, and it is often calculated in a single population without comparing to others, so we feel it is important to show our result demonstrating these shared selection signals. In our revisions, we now clarify in the Discussion the limitations and time-scale at which the iHS statistic may detect selection. As for XP-EHH, it is a statistic designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time-scale of selection that XP-EHH can detect would therefore be dependent on the populations used for comparison. As XP-EHH has the best power to identify alleles that are fixed or approaching fixation in one population but not others, it is less likely to detect older selection events / incomplete sweeps from the source populations. We highlight this point in the Discussion.
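      The genome-wide standardization mentioned under point (1) above is commonly done within allele-frequency bins, so that each locus is compared against loci of similar frequency drawn from the same demographic background. The sketch below illustrates that general idea only. The bin count, the function name, and the toy values are assumptions, and the authors' exact procedure is not specified in this response.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency_bin(raw_scores, freqs, n_bins=20):
    """Z-score each raw statistic against loci in the same
    derived-allele-frequency bin (genome-wide background)."""
    bins = defaultdict(list)
    for idx, freq in enumerate(freqs):
        # Clamp so that a frequency of exactly 1.0 falls in the last bin.
        bins[min(int(freq * n_bins), n_bins - 1)].append(idx)

    standardized = [0.0] * len(raw_scores)
    for indices in bins.values():
        values = [raw_scores[i] for i in indices]
        mu = mean(values)
        sd = pstdev(values)
        for i in indices:
            # A bin with no spread carries no information; leave it at 0.
            standardized[i] = (raw_scores[i] - mu) / sd if sd > 0 else 0.0
    return standardized


# Toy example: two low-frequency loci and two high-frequency loci.
raw = [1.0, 3.0, 10.0, 14.0]
freqs = [0.10, 0.12, 0.90, 0.92]
z = standardize_by_frequency_bin(raw, freqs, n_bins=5)
print(z)
```

      Because the mean and spread are taken from the population's own genome-wide distribution, a bottleneck that inflates haplotype homozygosity everywhere shifts the background rather than creating local outliers.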

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We originally did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation.

      However, given the reviewer’s comment, we have now included the frequencies as well as these caveats in the Discussion. We additionally calculated the LCT allele frequency in other ancient samples, and assuming that we had good proxies for the sources at the time of admixture, we calculated the expected allele frequency in the admixed ancestors of the Faroese founders (Discussion), but again note the limitations in using such a calculation in this context.
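      Two pieces of arithmetic lie behind this caveat. The first is the sampling uncertainty of a frequency estimated from 11 diploid individuals (22 chromosomes). The second is the expected frequency in an admixed population, taken as the ancestry-weighted average of the source frequencies. In the sketch below, the 82% frequency and the sample size come from the response above, while the ancestry proportions and source frequencies in the second example are hypothetical placeholders, not values from the manuscript.

```python
import math


def binomial_standard_error(p_hat, n_chromosomes):
    """Sampling standard error of an allele frequency estimate.
    Ignores genotype and imputation error, which add further uncertainty."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_chromosomes)


def expected_admixed_frequency(proportions, source_freqs):
    """Expected allele frequency in an admixed population, assuming no
    drift or selection since admixture: sum_k alpha_k * p_k."""
    if len(proportions) != len(source_freqs):
        raise ValueError("need one frequency per ancestry proportion")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(a * p for a, p in zip(proportions, source_freqs))


# Historical Faroese samples: 11 individuals, i.e. 22 chromosomes.
se = binomial_standard_error(0.82, 22)
print(f"standard error ~ {se:.3f}")

# Hypothetical two-source example (placeholder numbers).
expected = expected_admixed_frequency([0.5, 0.5], [0.6, 0.8])
print(f"expected admixed frequency ~ {expected:.2f}")
```

      With a standard error of about eight percentage points from sampling alone, the historical estimate of 82% and the modern estimate of ~74% are not clearly distinguishable.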

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that there are disagreements on the best approach to reporting selection scan results from the reviewers, in our revision, we have additionally supplied both the standardized iHS / XP-EHH values in Supplementary Fig. S10 as well as these values transformed to p-values in Main Text Fig. 3. Additionally, both outputs are provided in the publicly available selection scan results files. We provide the method for obtaining p-values in the subsection “Selection scan” from the Methods section - we used a method developed earlier by Fariello et al.
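      For readers who prefer the empirical-outlier framing, a rank-based tail proportion can be computed directly from the standardized values without any distributional assumption. The sketch below shows that generic alternative. It is not the Fariello et al. procedure used for the p-values in the Main Text, and the function name and toy scores are illustrative.

```python
from bisect import bisect_left


def empirical_tail_proportions(scores):
    """For each locus, the fraction of loci whose absolute standardized
    score is at least as large as its own (two-sided empirical rank)."""
    n = len(scores)
    sorted_abs = sorted(abs(s) for s in scores)
    return [(n - bisect_left(sorted_abs, abs(s))) / n for s in scores]


scores = [0.1, -2.5, 0.7, 3.0]
tails = empirical_tail_proportions(scores)
print(tails)
```

      Loci are then reported as outliers when they fall in a chosen extreme fraction of the genome-wide distribution, for example the top 1%.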

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in our revisions, we have updated the Main Text - Discussion to acknowledge this possibility.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please note that there was disagreement among the reviewers regarding the reporting of outliers.

      As stated in our response to the public reviews, given the disagreement, we include both the empirical selection statistics as well as the converted p-values in the main text, supplement and selection scan files.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2:

      Define labels / explain why they differ from 1000 Genomes populations / make them consistent throughout the manuscript.

      We apologize for the error in labels for Figure 2. These are the same populations used in other figures and analyses. We have fixed this in our revisions so that the labels are consistent with the rest of the manuscript.

      (2) Figure S2 label:

      "The matrix is rescaled after subsetting the individuals, so although the scales are different, the overall structure remains the same." I do not understand this sentence. The samples are different, the scale is different, the apparent pattern is different - what overall structure is supposed to be the same?

      We apologize that the language was not clear in the figure label. The scales between panels A and B are different because popkin rescales the kinship values after subsetting so that the minimum kinship is zero. This is necessary when subsetting individuals from an already estimated kinship matrix, particularly when subsetting from global populations to a single region. From the popkin documentation: “This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent)” (https://rdrr.io/cran/popkin/man/rescale_popkin.html).

      We also described this in the Methods - Population Genetics - Kinship and runs of homozygosity section: “When calculating the kinship matrix for the Faroese WGS cohort only, we used the rescale_kinship() function, which will change the most recent common ancestor and give different absolute values, but the overall relationship structure in the subpopulation remains the same.”

      That is, the relative kinship within the Faroese cohort remains consistent, despite the different scale.
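      A minimal sketch of why the relative structure survives the rescaling, assuming the transformation has the form (kinship - min) / (1 - min), which is our reading of the popkin documentation linked above. The function below is a Python illustration with an invented 3 x 3 matrix, not the R implementation.

```python
def rescale_kinship_matrix(kinship, min_kinship=None):
    """Rescale a kinship matrix so that the chosen minimum maps to zero,
    assuming the form (k - k_min) / (1 - k_min)."""
    if min_kinship is None:
        min_kinship = min(min(row) for row in kinship)
    return [[(k - min_kinship) / (1.0 - min_kinship) for k in row]
            for row in kinship]


# Invented kinship values for three individuals from one subpopulation.
subset = [
    [0.60, 0.20, 0.10],
    [0.20, 0.55, 0.30],
    [0.10, 0.30, 0.70],
]
rescaled = rescale_kinship_matrix(subset)
for row in rescaled:
    print([round(value, 3) for value in row])
```

      The transformation is increasing in kinship, so the ordering of pairs by relatedness is unchanged even though the absolute values differ between panels.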

      It is difficult to see the kinship of Faroese individuals in the larger plot with all cohorts, which is why we subset and visualize the Faroese cohort alone. We have updated the Fig. S2 label language to make this more clear.

      (3) "Iron Age Wet Europe"

      We have corrected this typo to “Iron Age West Europe.”

      I'm confused if the ancient Faroese were part of the imputation panel: Figure 5 legend implies they are, methods imply they are not.

      The ancient samples are not imputed with the modern Faroese and reference samples, but they are the imputed data downloaded from Allentoft et al. and merged with the modern Faroese cohort. We specify that we downloaded imputed ancient samples in both the Methods - Fine-scale structure estimation using ancient genomes and in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes. The description of the imputation panel in the Methods - Bioinformatics - Variant calling and imputation refers only to the modern samples.

      (4) Kinship:

      The kinship of the Faroes is useful (and nice) as a QC analysis showing the genetic data matches the expectations from the pedigree. I don't know what I should learn from the kinship of the 1000 Genomes samples (I'd assume one could learn something about bottleneck strength from this), but it's not developed/discussed.

      The global kinship matrix provides complementary information to PCA and ROH, as another way to quantify and visualize the relationships within and between populations. Additionally, as the reviewer mentioned, bottlenecks increase kinship within populations. Given that popkin estimates kinship measured from a Most Recent Common Ancestor, we can best observe this increase in kinship when comparing to other global populations. We more clearly delineate what can be observed from Fig. S2A versus Fig. S2B in the Results - Population Structure and Relatedness.

      References

      (1) Gretzinger, J. et al. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119 (2022).

      (2) Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).

    1. Romeo. Give me that mattock and the wrenching iron. Hold, take this letter; early in the morning 2960See thou deliver it to my lord and father. Give me the light: upon thy life, I charge thee, Whate'er thou hear'st or seest, stand all aloof, And do not interrupt me in my course. Why I descend into this bed of death, 2965Is partly to behold my lady's face; But chiefly to take thence from her dead finger A precious ring, a ring that I must use In dear employment: therefore hence, be gone: But if thou, jealous, dost return to pry 2970In what I further shall intend to do, By heaven, I will tear thee joint by joint And strew this hungry churchyard with thy limbs: The time and my intents are savage-wild, More fierce and more inexorable far 2975Than empty tigers or the roaring sea. Balthasar. I will be gone, sir, and not trouble you. Romeo. So shalt thou show me friendship. Take thou that: Live, and be prosperous: and farewell, good fellow. Balthasar. [Aside] For all this same, I'll hide me hereabout: 2980His looks I fear, and his intents I doubt. [Retires] Romeo. Thou detestable maw, thou womb of death, Gorged with the dearest morsel of the earth, Thus I enforce thy rotten jaws to open, 2985And, in despite, I'll cram thee with more food! [Opens the tomb] Paris. This is that banish'd haughty Montague, That murder'd my love's cousin, with which grief, It is supposed, the fair creature died; 2990And here is come to do some villanous shame To the dead bodies: I will apprehend him. [Comes forward] Stop thy unhallow'd toil, vile Montague! Can vengeance be pursued further than death? 2995Condemned villain, I do apprehend thee: Obey, and go with me; for thou must die. Romeo. I must indeed; and therefore came I hither. Good gentle youth, tempt not a desperate man; Fly hence, and leave me: think upon these gone; 3000Let them affright thee. I beseech thee, youth, Put not another sin upon my head, By urging me to fury: O, be gone! 
By heaven, I love thee better than myself; For I come hither arm'd against myself: 3005Stay not, be gone; live, and hereafter say, A madman's mercy bade thee run away. Paris. I do defy thy conjurations, And apprehend thee for a felon here. Romeo. Wilt thou provoke me? then have at thee, boy! 3010 [They fight] Page. O Lord, they fight! I will go call the watch. [Exit] Paris. O, I am slain! [Falls] 3015If thou be merciful, Open the tomb, lay me with Juliet. [Dies] Romeo. In faith, I will. Let me peruse this face. Mercutio's kinsman, noble County Paris! 3020What said my man, when my betossed soul Did not attend him as we rode? I think He told me Paris should have married Juliet: Said he not so? or did I dream it so? Or am I mad, hearing him talk of Juliet, 3025To think it was so? O, give me thy hand, One writ with me in sour misfortune's book! I'll bury thee in a triumphant grave; A grave? O no! a lantern, slaughter'd youth, For here lies Juliet, and her beauty makes 3030This vault a feasting presence full of light. Death, lie thou there, by a dead man interr'd. [Laying PARIS in the tomb] How oft when men are at the point of death Have they been merry! which their keepers call 3035A lightning before death: O, how may I Call this a lightning? O my love! my wife! Death, that hath suck'd the honey of thy breath, Hath had no power yet upon thy beauty: Thou art not conquer'd; beauty's ensign yet 3040Is crimson in thy lips and in thy cheeks, And death's pale flag is not advanced there. Tybalt, liest thou there in thy bloody sheet? O, what more favour can I do to thee, Than with that hand that cut thy youth in twain 3045To sunder his that was thine enemy? Forgive me, cousin! Ah, dear Juliet, Why art thou yet so fair? shall I believe That unsubstantial death is amorous, And that the lean abhorred monster keeps 3050Thee here in dark to be his paramour? 
For fear of that, I still will stay with thee; And never from this palace of dim night Depart again: here, here will I remain With worms that are thy chamber-maids; O, here 3055Will I set up my everlasting rest, And shake the yoke of inauspicious stars From this world-wearied flesh. Eyes, look your last! Arms, take your last embrace! and, lips, O you The doors of breath, seal with a righteous kiss 3060A dateless bargain to engrossing death! Come, bitter conduct, come, unsavoury guide! Thou desperate pilot, now at once run on The dashing rocks thy sea-sick weary bark! Here's to my love! 3065[Drinks] O true apothecary! Thy drugs are quick. Thus with a kiss I die. [Dies] [Enter, at the other end of the churchyard, FRIAR] 3070LAURENCE, with a lantern, crow, and spade] Friar Laurence. Saint Francis be my speed! how oft to-night Have my old feet stumbled at graves! Who's there? Balthasar. Here's one, a friend, and one that knows you well. Friar Laurence. Bliss be upon you! Tell me, good my friend, 3075What torch is yond, that vainly lends his light To grubs and eyeless skulls? as I discern, It burneth in the Capel's monument. Balthasar. It doth so, holy sir; and there's my master, One that you love. 3080 Friar Laurence. Who is it? Balthasar. Romeo. Friar Laurence. How long hath he been there? Balthasar. Full half an hour. Friar Laurence. Go with me to the vault. 3085 Balthasar. I dare not, sir My master knows not but I am gone hence; And fearfully did menace me with death, If I did stay to look on his intents. Friar Laurence. Stay, then; I'll go alone. Fear comes upon me: 3090O, much I fear some ill unlucky thing. Balthasar. As I did sleep under this yew-tree here, I dreamt my master and another fought, And that my master slew him. Friar Laurence. Romeo! 3095[Advances] Alack, alack, what blood is this, which stains The stony entrance of this sepulchre? What mean these masterless and gory swords To lie discolour'd by this place of peace? 3100[Enters the tomb] Romeo! 
O, pale! Who else? what, Paris too? And steep'd in blood? Ah, what an unkind hour Is guilty of this lamentable chance! The lady stirs. [JULIET wakes] Juliet. O comfortable friar! where is my lord? I do remember well where I should be, And there I am. Where is my Romeo? [Noise within] Friar Laurence. I hear some noise. Lady, come from that nest Of death, contagion, and unnatural sleep: A greater power than we can contradict Hath thwarted our intents. Come, come away. Thy husband in thy bosom there lies dead; And Paris too. Come, I'll dispose of thee Among a sisterhood of holy nuns: Stay not to question, for the watch is coming; Come, go, good Juliet, [Noise again] I dare no longer stay. Juliet. Go, get thee hence, for I will not away. [Exit FRIAR LAURENCE] What's here? a cup, closed in my true love's hand? Poison, I see, hath been his timeless end: O churl! drunk all, and left no friendly drop To help me after? I will kiss thy lips; Haply some poison yet doth hang on them, To make me die with a restorative. [Kisses him] Thy lips are warm. First Watchman. [Within] Lead, boy: which way? Juliet. Yea, noise? then I'll be brief. O happy dagger! [Snatching ROMEO's dagger] This is thy sheath; [Stabs herself] there rust, and let me die. [Falls on ROMEO's body, and dies] [Enter Watch, with the Page of PARIS] Page. This is the place; there, where the torch doth burn. First Watchman. The ground is bloody; search about the churchyard: Go, some of you, whoe'er you find attach. Pitiful sight! here lies the county slain, And Juliet bleeding, warm, and newly dead, Who here hath lain these two days buried. Go, tell the prince: run to the Capulets: Raise up the Montagues: some others search: We see the ground whereon these woes do lie; But the true ground of all these piteous woes We cannot without circumstance descry. [Re-enter some of the Watch, with BALTHASAR] Second Watchman. Here's Romeo's man; we found him in the churchyard. First Watchman.
Hold him in safety, till the prince come hither. [Re-enter others of the Watch, with FRIAR LAURENCE] Third Watchman. Here is a friar, that trembles, sighs and weeps: We took this mattock and this spade from him, As he was coming from this churchyard side. First Watchman. A great suspicion: stay the friar too. [Enter the PRINCE and Attendants] Prince Escalus. What misadventure is so early up, That calls our person from our morning's rest? [Enter CAPULET, LADY CAPULET, and others] Capulet. What should it be, that they so shriek abroad? Lady Capulet. The people in the street cry Romeo, Some Juliet, and some Paris; and all run, With open outcry toward our monument. Prince Escalus. What fear is this which startles in our ears? First Watchman. Sovereign, here lies the County Paris slain; And Romeo dead; and Juliet, dead before, Warm and new kill'd. Prince Escalus. Search, seek, and know how this foul murder comes. First Watchman. Here is a friar, and slaughter'd Romeo's man; With instruments upon them, fit to open These dead men's tombs. Capulet. O heavens! O wife, look how our daughter bleeds! This dagger hath mista'en—for, lo, his house Is empty on the back of Montague,— And it mis-sheathed in my daughter's bosom! Lady Capulet. O me! this sight of death is as a bell, That warns my old age to a sepulchre.

      Romeo kills Paris and places his body inside Juliet's tomb. Believing Juliet is dead, he drinks the poison and dies beside her. The Friar arrives just as Juliet is awakening, but Romeo is already dead. When the Friar leaves, she sees Romeo's body and decides to stab herself with the dagger.

    1. 18.4. Repair and Reconciliation

       The idea of repair (or reconciliation) has shown up a couple of times already, both in the role of shame in child development, and in the Enforcing Social Norms: The Morality of Public Shaming paper. Let’s look more at what a repair might or might not look like.

       18.4.1. Limits of Reconciliation

       When we think about repair and reconciliation, many of us might wonder where there are limits. Are there wounds too big to be repaired? Are there evils too great to be forgiven? Is anyone ever totally beyond the pale of possible reconciliation? Is there a point of no return?

       One way to approach questions of this kind is to start from limit cases. That is, go to the farthest limit and see what we find there by way of a template, then work our way back toward the everyday.

       Let’s look at two contrasting limit cases: one where philosophers and cultural leaders declared that repairs were possible even after extreme wrongdoing, and one where the wrongdoers were declared unforgivable. [1]

       Nuremberg Trials

       After the defeat of Nazi Germany, prominent Nazi figures were put on trial in the Nuremberg Trials. These trials were a way of gathering and presenting evidence of the great evils done by the Nazis, and a way of publicly punishing them. We could consider this as, in part, a large-scale public shaming of these specific Nazis and the larger Nazi movement.

       Some argued that there was no type of reconciliation or forgiveness possible given the crimes committed by the Nazis. Hannah Arendt argued that no possible punishment could ever be sufficient:

       “The Nazi crimes, it seems to me, explode the limits of the law; and that is precisely what constitutes their monstrousness. For these crimes, no punishment is severe enough. It may well be essential to hang Göring, but it is totally inadequate.”
       Hannah Arendt/Karl Jaspers correspondence, 1926-1969

       See also: Eichmann in Jerusalem: A Report on the Banality of Evil by Hannah Arendt

       Truth and Reconciliation Commission

       In South Africa, when the oppressive and violent racist apartheid system ended, Nelson Mandela and Desmond Tutu set up the Truth and Reconciliation Commission. The commission gathered testimony from both victims and perpetrators of the violence and oppression of apartheid. We could also consider this, in part, a large-scale public shaming of apartheid and those who hurt others through it. Unlike the Nuremberg Trials, the Truth and Reconciliation Commission gave a path for forgiveness and amnesty to the perpetrators of violence who provided their testimony.

       See also: What Archbishop Tutu’s ubuntu credo teaches the world about justice and harmony

       18.4.2. Steps for Repentance

       When reconciliation is possible, what would it look like? In the article Famous abusers seek easy forgiveness. Rosh Hashanah teaches us repentance is hard., Rabbi Danya Ruttenberg outlines a set of steps for “repentance” needed for someone to have their relationship with others repaired:

       • “The bad actor must own the harm perpetrated, ideally publicly”
       • “They must do the hard internal work to become the kind of person who does not harm in this way — which is a massive undertaking, demanding tremendous introspection and confrontation of unpleasant aspects of the self”
       • “They must make restitution for harm done, in whatever way that might be possible”
       • “Then — and only then — they must apologize sincerely to the victim”
       • “Lastly, the next time they are confronted with the opportunity to commit a similar misdeed, they must make a different, better choice”

       18.4.3. Repair Example

       On February 6, 2022, Jeremy Schneider became the Twitter “main character of the day” for posting the following Tweet, which was widely condemned as being mean and not understanding other people’s experiences:

       Fig. 18.1 Jeremy Schneider’s Tweet

       In what was an unusual turn of events for a Twitter “main character of the day,” Jeremy Schneider later made an apology that was mostly accepted by the Twitter users who had criticized his Tweet:

       Fig. 18.2 Part 1 of Jeremy Schneider’s apology

       Fig. 18.3 Part 2 of Jeremy Schneider’s apology

       18.4.4. Reflection questions

       • Do you think there are situations where reconciliation is not possible?
       • What would reconciliation look like (if possible), when a social media platform is used in a genocide (see: Meta urged to pay reparations for Facebook’s role in Rohingya genocide)?
       • Does Jeremy Schneider’s apology cover the five steps of repentance listed by Rabbi Danya Ruttenberg?
       • Pick a situation where someone is being publicly shamed. Who is responsible for accepting or rejecting their apology/repentance?
       • Pick a social media platform and a situation where someone is being publicly shamed. What might that person do to try to repair or reconcile after the public shaming?
       • Pick a social media platform. In what ways does that platform make it difficult to repair or reconcile after public shaming?

       [1] We give these two examples to illustrate how important it is to appreciate the breadth of views on this incredibly difficult question, not to imply that one view or the other is preferable. The Nuremberg Trials and the Truth and Reconciliation Commission are both attempts at responding to great evils, and we believe it is important to understand different views of people who suffered. So take your time to think through your intuitions about these limit cases, and research different perspectives on these events (and other atrocities), and then work your way back to the everyday context of social media posting.

      I think reconciliation is possible in some situations, but it requires real effort from the person who caused harm. They need to admit what they did, reflect on why it was wrong, and sincerely apologize. Jeremy Schneider’s apology seems to follow many of these steps because he admitted his tweet was mean, explained how he reflected on it, and promised to think more carefully before posting in the future. However, on social media it is often difficult to repair harm because posts spread quickly and large numbers of people may continue criticizing someone even after they apologize.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      We were cautious in our gating strategy since the TCR𝛿+ CD3ε+ subset is rather small, so a low signal-to-background-noise ratio can be an issue if the gates used are too broad/generous. There is some inevitable low-level background staining with the TCR𝛿 that sits just above the bulk of the negative population and is CD3ε -ve. Although this background represents a tiny fraction of total cells, we were wary of gate contamination into our TCR𝛿+ CD3ε<sup>+</sup> subset and we wanted a gating strategy that could be applied across other organs too. We do not, however, believe this conservative strategy affects measurements of progenitor numbers across strains or our conclusions, since the size of this progenitor population in the various IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains was never impacted by the mutations. But to reassure the reviewer, we show our conservative gate as compared with a very broad TCR𝛿 gate and show that we are not missing a substantial population of CD25+ cells just below our gate. This also helps illustrate how close the background from the CD27<sup>int</sup> expressing αβ thymocytes (right column) comes to the TCR𝛿+ CD3+ gate and the importance of tight lineage gating.

      Author response image 1.

      (2) The overall phenotype of the IKKDeltaTCd2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells.

      Given that gamma delta T cell development has been demonstrated to be influenced by gamma delta T cells (i.e, trans-conditioning), this information could have aided in the interpretation of the data.

      Apologies for not being clearer on this point. We have studied conventional αβ T cell development in these strains in considerable detail; these studies are published and are discussed in the introduction (paragraph 3, pages 3-4) and in the cited references Schmidt-Supprian et al 2004, Silva et al 2014, Xing et al 2016, Webb et al 2019, Carty et al 2023. These detail how IKK expression is critical for thymic development of αβ T cells and their peripheral survival, and dissect the roles of NF-κB activation and cell death regulation by IKK. However, we now add new discussion (page 11-12) that considers the potential impact of altered αβ T cell development in the strains used for this study.

      We agree that trans-conditioning is also an important consideration, since CD4 TH17 T cells can enhance type 17 𝛾𝛿 T cell development (10.1038/icb.2011.50). This is of relevance to the limited conclusions we draw concerning type 17 𝛾𝛿 T cells. The REL and IKK deficient strains do lack effector populations, including type 17 αβ T cells, so it is possible that the absence of type 17 αβ T cells in these strains does contribute to the modest impact of IKK deletion in the type 17 𝛾𝛿 subset. We now highlight this information and discuss it in the manuscript (page 11-12).

      Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      We now provide both the absolute frequencies of different 𝛾𝛿 subsets and their relative frequencies to one another, as supplementary figure 2. We still believe assessing absolute numbers is the gold standard, since the differential impact of gene deletions on the αβ T cell compartments in different strains will affect whether or not αβ T cells are present, and therefore the overall representation of 𝛾𝛿 T cells can vary considerably between strains. Hence, absolute numbers are a more reliable measure of cell abundance.
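
      The dependence of frequencies on the denominator can be illustrated with a short sketch. The counts below are hypothetical, chosen only for illustration, and are not data from the study; the function name `frequency` is likewise an assumption. The sketch shows the same absolute number of 𝛾𝛿 T cells yielding very different percentages depending on whether αβ T cells are present in the organ.

```python
def frequency(subset_count: int, total_cells: int) -> float:
    """Return the subset as a percentage of all cells in the organ."""
    return 100.0 * subset_count / total_cells


# Hypothetical counts for illustration only (not data from the study).
gd_count = 50_000              # same absolute number of gamma delta T cells
total_with_ab = 10_000_000     # organ that contains alpha beta T cells
total_without_ab = 1_000_000   # organ that lacks alpha beta T cells

freq_with_ab = frequency(gd_count, total_with_ab)
freq_without_ab = frequency(gd_count, total_without_ab)

# The absolute count is unchanged, but the frequency differs tenfold.
print(f"{freq_with_ab:.1f}% vs {freq_without_ab:.1f}%")
```

      Under these assumed numbers, a frequency-only comparison would suggest a tenfold expansion where none occurred, which is why the two measures are best read together.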

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      We did of course process and calculate numbers of different subsets in both lymph nodes and spleen. Where we saw loss of peripheral 𝛾𝛿 subsets, or rescue, this was reflected in separate analysis of both organs, and we did not see any organ-specific effects in the mouse strains analysed. We therefore took the initial view that presenting aggregate data was the most efficient and least repetitive representation of the data. However, we very much recognise the reviewer’s concern and interest in seeing these data, so we have now included representative plots across both organs for figure 1D, and show cell numbers of lymph nodes and spleen separately, as well as together, for figures 1, 2, 4 and 7; these plots reflect the differences observed when we combined data. We did not break down the data for all figures (e.g. figures 3 and 5), as it was more cumbersome for the more complex multi-strain comparisons, and so we attempt to balance clarity and transparency against unnecessarily repetitive data presentation.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      We did share a similar concern, but think this is not an issue where subsets disappear and are almost completely absent, such as in IKK1/2 KO and Casp8 KO settings. Where we saw rescue with RIPK1<sup>D138N</sup> in Casp8ΔT<sup>CD2</sup> strains, we were keen to demonstrate that the populations we saw restored did exhibit their expected function, and so confirmed this in figure 5C by intracellular cytokine staining after a short 4h restimulation in vitro. This also served to validate our gating strategy, since what we designated as Type 1 cells - CD27+CD122+CD44<sup>int</sup> cells were the only source of IFN-gamma, while CD27–CD44<sup>hi</sup> CD122<sup>lo</sup> cells were the only source of IL-17. Adaptive/ naive cells made neither cytokine. So while we did not include nuclear stains, we were satisfied that the cytokine assays validated the gating strategy.

      (5) The analysis and conclusion of the data in Figure 3A is not convincing. Because the data are graphed on log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is ~ 99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      Yes, we completely agree with this assessment and conclusion. While kinase dead RIPK1 does provide some rescue, this appears relatively modest, and instead supports the view, validated in figure 7, that the dominant function of IKK in 𝛾𝛿 T cells may be to activate NF-κB dependent survival signals. Nevertheless, RIPK1<sup>D138N</sup> does provide some significant rescue, which allows some peripheral cells to repopulate and demonstrates that IKK is repressing RIPK1 mediated cell death. It is actually not trivial to assess the relative importance of IKK-RIPK1 and IKK-NF-κB functions. In the IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice, we prevent RIPK1 induced death, but still lack the NF-κB-dependent survival signal. Consistent with this, the ~1 log reduction in 𝛾𝛿 numbers between WT and IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice is actually similar to what we observe in the absence of REL subunits (Fig. 7), which is a smaller reduction than we observe in IKKΔT<sup>CD2</sup> mice. What would have been ideal is to have a scenario where IKK regulation of RIPK1 was defective but NF-κB survival signalling was intact. This would reveal the full impact of losing IKK dependent regulation of RIPK1 alone, which we suspect would result in substantial cell death that could not be blocked by NF-κB. Unfortunately, we do not have or know of suitable mouse mutants to test this. This is quite a nuanced discussion and we now clarify the scope and extent of conclusions we can draw (p. 7, 11).
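
      The relationship between reductions read off a log-scale axis and the percentage decreases quoted by the reviewer can be made explicit with a minimal sketch. The function names below are hypothetical, and the inputs are the approximate figures mentioned in this exchange (~1 log, ~90%, ~99%) rather than measured values.

```python
import math


def percent_decrease(log10_reduction: float) -> float:
    """Convert a reduction expressed in log10 units into a percentage decrease."""
    return 100.0 * (1.0 - 10.0 ** (-log10_reduction))


def log10_reduction(percent: float) -> float:
    """Convert a percentage decrease into the equivalent log10 reduction."""
    return -math.log10(1.0 - percent / 100.0)


# A 1 log reduction leaves 10% of the cells, i.e. a 90% decrease.
one_log = percent_decrease(1.0)
# A 2 log reduction leaves 1% of the cells, i.e. a 99% decrease.
two_log = percent_decrease(2.0)
# Going the other way, a 99% decrease corresponds to about 2 log units.
logs_for_99 = log10_reduction(99.0)

print(f"{one_log:.1f}% {two_log:.1f}% {logs_for_99:.2f} log")
```

      On this reading, moving from a ~99% to a ~90% decrease is a one-log rescue that looks large on a log axis while still leaving numbers roughly tenfold below wild type.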

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      Thank you - we include a discussion of these papers now (p12).

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

      Yes, we agree these do look quite different and could be confusing. The lymph nodes from IKKΔT<sup>CD2</sup> mice lack αβ T cells and B cells, and so the cellularity is much lower than normal. Consequently, the percentage representation of remaining cells can be noisier, while total cellularity calculations are more consistent. This is not an issue in the other strains, which all have more cells in lymph nodes. We now show plots from the spleen of the same mice, which appear better aligned with the additional splenic data shown in Figure 1.

      Reviewer #2 (Public review):

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      We address this important point in the discussion (p11-12). The impacts of gene deletions upon αβ and 𝛾𝛿 T cells operate independently of one another (as also discussed in response to reviewer 1). For instance, the phenotype of αβ T cells is identical in IKKΔT<sup>CD2</sup> and IKKΔT<sup>CD4</sup> mice - 𝛾𝛿 T cells are only targeted in IKKΔT<sup>CD2</sup> mice. Similarly, the phenotype of 𝛾𝛿 T cells is similar in IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. αβ T cells are absent from IKKΔT<sup>CD2</sup> but present in near normal numbers in Casp8.IKKΔT<sup>CD2</sup> mice. Others have also noted that 𝛾𝛿 T cell development is normal in Rag deficient mice (10.1126/science.1604321). In any case, an absence of αβ T cells is expected to promote 𝛾𝛿 T cell survival in the absence of competition for commonly utilised cytokines such as IL-7 and IL-15, though we do not see much evidence for this in mice with and without αβ T cells such as IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. We do now discuss the potential contribution of trans-conditioning for type 17 𝛾𝛿 T cell development (p12).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      In IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> deficient strains, we infer a defect in survival, since they lack peripheral 𝛾𝛿 T cells, despite normal thymic development. Their absence made it hard to assess proliferation and migration, though 𝛾𝛿 T cells were absent from all lymphoid organs. The conclusion that defective survival is responsible for the absence of 𝛾𝛿 T cells in the different strains is also supported by the rescue of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains by kinase dead RIPK1<sup>D138N</sup>. Furthermore, the presence of small numbers of residual populations in lymph nodes and spleen of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains demonstrates that migration patterns were normal. Were cells unable to recirculate, they might be expected to fail to leave the thymus, or to accumulate in the spleen. We saw no evidence of either of these scenarios.

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      We did not, unfortunately, assess chain usage, choosing rather to rely on the phenotypic identity of specific subsets, which, as we show in figure 5C, was extremely robust. IL-17 was only secreted by CD27– CD44<sup>hi</sup> 𝛾𝛿 T cells, while IFN-gamma was only secreted by CD27+ CD44<sup>hi</sup> 𝛾𝛿 T cells. We argue that the production of these key effector cytokines is the most direct test of a subset’s functional identity and that the phenotypic designation is robust.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      Yes, we agree this remains an open question around the transcriptional mechanisms by which NFκB signalling promotes cell survival, and one best addressed in future studies. We did not perform cytokine staining more widely, because the cytokine assay relies on short term re-stimulation of T cells with PMA and ionomycin. PMA activates PKC which in turn activates NF-κB signalling to elicit the cytokine response measured in this assay. As such, the results of such assays would be hard to interpret. We agree it would be interesting to investigate the functional consequences of REL deficiency in future studies, although this may need a more nuanced setting where 𝛾𝛿 T cells are not lost as a result of their defective survival.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      We did attempt these experiments with collaborators by analysing type 17 𝛾𝛿 T cell development in fetal thymic organ culture (FTOC). However, the GM mice are not so easy to breed and generating the large numbers of embryos required to set up the FTOCs proved too challenging and we were unable to generate these data.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      Given time and resources, it would have been ideal to confirm necroptotic cell death by alternative knockouts, such as RIPK3 or MLKL. However, formation of the necrosome is dependent on kinase active RIPK1, since autophosphorylation of RIPK1 changes its conformation to allow recruitment of RIPK3 and MLKL and formation of the necrosome. Therefore, the rescue of CASPASE8 deficient T cells from cell death by kinase dead RIPK1 is very solid genetic evidence of necroptosis.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

      This was assessed in the p105/RelA knockout strain, which only expresses cREL. What we lacked was an assessment of what RelA/p50 dimers can support in the absence of cREL. We do, however, show the impact of RelA single deficiency, and RelA/p50 deficiency.

      In truth, we had many REL deficient strains and it was challenging to make all the combinations we wanted. However, we try to compensate for this by discussing what cREL:cREL dimers and cREL:P50 dimers are capable of doing by analysing 𝛾𝛿 T cell development in p105/RELA DKO and RELA KO mice - these do show that cREL:P50 can compensate in the absence of RELA, but cREL:cREL cannot.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper would benefit greatly from a graphical abstract that could summarize the key findings, making the key findings accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from peripheral maintenance, taking into account the various subsets and signaling pathways required. In addition, the authors should consider adding further literature comparing the requirements for NF-κB /necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells compared to other subsets, with respect to homeostatic cues and transcriptional states.

      Thank you - we have added such discussions. We are happy to add a graphical abstract if journal constraints permit this.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, “Adative").

      Thank you!

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary of findings and key conclusions

      This manuscript asks how pharmacologic targeting of the outer mitochondrial membrane protein MIRO1 (RHOT1) with a MIRO1-binding compound (MR3) reshapes immunosuppressive programs in the glioma tumor microenvironment (TME). The core of the paper is a cross-species transcriptomic comparison that combines an in vivo mouse dataset with an ex vivo human perturbation dataset.

      Model systems and approach (as described):

      • Mouse in vivo: GL261-Luc intracranial glioma in C57BL/6J mice; MR3 is administered intracranially at the implantation site (10 µM in 5 µL DMSO) on days 11 and 18, and tumors are harvested on day 22 for single-nucleus RNA-seq (snRNA-seq).
      • Mouse snRNA-seq: NeuN-based nuclei sorting, 10x Genomics v3.1; alignment to mm10; Seurat-based integration and annotation. Tumor-cell calling is supported by CNV inference (SCEVAN/CopyKAT). One MR3-treated sample is excluded after QC, leaving 3 control vs 2 MR3-treated samples (11,940 NeuN− nuclei).
      • Human ex vivo: freshly resected glioma cores from 3 patients are cultured with 10 µM MR3 or DMSO for 24 h, followed by bulk RNA-seq (STAR alignment to hg19; DESeq2 for differential expression).
      • Cross-species integration: the analysis is restricted to 1:1 orthologs and protein-coding genes shared across datasets; inferred cell-cell signaling is explored with CellChat.

      Main findings (as presented):

      • MR3 shifts expression of a subset of glioma-associated genes toward a non-tumor-like direction ("rescued genes") and is associated with large changes in inferred cell-type composition in the mouse snRNA-seq dataset (including a marked drop in the fraction of nuclei annotated as tumor: 44.5% to 4.3%; Fig. 1E).
      • Across TCGA-vs-GTEx (glioma-upregulated genes) and three MR3 response analyses (mouse snRNA-seq, mouse pseudo-bulk, and human bulk RNA-seq), PARP11/Parp11 is reported as the only gene that is consistently upregulated in glioma and consistently downregulated by MR3 (Fig. 2B).
      • Within the mouse myeloid compartment, Parp11 is most enriched in MAC4 and MAC1, while MAC1 shows high Cd274 (Pdl1/PD-L1). MR3 reduces Parp11 in MAC4/MAC1 and reduces Cd274 in MAC1 (Fig. 2H).
      • CellChat analysis suggests that in controls MAC1 is the dominant sender of PD-L1/PD-1 signaling to CD8+ T cells (Fig. 3C), and that this PD-L1/PD-1 interaction is strongly diminished after MR3 (Fig. 3E).
      • The authors propose a paracrine model in which MAC4-derived PGE2 (via Ptges3) sustains Parp11 expression in MAC1 through cAMP/PKA/CREB, promoting PD-L1-mediated T-cell suppression; MR3 disrupts this circuitry (Fig. 4).

      Major comments

      1. Strength of the conclusions

      Two parts of the story felt well supported by the data as shown. First, the cross-species convergence on PARP11/Parp11 is a clear and potentially useful result (Fig. 2B). Second, the myeloid subclustering plus CellChat analysis makes a coherent case that PD-L1/PD-1 signaling in this model is dominated by a specific macrophage subset (MAC1) and changes after MR3 (Fig. 2H, Fig. 3). Where I was less convinced is when the manuscript moves from "transcriptomic and modeling evidence" to causal statements such as "MIRO1-mediated axis driving immunosuppression" and "MR3 reduces tumor burden by reactivating immunity." At the moment, several central inferences remain indirect:

      • Causality is inferred primarily from transcriptomic shifts and ligand-receptor inference rather than functional immune readouts.

      -We thank the Reviewer for the constructive evaluation. We have toned down the claims throughout the manuscript, with changes tracked.

      • __ On-target attribution to MIRO1 hinges on MR3 being a MIRO1 binder; the study does not include a genetic MIRO1 perturbation or a target-engagement/epistasis test in the relevant immune compartments (and the authors acknowledge this limitation in the Discussion).__ -We have examined on-target activity of MR3 in our other papers. For example, by depleting Miro1 with CRISPRi in glioma cells (Miro1 KD cells), we found that it phenocopied the effect of MR3. We also expressed Miro1-7A, a drug-resistant mutant of Miro1 predicted to be unable to bind MR3 (1) in Miro1 KD glioma cells, which rendered glioma cells insensitive to MR3 treatment. These data demonstrate that in cellular glioma models, Miro1 is the target of MR3 and MR3 exerts its functions via directly binding to Miro1.

      We have also excluded off-target effect of MR3 by examining other mitochondrial GTPases (1, 2) including Miro2.

      We agree that these experiments were not performed specifically in immune compartments. We have acknowledged this in the Discussion and added more explanation in the Introduction, citing our published papers.

      • __ The very large reduction in "tumor cell proportion" (Fig. 1E) is striking but is still a composition measure of recovered nuclei; it is not, on its own, a direct measurement of tumor size/burden and could be sensitive to differential nuclei recovery or cell loss during processing.__ -We agree that the "tumor cell proportion" in Fig. 1E represents the composition of recovered nuclei and is not, by itself, a direct measurement of tumor size or burden. We have removed "tumor burden" throughout the manuscript to avoid confusion.

      To determine whether the observed reduction might reflect technical bias, we examined the quality control metrics across all samples. Of the six initial samples (three control and three treated), one treated sample (TN1) showed clear quality concerns and was therefore excluded from downstream analysis.

      For the remaining samples, the distributions of detected genes per nucleus and total RNA counts per nucleus were similar between groups. The percentage of mitochondrial reads was consistently low, and only a small fraction of nuclei was removed during filtering, indicating overall comparable nuclei quality. Notably, the treated samples yielded similar or even higher total numbers of recovered nuclei, despite showing a lower tumor cell proportion. Please refer to new Fig. S1A for these results.

      Together, these observations suggest that the decrease in tumor cell proportion is unlikely to be explained simply by differential nuclei recovery, sequencing depth, or filtering effects. That said, we recognize that compositional differences in single-nucleus RNA sequencing data do not provide a direct measurement of tumor burden. We have revised the manuscript to clarify this point and to indicate that independent future approaches would be required for definitive assessment.

      I think the paper can go forward in its current scope, but the strength of the claims should match the level of evidence. If the authors want to keep strong, causal language in the title/abstract ("driving immunosuppression," "reduces tumor burden"), then I consider one or two targeted validation experiments essential (see below). Alternatively, the authors can temper the language and position the mechanistic model more explicitly as a hypothesis generated from the transcriptomic analysis.

      -We thank the Reviewer! We have toned down the claims throughout the manuscript so that the conclusions are consistent with the data.

      __ Statements that should be labeled as preliminary/speculative (unless additional validation is added) • MAC4-derived PGE2 as the upstream driver of MAC1 Parp11/PD-L1: plausible and nicely consistent with Ptges3 being MAC4-high in controls and reduced with MR3 (Fig. 4A), but not demonstrated.__

      -We have changed the conclusion of this part to:

      Together, these bioinformatic findings suggest that MAC4 may produce PGE₂, which could act on nearby MAC1 cells in a paracrine manner to increase Parp11 expression, although this model needs to be functionally validated.

      • __ MIRO1 → mtDNA → cGAS/STING → Ptges3 as a mechanistic chain: interesting, but currently framed largely by pathway knowledge plus modest expression changes (Supplementary Fig. S5).__ -We have added: "which requires future functional investigation."

      • __ "MR3 reactivates anti-tumor immunity to reduce tumor burden": the gene set enrichment and CellChat shifts are consistent with immune activation, but immune-mediated tumor control is not directly tested.__ -We have toned down these claims on tumor burden and only conclude as: MR3 may enhance anti-tumor immune responses.

      __ Replication and statistics Mouse snRNA-seq replication is limited after QC (3 control vs 2 MR3-treated animals). With n=2 treated, it is hard to know whether some of the biggest composition and cluster-level changes are robust to animal-to-animal variability.__

      -As also explained to Rev 2, we originally planned 3 mice per group. Although one treated mouse was lost after QC, sample-level pseudobulk PCA (treating each mouse as one replicate) shows that the two MR3-treated samples cluster together and are clearly separated from controls, indicating that the transcriptional effect of MR3 exceeds inter-animal variability and supporting technical reproducibility despite the small n (new Fig. S2C). The reduction in tumor cell proportion was also observed in both treated animals (new Fig. S2F). We have added this description to the Results (Page 5, lines 116-118) and included a new figure showing the tumor cell proportion for each animal (new Fig. S2F).

      We acknowledge this limitation. However, as the Reviewer also pointed out, the significance of our paper lies in transcriptomically linking Miro1 to well-known immune suppression factors in the glioma TME and in integrating 3 glioma databases, which will help researchers in the field advance their own work. Our methods and resource should therefore remain valid and useful to the community.

      Relatedly, the snRNA-seq differential expression is performed with Seurat FindMarkers (Wilcoxon rank-sum). Per-cell testing can inflate significance if biological replicate structure is not accounted for (pseudoreplication). I suggest the authors clarify exactly how they handled sample-level replication for the key DE results and, where possible, re-run the main DE comparisons using a sample-aware approach (e.g., pseudo-bulk within cell types/subclusters).

      -We thank the reviewer for raising this important point. In the original analysis, differential expression was performed using Seurat's FindMarkers function which performs per-cell testing. We acknowledge that this approach can overestimate significance if biological replicate structure is not explicitly accounted for.

      To address this, we re-ran the key differential expression analyses using a pseudo-bulk approach: counts were aggregated per cell type/subcluster per sample, and DE testing was performed across samples rather than individual cells. The main results and conclusions remain consistent with the original analysis, while this approach ensures that statistical significance properly reflects biological replication (new Fig. S3D-F).
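      The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline (which is Seurat-based); the function name, sample labels, and counts are hypothetical toy values chosen only to show how per-nucleus counts collapse to one profile per mouse and cell type, so that each animal contributes one replicate.

```python
def pseudobulk(counts, sample_ids, cell_types):
    """Sum per-nucleus gene counts within each (sample, cell type) pair.

    counts      -- one list of integer gene counts per nucleus
    sample_ids  -- per-nucleus animal label (hypothetical names below)
    cell_types  -- per-nucleus cluster annotation
    """
    sums = {}
    for vec, sample, ctype in zip(counts, sample_ids, cell_types):
        key = (sample, ctype)
        if key not in sums:
            sums[key] = [0] * len(vec)
        # Element-wise addition keeps one summed profile per key.
        sums[key] = [a + b for a, b in zip(sums[key], vec)]
    return sums


# Toy data: 5 nuclei, 2 genes, 2 animals (illustrative values only).
counts = [[1, 0], [2, 1], [0, 3], [4, 0], [1, 1]]
samples = ["ctrl_1", "ctrl_1", "ctrl_1", "mr3_1", "mr3_1"]
types = ["MAC1", "MAC1", "CD8", "MAC1", "MAC1"]

pb = pseudobulk(counts, samples, types)
```

      Differential expression is then tested across these summed profiles, so the effective sample size is the number of animals rather than the number of nuclei.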

      For the human bulk RNA-seq, the methods indicate 3 patient tissues split across MR3 vs DMSO for 24 h. In DESeq2, a paired design (including patient as a blocking factor) would be important to avoid patient-to-patient variability dominating the treatment signal; the manuscript should confirm whether the design formula accounted for this.

      -In the revised manuscript, we re-ran the DESeq2 analysis using a paired design with patient as a blocking factor and compared DMSO and MR3 within each patient (P1-P3). The results are consistent with our previous analysis. PARP11 remains significantly downregulated (raw p-value

      Finally, several places in the Methods define significance using p-value cutoffs (e.g., GEPIA3 TCGA/GTEx analysis uses p 1; human DE uses p = 1). Because multiple testing is substantial in all of these analyses, I recommend reporting FDR-adjusted values consistently (and being explicit about whether figures/tables show raw or adjusted p-values).

      -We have now used FDR-adjusted values for the TCGA/GTEx analysis and have updated Fig. 1C (top left), Results, and Methods accordingly. PARP11 remains significant after FDR correction.
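      For readers less familiar with FDR adjustment, the sketch below implements the standard Benjamini-Hochberg procedure. The response does not state which FDR method was used, so treating it as Benjamini-Hochberg is an assumption, and the p-values are invented toy numbers, not results from the study.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.04, 0.03, 0.20]  # toy values
adj = benjamini_hochberg(raw)
passes_raw = [p < 0.05 for p in raw]
passes_adj = [p < 0.05 for p in adj]
```

      In this toy case three genes pass a raw cutoff of 0.05 but only one passes after adjustment, which is why it matters whether figures and tables report raw or adjusted values.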

      For the human bulk RNA-seq, very few genes pass an adjusted p 2FC| > 1 across all four differential expression analyses and updated the corresponding description in Methods.

      __ Do the data support the macrophage-to-CD8 suppression claim? The CellChat PD-L1/PD-1 network figures are suggestive (Fig. 3C/E), but ligand-receptor inference is not the same as demonstrating functional T-cell inhibition. At minimum, I would like to see one orthogonal readout (flow or immunostaining) showing that PD-L1__ protein on myeloid cells and PD-1 on CD8 T cells change in the expected directions after MR3, and that CD8 T cells show an activation/effector signature at the protein level.

      -We agree this would clearly be the next step in functional studies, but the current manuscript is focused on transcriptomic analysis and method building, so we have toned down any claims at the functional level.

      In addition, we have observed that T cells after MR3 treatment show upregulation of cytotoxicity- and IFN-response-related genes, consistent with enhanced effector function at the transcriptional level. We have added new Fig. S6A and an explanation in the Results.

      __ PARP11: mediator vs marker The cross-species PARP11 result is the most convincing and potentially generalizable finding in the manuscript (Fig. 2B). However, in the specific context of this study, PARP11 is still best supported as a conserved MR3-responsive candidate rather than a demonstrated causal driver of PD-L1-mediated suppression. If the authors want to argue PARP11 is an effector of the pathway (rather than a marker), they should either soften the language or add a minimal functional linkage experiment within the existing scope (see "Optional" experiments below).__

      -We have softened the overall language throughout the manuscript to emphasize the correlation and PARP11 as a marker and to reflect the bioinformatic nature of the study. As this paper's main goal is method development and resource building, with already 11 figures, we think functional experiments could be done in another paper.

      __ Reproducibility and clarity of methods I appreciate that the authors provide a code/data portal (MiroScape) and a GitHub link. To make the study as reproducible as possible, I recommend: • Deposit raw sequencing reads for both mouse and human datasets (GEO/SRA) and include accession numbers in the manuscript.__

      -We have just deposited all raw data. Accession numbers will be provided once the data are public.

      • __ Provide a short, consolidated "computational reproducibility" note with software versions and key parameters (Seurat, CellChat, STAR, DESeq2, etc.).__ -Added

      • __ Clarify pseudo-bulk construction (what is aggregated, at what level, and how many biological replicates contribute to each pseudo-bulk comparison).__ -Added

      • __ Add a brief summary of MR3 provenance/validation and what "MIRO1-binding" means operationally in the context of these experiments (especially for readers outside the MIRO1 field).__ -We have added this in Introduction.

      Experiments requested (kept within the existing claims) I am intentionally not suggesting new lines of experimentation. The experiments below are aimed only at supporting the paper's current central claims. I separate them into items I consider essential vs optional, depending on how strongly the authors want to phrase mechanistic conclusions.

      -We thank the Reviewer. We have toned down the claims to reflect the bioinformatic nature of the paper. We will perform the experiments suggested below in another paper.

      Essential if the title/abstract continue to use strong causal language • Protein-level validation of the PD-L1/PD-1 axis and CD8 activation in the GL261 model. A focused flow cytometry panel (myeloid PD-L1; CD8 PD-1 plus one or two effector markers such as GZMB/IFNG/Ki67) or multiplex IF/IHC on tumor sections would substantially strengthen the central MAC1 → CD8 claim. • An orthogonal measure of tumor burden in the same treatment paradigm. The manuscript currently treats the drop in the fraction of nuclei annotated as tumor (Fig. 1E) as a reduction in tumor burden; I recommend including IVIS longitudinal data and/or histologic tumor area/volume at harvest to support this statement. • If feasible, modestly increase in vivo biological replication (the snRNA-seq analysis currently has n=2 treated after QC). Even adding one additional treated animal that passes QC would help.

      Feasibility (rough guidance only; core pricing varies widely by institution): a repeat GL261 cohort to harvest tumors for flow and/or histology typically takes ~3-6 weeks end-to-end. A small flow panel plus core time is often on the order of a few thousand USD (antibodies and cytometry), while basic histology/IF quantification might be in the hundreds to low-thousands. If the authors already have stored tissue from the existing cohort, some of this could be faster/cheaper.

      Optional (only if the authors want the MAC4 → PGE2 → Parp11 mechanism to be more than a model) • Measure PGE2 (ELISA or targeted lipidomics) in tumor lysates/conditioned media from control vs MR3-treated samples, or provide a closer proxy for PGE2 pathway engagement in the relevant clusters.

      Optional (only if the authors want to argue PARP11 is an effector) • A minimal functional linkage experiment (in vitro) testing whether PARP11 perturbation phenocopies the relevant aspect of MR3 in macrophages (e.g., PD-L1 levels and/or the ability to suppress CD8 activation in a co-culture). This could be done with a PARP11 inhibitor or knockdown. I do not think in vivo genetics are required for this manuscript, but some functional tie would prevent overinterpretation.

      __ Minor comments A. Analysis/experimental clarifications that seem straightforward • Human DESeq2: please clarify whether the DESeq2 design was paired by patient (i.e., patient as a blocking factor).__

      -See above. We re-ran the human differential expression analysis using a paired design with patient as a blocking factor and explained this in the Methods.
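      The value of blocking on patient can be seen in a small numerical sketch. In DESeq2 a paired design is typically expressed with a formula such as `~ patient + treatment`; the exact formula the authors used is not given here, so that is an assumption. The code below does not reproduce DESeq2's negative binomial model. It uses plain t-statistics on invented log-expression values as a stand-in to show why within-patient comparison helps when baselines differ widely between patients.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(control, treated):
    """t-statistic on within-patient differences (patient acts as a block)."""
    diffs = [t - c for c, t in zip(control, treated)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(control, treated):
    """Welch t-statistic that ignores which samples share a patient."""
    se = sqrt(stdev(control) ** 2 / len(control)
              + stdev(treated) ** 2 / len(treated))
    return (mean(treated) - mean(control)) / se


# Toy log-expression of one gene in three patients (invented values):
# baselines differ a lot, but each patient drops by about 1 unit.
dmso = [5.0, 8.0, 11.0]
mr3 = [4.0, 7.2, 9.9]

t_paired = paired_t(dmso, mr3)
t_unpaired = unpaired_t(dmso, mr3)
```

      With these numbers the paired statistic is large in magnitude while the unpaired one is close to zero, because patient-to-patient baseline differences dominate the unpaired variance.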

      • __ snRNA-seq DE: please clarify whether any sample-aware method was used for the key DE conclusions (especially Parp11/Cd274 changes) rather than per-cell statistics alone.__ -See above. The key DE results are based on sample-level pseudobulk (each mouse as one replicate). The two MR3-treated samples cluster together in pseudobulk PCA (new Fig. S2C), and the tumor reduction is seen in both animals (new Fig. S2F), supporting robustness to animal variability.

      • __ CellChat: because min.cells filtering is used (min.cells = 20), please note this explicitly in figure legends where subclusters appear only in one condition, so readers understand why certain labels are missing.__ -We have edited the Fig 3 legend accordingly.
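      The effect of a minimum-cell threshold on which subclusters appear in each condition can be sketched as follows. This is not CellChat code: the function is hypothetical, the cell counts are invented, and the inclusive boundary (keeping groups with at least `min_cells` cells) is an assumption about how the threshold is applied.

```python
from collections import Counter


def groups_kept(labels, min_cells=20):
    """Return the cell groups with at least `min_cells` cells.

    Assumption: the boundary is inclusive; the real tool may differ.
    """
    counts = Counter(labels)
    return {group for group, n in counts.items() if n >= min_cells}


# Invented per-condition annotations for two macrophage subclusters.
control_labels = ["MAC1"] * 25 + ["MAC4"] * 30
treated_labels = ["MAC1"] * 22 + ["MAC4"] * 5

kept_control = groups_kept(control_labels)
kept_treated = groups_kept(treated_labels)
missing_in_treated = kept_control - kept_treated
```

      In this toy case MAC4 would be absent from the treated-condition plot purely because of the threshold, which is the situation the legend note is meant to flag.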

      __ Figure and text consistency issues I noticed several figure/legend/citation issues that look like simple fixes: • Fig. 3 legend panel labeling: the legend text refers to the PD-L1/PD-1 chord plot as (C) MR3− and (D) MR3+, but (D) is the heatmap panel; the chord plots are (C) and (E). This should likely read (C) MR3− and (E) MR3+.__

      -Yes, and corrected.

      • __ Fig. 5 panel reference: the Results text refers to the Cross Species module as Fig. 5F, but the Fig. 5 legend defines panels (A-E) and labels (E) as "Cross Species module." Please reconcile (either change the text to Fig. 5E or add a panel F).__ -Changed to "E".

      • __ Discussion figure citation: the Discussion cites Ptges3/PGE2 evidence as "(Figure 3)," but Ptges3 is shown in Fig. 4A and the model is in Fig. 4B.__ -Added "Figure 4A-B" there.

      • __ Fig. 1D numbers: the Results text states 509/1,602 (mouse) and 15/106 (human) "rescued" genes (Fig. 1D), but the Fig. 1D pie charts are labeled with different totals (mouse total 3490; human total 104). Please reconcile the denominators and ensure the figure matches the text and analysis choice (bulk vs snRNA vs filtered gene sets).__ -For the cross-species analysis, we only counted genes with human-mouse orthologs so that the two datasets were compared in the same gene space. This avoids inflation from species-specific genes. We have added a clarification in the figure legend.
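      The denominator question above comes down to which gene space the counts are taken in. The sketch below uses invented placeholder gene names and a hypothetical helper to show how restricting both gene lists to genes with a human-mouse ortholog changes both the number of "rescued" genes and the total they are divided by.

```python
def rescued_counts(disease_up, treatment_down, shared_genes=None):
    """Return (rescued, total) for disease-upregulated genes.

    If `shared_genes` is given, both lists are first restricted to it,
    mimicking a comparison limited to genes with a 1:1 ortholog.
    """
    if shared_genes is not None:
        disease_up = disease_up & shared_genes
        treatment_down = treatment_down & shared_genes
    return len(disease_up & treatment_down), len(disease_up)


# Placeholder gene sets; "geneD" stands in for a species-specific gene.
disease_up = {"geneA", "geneB", "geneC", "geneD"}
treatment_down = {"geneA", "geneD"}
one_to_one = {"geneA", "geneB", "geneC"}

all_genes = rescued_counts(disease_up, treatment_down)
ortholog_only = rescued_counts(disease_up, treatment_down, one_to_one)
```

      Stating which of the two conventions a figure uses is what lets the totals in the text and in the pie charts be reconciled.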

      • __ Fig. 2 legend: there is a stray quote in "lymphoid subclusters" (appears as subclusters").__ -removed.

      __ Presentation and framing • Tone down or carefully qualify statements equating snRNA-seq composition shifts with reduced tumor burden (or add an orthogonal tumor-burden measurement as suggested above).__

      -We have removed "tumor burden" throughout the manuscript.

      • __ Where possible, tie mechanistic language explicitly to the level of evidence ("consistent with," "suggests," "model proposes") so readers do not over-interpret the transcriptomic inference.__ -done.

      • __ Consider adding a small schematic in the Results or a short "interpretation" sentence in the figure legends explaining what the CellChat plots do and do not show, since non-specialists can misread these as direct interaction measurements.__ -We have added explanations in Fig 3 legends for CellChat and emphasized the transcriptomic nature of the data.

      __ Prior literature The PARP11 immunotherapy literature is cited appropriately. For the PGE2 angle, it may help readers if the authors add one or two glioma-focused references on PGE2-mediated myeloid/T-cell suppression (if not already in the full reference list).__

      -We have added two more papers showing that PGE2 may induce MDSCs and immunosuppression in glioma (3, 4).

      Significance

      Nature and significance of the advance The advance here is primarily conceptual and resource-oriented. Conceptually, the work connects a mitochondrial regulator (MIRO1) to a specific, testable immunosuppressive circuit in the glioma TME. Technically, the cross-species perturbation framework and the accompanying MiroScape portal should be useful to groups looking for conserved, drug-responsive immune programs.

      Context within the existing literature Immunosuppression in glioma and the importance of tumor-associated myeloid populations are well established, as is the limited success of checkpoint blockade in GBM. The manuscript's proposed MAC4/MAC1 paracrine model and its emphasis on PD-L1/PD-1 signaling adds a focused, hypothesis-generating view of how particular macrophage states might sustain CD8 dysfunction. The identification of PARP11 as a conserved MR3-responsive gene also fits with emerging work implicating PARP11 in immunoregulatory programs and response to immunotherapy.

      Audience • Neuro-oncology and glioma TME researchers (myeloid heterogeneity, immune suppression). • Tumor immunology groups interested in myeloid-driven checkpoint resistance. • Researchers working on mitochondrial stress signaling and immunometabolism. • Computational biologists building cross-species or multi-modal integration frameworks. Reviewer expertise and limitations Keywords: glioma microenvironment; macrophage/microglia biology; tumor immunology; single-cell/nucleus transcriptomics; computational ligand-receptor inference. Limitations: I am not a medicinal chemist, so I cannot deeply evaluate MR3 chemistry, PK/PD, or specificity beyond what is presented. I also did not evaluate the full web-portal implementation beyond the manuscript description.

      Reviewer #2

      Evidence, reproducibility and clarity

      The authors study responses to MIRO1 inhibition in a mouse model of GL261 GBM and in human tissue pieces treated ex vivo. They provide an interesting link between mitochondrial function and potential therapeutic outcomes in a tumor type that is typically challenging to treat. The manuscript is written clearly, in correct English language and figures are well structured and easy to interpret. -We thank the Reviewer for the positive comments. We want to clarify that the compound binds to Miro1 and doesn't inhibit Miro1's GTPase activity (1). We have now added explanation in Introduction.

      __ Major critique: 1. However, I need to stress that study is based of few experiments with low robustness. The predominant experiment is single-nuclei RNAseq analysis of GL261 tumors implanted into mice, constituting 3 CTRL and 2 treated mice, due to removal of 3rd animal following sequencing (low recovery of high quality nuclei). Therefore, the sample group is small. This is understandable for snRNA-seq experiment (although 3 animals in treated group is somewhat necessary), but the efficiency of treatment with MR3 should be better documented in a larger cohort of animals. Crucial changes in distribution of cell types or polarisation of myeloid cells should be confirmed with flow cytometry, which is more feasible on a larger cohort.__

      -We agree. As explained to Rev 3, the current paper is focused on conceptual and methodological advances and on providing a resource to the community, and it is already large, with 11 figures. As Rev 1 mentioned, the significance of our paper lies in transcriptomically linking Miro1 to well-known immune suppression factors in the glioma TME and in integrating 3 glioma databases, which will help researchers in the field advance their own work. Importantly, PCA at the animal level showed clear separation of treated from untreated groups, and the reduction in tumor cell proportion was observed in both treated animals (new Fig. S2C, F), supporting technical reproducibility despite the small n. Our methods and resource should therefore remain valid and useful to the community. Exploring the tumor-reducing efficacy of MR3 or combined treatments (e.g. with anti-PD-L1 or a PARP11 inhibitor) in larger cohorts is an exciting next step.

      __ Human model does not seem robust (also, only 3 patients). Very few genes are affected by treatment (incomparably less than in mice), which poses a question if the model is sufficient to study the effect of the treatment. This should be at least discussed and arguments should be stated why such model is suitable.__

      -We agree. The observed variability in treatment response is expected and consistent with the well-established molecular and phenotypic heterogeneity of human glioma. Importantly, despite this diversity, we identified one gene (PARP11) that is consistently altered across all patient samples and the mouse model. This cross-species reproducibility supports the biological and translational relevance of the PARP11 finding. We have now added this to the Discussion.

      In addition, we reanalyzed the human bulk RNA-seq using a paired design with patient as a blocking factor as suggested by another reviewer, which increased the number of DE genes (new Fig. 1C).

      __ Fig. S1E shows that actually few genes are commonly affected between human and mouse experiments. So conclusion about "conserved" modulation by MR3 seem an overstatement.__

      -We meant that "Parp11" is conserved. To avoid confusion, we have deleted "conserved" throughout the manuscript wherever it did not refer specifically to Parp11.

      __ Mechanistic conclusions about PARP11, PGE, PD-L1 etc are not documented by any wet lab experiments, just by bioinformatic modelling.__ -We have scrutinized the Main Text to emphasize this.

      Minor: 1. Authors should discuss choice of GL261 model. It is immunogenic and does not resemble human GBM ideally, so the choice should be explained.

      -Although the GL261 model is more immunogenic than human GBM, this feature enables evaluation of immune-modulating therapies and mechanisms in an immune-competent setting. The model still preserves critical aspects of glioma biology, including an immunosuppressive TME, invasive behavior, and intracranial growth (5). It therefore provides a suitable platform for our mechanistic investigation of immune cells in the TME. We have now added this to the Methods.

      __ In clustering of mouse snRNAseq data, T cells seem underclustered, e.g. Treg cluster clearly constitutes half of Il2ra-positive and negative cells, the latter probably being conventional CD4+ T cells (usually CD4+ T cells in GL261 are 50:50 Treg and conventional). This can affect further conclusions on cell:cell interactions.__

      -We thank the reviewer for this important observation. We agree that in the former annotation, it was improper to annotate all the CD4+ T cells as Treg cells, given the limited expression of Foxp3, Il2ra and other Treg marker genes. Consequently, the previously annotated "Treg cluster" likely includes both regulatory-like and conventional CD4+ T cells.

      We have further clustered the CD4+ T cell population and found that if we divided CD4+ T cells into conventional CD4+ T and Treg cells, it yielded few Treg cells for downstream analysis (~50). This would compromise the robustness and reliability of our following analysis (CellChat/DEA/etc).

      To address this, we have revised our annotation and now refer to this population more conservatively as "regulatory-like CD4+ T cells" rather than bona fide Tregs. Importantly, this subset still exhibits elevated expression of immunoregulatory molecules and is associated with CD8+ T cell dysfunction, preserving the main conclusions regarding immune suppression within the tumor microenvironment. We have updated the Results, Figures, and Discussion accordingly to clarify this revised annotation and its implications for cell-cell interactions.

      Please refer to following new figures for the updated annotation and associated results:

      Fig. 2G-H, Fig. 3A-G, Fig. S4C-D,G, Fig. S5B-G, Fig. S6A.

      Significance

      The study provides an interesting conclusion and potentially relevant discovery. However, in opinion of this reviewer, the performed experiments do not strengthen this sufficiently, especially in terms of mechanical insights and weak data on human samples. In the line of general literature on new treatments of GBM and testing thereof in mouse model, this study lacks mechanistic insights and solid data on therapeutic efficiency.

      -As mentioned above, the goal of this paper is to provide novel methods for integrating datasets, to build a resource, and to identify markers in the glioma TME. It will serve as a useful resource to the community and form the foundation for future therapeutic validation in larger cohorts. We have acknowledged the limitations in the revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      __The authors of "Cross-Species Transcriptomic Integration Reveals a Conserved, MIRO1-Mediated Macrophage-to-T-Cell Signaling Axis Driving Immunosuppression in Glioma" present transcriptomic, both bulk RNA Seq and single nucleus RNA Seq, from GL261 murine gliomas treated with the Miro1 targeting compound MR3. RNA Seq data from human tumor explants treated with MR3 is also presented. The authors compared DEGs from their treated tissues with publicly available RNA Seq data sets comparing DEGs from normal tissue and Glioma tumors. The goal being to identify genes modulated by MR3 that may be underlying glioma growth, TME changes, and immunosuppression. There is a significant amount of data presented, with in-depth analysis conducted on the sequencing data sets. The manuscript is lacking in mechanistic depth and this reviewer feels that the results are over-interpreted, especially without any additional conformational assays run to confirm the interpretation of the sequencing data. There were many bold statements made (lines 109-110, 117, 130-131, 142-144, 163-165) that I felt did not have enough evidence to back up their claims.__

      -We have toned down the statements at the lines mentioned above:

      Line 109-110: Deleted now

      Line 117: Deleted now

      Line 130-131: Deleted now

      Line 142-144: Deleted: "highly differentially expressed", the rest of the sentence is supported by our data.

      Line 163-165: Deleted now

      As explained later, our paper is focused on bioinformatic analysis and resource and method building. In-depth functional studies will be performed in another paper.

      __A significant concern is the lack of conformation that MR3 is targeting Miro1 in these models. __

      -We have done this in another manuscript where we show that in cellular glioma models, Miro1 is the target of MR3 and MR3 exerts its functions via directly binding to Miro1.

      __Previous publications from the authors have shown evidence that MR3 reduces Miro1 expression in cell and fly models. Sometimes this requires the co application of FCCP or antimycin A. Thus, the results attributed within cannot be attributed to Miro1 changes but rather any on or off-target effect of MR3. __

      -We originally discovered MR3 by ligand-based in silico modeling and a thermal shift direct binding assay (1, 2). Thus, MR3 is a Miro1 binder (this is stated in the Abstract and Introduction, and we have now added more background in the Introduction). Indeed, we sometimes saw MR3 reduce Miro1 protein levels under certain conditions, for example in vivo in flies after days of feeding (1, 2), or in PD cells upon Antimycin A or CCCP treatment (1, 2, 6, 7). MR3 most likely exerts its function by altering Miro1 protein-protein interactions (8), and Miro1 protein is subsequently degraded in proteasomes following complex dissociation or after posttranslational modifications (1, 2, 8). We have stated this hypothesis in the Results section (page 10, possible model).

      In our other papers we have excluded off-target effects of MR3 by examining other mitochondrial GTPases (1, 2), including Miro2, and by showing that Miro1 KD glioma cells phenocopied the effects of MR3 and that a drug-resistant Miro1 mutant rendered glioma cells insensitive to MR3. These data show that Miro1 is the main target of MR3.

      We have added more explanations to the Introduction.

      __Understanding that mouse studies are expensive and time-consuming, and the acquisition of human tissue is not trivial, the sample sets are still small. Further confirmation of findings in cell models, organoids etc. would strengthen the findings and justify the smaller sample size of mice and human tissue. __

      -We agree and we have another in-depth study under way. However, the current paper is focused on conceptual and methodological advances and on providing a resource to the community, and it is already large, with 11 figures. As Rev 1 mentioned, our paper's significance is to transcriptomically link Miro1 to well-known immune suppression factors in the glioma TME and to integrate 3 glioma databases, which will help researchers in the field advance their own research. Thoroughly understanding Miro1's role in the glioma TME is our next goal, as stated in the Discussion, and is beyond the scope of the current study.

      __The website MiroScape will be a very useful tool in the proper hands.__

      __1. Confirm activity of MR3 on Miro1 in relevant samples. Direct downregulation? Modulation of other targets known to be altered by MR3?__

      -As mentioned above, we have shown in tumor cells, MR3 disrupts pathogenic Miro1-protein interactions without the need to reduce Miro1 protein. There is currently no other target known to be altered by MR3, not even Miro2, demonstrated before (1, 2). We have added more explanations in Main Text.

      __Conduct further mechanistic work to validate claims inferred by differentially expressed genes.__

      -As mentioned above, our current paper is focused on bioinformatic methods and resource building. Further mechanistic work will be performed in another paper.

      __Significantly temper claims related to cell targeting, direct communication between cells and overarching responses inferred from sequencing data.__

      -Done. See above and Main Text.

      Reviewer #3 (Significance (Required)):

      __My laboratory's expertise lies in signaling related to mitochondrial structure and function. We have investigated the Miro1 protein and effects on cellular responses related to Miro1 expression. We have tested the MR3 compound in our own systems with limited success. Therefore my major concerns lie in validating the on-target activity of the compound in their models.__

      -As explained above, in our other papers we have thoroughly examined the on-target activity of MR3 by counter-screening other Miro1-related/similar proteins (1, 2, 6, 7) and by using Miro1 KD cells. We have now added more explanations in the Main Text.

      __With additional mechanistic validation this could be a very significant study. Using advanced model systems as the authors do allows for a comprehensive understanding of tissue responses. This is far advanced from simple single cell line culture studies but also adds significant complexity to the interpretation of the data. I am a strong believer that sequencing data must be validated with functional assays.__

      -We agree and are actively conducting those studies. However, bioinformatic analysis and method and resource building are sometimes too comprehensive to combine with functional data which may take years to obtain. We think our paper's method, markers identified in TME, and resources will be very useful to the community.

      References

      1. Hsieh CH, Li L, Vanhauwaert R, Nguyen KT, Davis MD, Bu G, Wszolek ZK, Wang X. Miro1 Marks Parkinson's Disease Subset and Miro1 Reducer Rescues Neuron Loss in Parkinson's Models. Cell metabolism. 2019;30(6):1131-40 e7. Epub 2019/10/01. doi: 10.1016/j.cmet.2019.08.023. PubMed PMID: 31564441; PMCID: PMC6893131.
      2. Li L, Conradson DM, Bharat V, Kim MJ, Hsieh CH, Minhas PS, Papakyrikos AM, Durairaj AS, Ludlam A, Andreasson KI, Partridge L, Cianfrocco MA, Wang X. A mitochondrial membrane-bridging machinery mediates signal transduction of intramitochondrial oxidation. Nat Metab. 2021. Epub 2021/09/11. doi: 10.1038/s42255-021-00443-2. PubMed PMID: 34504353.
      3. Mi Y, Guo N, Luan J, Cheng J, Hu Z, Jiang P, Jin W, Gao X. The Emerging Role of Myeloid-Derived Suppressor Cells in the Glioma Immune Suppressive Microenvironment. Front Immunol. 2020;11:737. Epub 2020/05/12. doi: 10.3389/fimmu.2020.00737. PubMed PMID: 32391020; PMCID: PMC7193311.
      4. Dean PT, Hooks SB. Pleiotropic effects of the COX-2/PGE2 axis in the glioblastoma tumor microenvironment. Front Oncol. 2022;12:1116014. Epub 20230126. doi: 10.3389/fonc.2022.1116014. PubMed PMID: 36776369; PMCID: PMC9909545.
      5. Mathios D, Kim JE, Mangraviti A, Phallen J, Park CK, Jackson CM, Garzon-Muvdi T, Kim E, Theodros D, Polanczyk M, Martin AM, Suk I, Ye X, Tyler B, Bettegowda C, Brem H, Pardoll DM, Lim M. Anti-PD-1 antitumor immunity is enhanced by local and abrogated by systemic chemotherapy in GBM. Science translational medicine. 2016;8(370):370ra180. Epub 2016/12/23. doi: 10.1126/scitranslmed.aag2942. PubMed PMID: 28003545; PMCID: PMC5724383.
      6. Bharat V, Durairaj AS, Vanhauwaert R, Li L, Muir CM, Chandra S, Kwak CS, Le Guen Y, Nandakishore P, Hsieh CH, Rensi SE, Altman RB, Greicius MD, Feng L, Wang X. A mitochondrial inside-out iron-calcium signal reveals drug targets for Parkinson's disease. Cell Rep. 2023;42(12):113544. Epub 2023/12/07. doi: 10.1016/j.celrep.2023.113544. PubMed PMID: 38060381.
      7. Bharat V, Hsieh CH, Wang X. Mitochondrial Defects in Fibroblasts of Pathogenic MAPT Patients. Front Cell Dev Biol. 2021;9:765408. Epub 2021/11/23. doi: 10.3389/fcell.2021.765408. PubMed PMID: 34805172; PMCID: PMC8595217.
      8. Kwak CS, Du Z, Creery JS, Wilkerson EM, Major MB, Elias JE, Wang X. Optogenetic Proximity Labeling Maps Spatially Resolved Mitochondrial Surface Proteomes and a Locally Regulated Ribosome Pool. bioRxiv. 2025. Epub 2026/01/07. doi: 10.64898/2025.12.21.693523. PubMed PMID: 41497653; PMCID: PMC12767525.
    1. Author response:

      General Statements

      Our study provides important mechanistic insights into how the perinuclear actomyosin network PANEM facilitates the interaction of unfavorably positioned chromosomes, i.e. peripheral and polar chromosomes, with the mitotic spindle in early mitosis to ensure their correct segregation in subsequent anaphase. All reviewers agree that our study makes an important contribution to the field of mitosis and chromosome segregation. They make positive comments on our manuscript, for example, ‘The work highlights the PANEM as a key spatial and temporal element of chromosome congression’, ‘The work is an excellent addition to the field’, and ‘the concept of PANEM could be integrated into textbooks and models of chromosome congression’. All three reviewers also acknowledge the high quality of the data, rigorous and accurate analyses, and convincing quantification in our study. Reviewers 1 and 3 give several comments and suggestions for revision of our manuscript. Please find our point-by-point revision plan of the manuscript from page 3.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      We will follow this suggestion and simplify this figure. For example, we plan to remove the column of “Start” because it is obvious and does not provide much new information.

      I recommend to reorganize figures as follows:

      Figure 1: Keep as single figure but simplify. Figure 1D and 1E could be combined, move unnormalized CSV to supplementary materials. Same goes for 1F.

      We will follow this suggestion and reorganize Figure 1 accordingly.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we will study the effect of PANEM contraction in one or two additional cell lines that form PANEM during prophase. For example, we plan to inhibit the PANEM contraction and study the outcome, focusing on the generation of polar chromosomes, which is the major defect after the inhibition of PANEM contraction in U2OS cells.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      As suggested, we will investigate the outcome (e.g. generation of polar chromosomes) of reduced PANEM contraction in unsynchronized U2OS cells, and address whether the two subsets of cells, in which centrosome separation occurs before or after NEBD, show any difference in the outcome.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Minor Comments

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we will move the final paragraph of the Discussion to make a new final section in the Results. Moreover, as suggested, we will study the outcome of inhibiting PANEM contraction in cell lines other than U2OS, and add the results to the new final section in the Results.

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      Primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may interest also the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al., report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression - in the absence of PANEM chromosomes end up in unfavourable locations, and they fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major points

      (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.

      As suggested, we will include kinetochore tracking data as supplemental data in the revised manuscript.

      Minor points

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      The same question has been raised in one of Reviewer #1’s major points. We will perform new experiments to directly address this question in a revised manuscript. If we do not obtain interpretable results, we will discuss this issue further in the Discussion, as suggested.

      (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      This issue is closely related to point 2 above. As discussed above, we will first address this issue experimentally. If we do not obtain interpretable results, we will discuss this issue further in the Discussion.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment if the apparent differences in this approach and one that the authors were using arises due to different cell types.

      We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as pointed out by the reviewer. Such a difference may have arisen due to different cell types (e.g. some cells form the PANEM and others do not: Figure S7), a different extent in the inhibition of PANEM formation, and/or the inhibition of cell rounding and cytokinesis (e.g. if cytokinesis is more sensitive to inhibitors than is the PANEM formation, we may not observe the possible effects on early chromosome movements due to PANEM inhibition while cytokinesis is still affected). As suggested, we discussed this topic in the Discussion (page 15, second paragraph). 

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes during mitosis in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for effects on congression.

      As the reviewer implies, we cannot rule out that we failed to detect actin associated with the spindle or centrosomes because of differences in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 13, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 13, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 13, second paragraph).

      C. Expansion of PANEM functional analysis

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      It is tricky to count the number of chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. those outside of the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      It has been a widely accepted view in the field that chromosome congression precedes biorientation, since the publication in 2006 (Kapoor et al Science 2006). Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on this new model, in the Discussion (page 14; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 16): 1) the enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently to promote the congression towards the spindle midplane, and 3) the balance between the activities of CENP-E, Dynein and chromokinesins may shift towards greater chromosome-arm ejection forces towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) have already been cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret these as background signals in the cytoplasm that do not come from kinetochores, because 1) before NEBD, they were outside the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions, such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references have already been cited and discussed in this Result section of our original manuscript. However, considering this suggestion, we have discussed more about polar chromosome movements reported by Koprivec et al (page 11). Meanwhile, the reviewer is correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Discussion

      When discussing cortical actin, cite key reviews on its presence and function during mitosis: Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 15), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      Primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may interest also the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression - in the absence of PANEM, chromosomes end up in unfavourable locations, and they fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g., Phase 3-like). It is unclear whether all kinetochores go through Phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. We also did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 23).

      With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents were varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Then, Phase 1 and Phase 3 were defined as intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued till the end of tracking. We have added this comment to the Method section (page 23-24).
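The phase bookkeeping described in this response can be sketched in a few lines. This is only an illustration of the stated rule, not the authors' analysis code: the function name `assign_phases`, its arguments, and the use of minutes after NEBD as the time unit are assumptions made for the example. It takes the observed Phase 2 interval and, if present, the observed Phase 4 interval, and derives Phase 1 and Phase 3 as the gaps between them.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. in minutes after NEBD


def assign_phases(
    nebd: float,
    phase2: Interval,
    phase4: Optional[Interval],
    tracking_end: float,
) -> Dict[str, Interval]:
    """Derive Phase 1 and Phase 3 from the observed Phase 2 and Phase 4.

    Hypothetical helper: Phase 2 (poleward motion) and Phase 4 (congression)
    are taken as given; Phase 1 and Phase 3 are the intervals around them.
    """
    p2_start, p2_end = phase2
    if not nebd <= p2_start <= p2_end <= tracking_end:
        raise ValueError("Phase 2 must lie between NEBD and the end of tracking")

    phases = {
        "Phase 1": (nebd, p2_start),   # from NEBD to the start of Phase 2
        "Phase 2": (p2_start, p2_end),
    }
    if phase4 is None:
        # No congression observed: Phase 3 continues until tracking ends.
        phases["Phase 3"] = (p2_end, tracking_end)
    else:
        p4_start, p4_end = phase4
        if not p2_end <= p4_start <= p4_end <= tracking_end:
            raise ValueError("Phase 4 must follow Phase 2 within the tracking window")
        phases["Phase 3"] = (p2_end, p4_start)  # between Phase 2 and Phase 4
        phases["Phase 4"] = (p4_start, p4_end)
    return phases
```

For a kinetochore with no observed congression, passing `phase4=None` yields a Phase 3 that extends to `tracking_end`, which corresponds to the extended Phase 3 described for some azBB-treated kinetochores.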

      (2) Would peripheral kinetochores close to the poles behave differently compared to peripheral kinetochores close to the midplane (Figure S4)? In Figure 3D, are they separated? If not, would it look different?

      Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, though both cells showed uncongressed polar chromosomes. The reason for this difference may be fewer uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 22).
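Why measuring kinetochore positions relative to the spindle offsets whole-spindle motion can be shown with a small sketch. The exact reference points used in the study are those of Figure 2A, C; the two quantities computed here (distance to one pole and signed offset from the mid-plane along the pole-to-pole axis) and the function name `spindle_relative` are assumptions chosen for illustration only.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]  # 3D coordinates, e.g. in micrometres


def spindle_relative(kt: Vec, pole_a: Vec, pole_b: Vec) -> Tuple[float, float]:
    """Return (distance to pole_a, signed axial offset from the mid-plane).

    Both values depend only on the kinetochore position relative to the two
    poles, so a rigid translation of the whole spindle leaves them unchanged.
    """
    axis = [b - a for a, b in zip(pole_a, pole_b)]
    length = math.sqrt(sum(c * c for c in axis))
    if length == 0.0:
        raise ValueError("the two spindle poles must be distinct")
    unit = [c / length for c in axis]
    mid = [(a + b) / 2.0 for a, b in zip(pole_a, pole_b)]

    dist_to_pole = math.dist(kt, pole_a)
    # Projection of (kinetochore - midpoint) onto the pole-to-pole axis.
    axial_offset = sum((k - m) * u for k, m, u in zip(kt, mid, unit))
    return dist_to_pole, axial_offset
```

Shifting the kinetochore and both poles by the same vector gives the same two values, so a modest drift of the spindle does not register as kinetochore motion in this frame.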

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, S4 and S5.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.

      We checked the sensitivity of cell lines in Figure S7B to blebbistatin (the original form of azBB) on DepMap. There was no plausible difference between PANEM+ and PANEM- cell lines, although the blebbistatin sensitivity data were available only for 4 cell lines (HCT116, MCF7, U2OS and HT29) in Figure S7B. Nonetheless, because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity may not necessarily reflect how essential the PANEM contraction is for chromosome congression.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after the NEBD (Bancroft et al 2025: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (forming PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression – we have newly discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 16).

      (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore became close to the midplane (< 2.5 µm), it was not possible to track it further due to kinetochore crowding around the spindle mid-plane – in such cases, the end of Phase 4 was assigned as the end of tracking. In the original manuscript, it was not clear that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores, while the start of Phase 4 was defined differently for the two groups. This was confusing in the original manuscript. We have now clarified these points in the Method section (page 23).

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E, we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time points. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for Figure S2E.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

      Description of analyses that authors prefer not to carry out

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figure 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Results (by subheading)

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      We respect this comment. However, if biorientation were established more rapidly for centrally located kinetochores, it would advance the initiation of congression, but would not necessarily change congression speed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work investigates the role of protein N-glycosylation in regulating T-cell activation and function and suggests that B4GALT1 is a potential target for tumor immunotherapy. The strength of evidence is solid, and further mechanistic validation could be provided.

      We sincerely thank the editor and reviewers for their time and constructive feedback. Your recognition of our work is much appreciated. We clarify our mechanistic studies as stated below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigated the role of protein N-glycosylation in regulating T-cell activation and functions, is an interesting work. By using genome-wide CRISPR/Cas9 screenings, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance functions of CD8+ T cells both in vitro and in vivo, suggesting the important roles of protein N-glycosylation in regulating functions of CD8+ T cells, which indicates that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the finding of a novel function of B4GALT1 deficiency in CD8 T cells.

      Weaknesses:

      However, authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, as well as functional outcomes of this interaction, such as TCR signaling enhancements.

      We are very sorry that we did not sufficiently highlight our results in Fig. 5f-h. In those figures, we demonstrated by FRET assays that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T-cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 with a CD8β-CD3ε fusion protein and tested its function in both WT and B4GALT1-knockout CD8+ T-cells. Our results demonstrate that this fusion protein could bypass the effect of B4GALT1 knockout in CD8+ T-cells (Fig. 5g-h). Together with the finding that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support the model that B4GALT1 modulates T-cell functions mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for the reminder. We verified the efficiency of some gRNAs by T7E1 assay. We will add these data to the supplementary results in the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see response to reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      We thank the reviewer for the suggestion! Unfortunately, multiple sites of TCR and CD8 are involved in N-glycosylation (https://glycosmos.org/glycomeatlas). We worry that mutating all of these sites may affect not only the glycosylation of TCR and CD8 but also other essential functions of those proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for this great suggestion to expand the study of B4GALT1 to autoimmune and infectious diseases. However, since the current manuscript mainly focuses on tumor immunology, we think these studies should be left for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study by Yu et al., which investigated the role of protein N-glycosylation in regulating T-cell activation and functions, is an interesting work. By using genome-wide CRISPR/Cas9 screenings, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance functions of CD8+ T cells both in vitro and in vivo, suggesting the important roles of protein N-glycosylation in regulating functions of CD8+ T cells, which indicates that B4GALT1 is a potential target for tumor immunotherapy. However, the authors need to directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, as well as functional outcomes of this interaction, such as TCR signaling enhancements. In addition, blocking PD1 has been shown to enhance the antitumor effect, whereas the data presented in this study suggest that the activation of PD1 expression under B4GALT1 deficiency in T cells enhanced the antitumor effect. How can this discrepancy be reconciled? Finally, several minor questions need to be addressed to strengthen the conclusions in this manuscript.

      (1) We used a FRET (Fluorescence Resonance Energy Transfer) assay to measure the interaction between TCR and CD8. FRET signals of TCR-CD8 increased significantly in B4GALT1-deficient T-cells compared with control cells (Fig. 5f). As a functional outcome of this interaction, we observed enhanced T-cell killing activities in B4GALT1-deficient CD8+ T-cells (Fig. 3f and Fig. 5h).

      To confirm whether reduced TCR-CD8 interaction is the major cause of TCR activation phenotypes in B4GALT1-knockout CD8+ T-cells, we generated a construct in which we fused the CD8b ectodomain (ECD) with CD3e to artificially tether TCR with CD8 (Fig. 5g). Overexpression of this CD8β-CD3ε fusion led to enhanced in vitro killing activities in control wild-type CD8+ T-cells. On the other hand, in B4GALT1-deficient CD8+ T-cells, the enhancement of T-cell killing activities by the fusion construct was significantly diminished (Fig. 5h), suggesting that it bypassed the regulation by B4GALT1.

      (2) PD-1 is both an early T-cell activation marker upon TCR activation and a T-exhausted marker under consecutive or repeated stimulations. In our screenings, PD-1 was used as an early activation marker for T-cells.

      We have clarified this in the new Discussion section.

      (1) The present data relies on statistical graphs (e.g., bar and line charts) for all data, excluding the bioinformatics analysis. Including data such as flow cytometry plots, photomicrographs, or immunohistochemistry staining images will provide more direct support for the conclusions.

      We thank the reviewer for the valuable suggestions! We added the original flow cytometry gating strategies for Cas9 screening sorting (Fig. S1a), TIL analysis (Fig. S5), and the FRET assay (Fig. S8) in the revised version to provide more direct support for our conclusions.

      (2) To further validate the enhanced tumor infiltration phenotype resulting from B4GALT1 knockout, the following data would strengthen the manuscript:

      (a) Flow cytometric analysis of TILs or immunofluorescence data from tumor sections.

      We thank the reviewer for the valuable suggestion! We added the original flow cytometry gating strategies for TILs in Fig. S5 in the revised version.

      (b) Assessment of in vivo T cell proliferation, for example, by tracking changes in the proportion of CD8+ T cells in the peripheral blood over time.

      We analyzed in vivo T-cell proliferation within tumors by CFSE (carboxyfluorescein succinimidyl ester) analysis. As shown in Fig. S6b, 6 days after infusion, B4GALT1-knockout OT-I T-cells showed increased proliferation within tumors compared with wild-type control OT-I cells.

      (c) Evaluation of the proliferation and activation status of OT-1 CD8+ T cells specifically in the draining lymph nodes of the mouse model.

      We thank the reviewer for the valuable suggestion! We plan to perform this experiment in the future.

      (3) The authors provide evidence that B4GALT1 knockout enhances CD8+ T cell function in both mouse models and human TCR-T cells (in vitro). Definitive support for the translational potential of this strategy would come from showing that B4GALT1-knockout human TCR-T cells also mediate potent in vivo function (NSG tumor-bearing model may be a better choice).

      We thank the reviewer for the valuable suggestion! We are going to perform those experiments in the future. However, we do not expect the in vivo (NSG mice) experiments to give results much different from the in vitro ones, so they may not add much to the current manuscript.

      (4) It would be preferable to include data on T cell activation and effector function (e.g., flow cytometry for IL-2, TNF-α, and IFN-γ, or ELISPOT) following stimulation with an OVA-specific peptide or co-culturing of OVA-expressing tumor cells with B4GALT1-knockout OT-1 CD8 T cells, especially the changes in the TILs compared with the non-targeting control group.

      Following co-culturing of B16-OVA tumor cells with B4GALT1-knockout or wild-type OT-I CD8+ T-cells, the RNA levels and secretion levels of TNFα and IFNγ were detected by RT-qPCR and ELISA, respectively (Fig. 3c). B4GALT1-deficient OT-I T-cells showed increased expression of T-cell activation and cytotoxic markers such as IFNγ and TNFα.

      (5) What is the correlation between the expression of B4GALT1, PD-1, and TCR activation markers at various time points during a long-term T cell co-culture with tumor cells?

      We thank the reviewer for the valuable suggestion! We do not have these data at present. While we agree that exploring this might be interesting, we think it falls outside the scope of the current study.

      (6) In line 136: Regarding the genetic targeting of B4GALT1 in T cells, it is unclear whether single or multiple gRNAs were used and if potential off-target effects were assessed. To fully validate the model, it would be important to clarify these strategies, and it is essential to include data on the knockout efficiency at both the protein (e.g., Western blot) and mRNA levels.

      We apologize for the unclear description of the gene knockout strategy. In the current study, a single sgRNA was used for each gene knockout in all experiments. B4galt1 sg2 was used in Fig. 3a, and both B4galt1 sg1 and sg2 were used in Fig. S1d. We have clarified this in each figure legend in the revised version.

      The phenotypes of B4galt1 knockout T-cells could be rescued by overexpression of either a short or long isoform of mouse B4galt1 cDNA (Fig. 3b), indicating that potential off-target effects could be excluded.

      The sgRNA knockout efficiencies were confirmed by T7E1 assay in the revised version (Fig. S2). Regrettably, the anti-mouse B4galt1 antibody did not work in western blots.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study compares four models - VALOR (dynamic visual-text alignment), CLIP (static visual-text alignment), AlexNet (vision-only), and WordNet (text-only) - in their ability to predict human brain responses using voxel-wise encoding modeling. The results show that VALOR not only achieves the highest accuracy in predicting neural responses but also generalizes more effectively to novel datasets. In addition, VALOR captures meaningful semantic dimensions across the cortical surface and demonstrates impressive predictive power for brain responses elicited by future events.

      Strengths:

      The study leverages a multimodal machine learning model to investigate how the human brain aligns visual and textual information. Overall, the manuscript is logically organized, clearly written, and easy to follow. The results well support the main conclusions of the paper.

      (1) My primary concern is that the performance difference between VALOR and CLIP is not sufficiently explained. Both models are trained using contrastive learning on visual and textual inputs, yet CLIP performs significantly worse. The authors suggest that this may be due to VALOR being trained on dynamic movie data while CLIP is trained on static images. However, this explanation remains speculative. More in-depth discussion is needed on the architectural and inductive biases of the two models, and how these may contribute to their differences in modeling brain responses.

      Thank you for this thoughtful comment. We agree that attributing VALOR’s advantage over CLIP solely to ‘dynamic (video) versus static (image) pretraining’ would be incomplete, and that the architectural and inductive biases of the two models are central to understanding the observed performance gap.

      Both VALOR and CLIP use contrastive learning to align visual and textual representations, but they differ in several key inductive biases that are particularly relevant for modeling brain responses during continuous movie viewing. First, VALOR is trained to align temporally extended video segments with text, introducing an explicit temporal integration window that aggregates information across consecutive frames. This encourages representations that maintain context, stabilize semantics across time, and encode event-level structure. Second, VALOR’s alignment operates at the level of multi-second narrative units, rather than isolated visual snapshots, biasing the model toward representations that are sensitive to unfolding events and cross-frame consistency.

      In contrast, CLIP processes frames independently and aligns single static images with text. As a result, it lacks an intrinsic mechanism for temporal binding, context accumulation, or event-level representation. While CLIP can capture rich visual–semantic associations at the image level, it is less well suited to represent higher-order temporal structure, which is known to strongly drive responses in association cortex during naturalistic narrative perception.

      We therefore interpret VALOR’s superior encoding performance as reflecting not only exposure to dynamic audiovisual data, but also inductive biases—temporal integration and event-level alignment—that more closely match how the brain integrates information over time during movie watching. We have revised the Discussion (p. 16) to articulate these architectural and representational differences explicitly, rather than attributing the effect solely to training data modality.

      (On page 16) “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity.”

      (2) The methods section lacks clarity regarding which layers of VALOR and CLIP were used to extract features for voxel-wise encoding modeling. A more detailed methodological description is necessary to ensure reproducibility and interpretability. Furthermore, discussion of the inductive biases inherent in these models, and their implications for brain alignment, is crucial.

      Thank you for this comment. We agree that reproducibility and interpretability require precise specification of which model representations were used for voxel-wise encoding, as well as clearer discussion of the inductive biases inherent in these models and their implications for brain alignment.

      In the revised Methods, we now explicitly specify the feature sources for both models. For CLIP (ViT-B/32), we use the final pooled image embedding after projection into the shared image–text space, extracted frame-by-frame; one representative frame is sampled per TR, and its projected embedding serves as the regressor. For VALOR, we use the final joint video–text projection head, yielding a 512-dimensional embedding computed at the segment/TR level that integrates information across consecutive frames and aligns each multi-second video segment with its associated text. These procedures are now described step-by-step in the Methods (p. 21).

      In addition, we expanded the Discussion (p. 16) to explicitly articulate the models’ inductive biases and their relevance for brain alignment. In particular, we contrast CLIP’s image-level, framewise alignment—which lacks intrinsic temporal integration—with VALOR’s event-level, temporally extended video–text alignment, which biases representations toward context maintenance and narrative continuity. This distinction helps explain why the two models differ in their ability to predict neural responses during continuous movie viewing.

      (Methods, On page 21)

      “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23,51,52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.”
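
      As an illustration of the TR-level frame sampling described above, the following minimal Python sketch picks one frame index per TR. The function name `frame_indices_per_tr`, the choice of the TR midpoint as the representative frame, and the example TR and frame rate are assumptions made for illustration; the Methods state only that one frame was sampled per TR.

```python
def frame_indices_per_tr(n_trs, tr_seconds, fps, n_frames):
    """Return one frame index per TR (hypothetical helper).

    Assumption: the representative frame is the one at the temporal
    midpoint of each TR; the source does not say which frame was used.
    """
    indices = []
    for tr in range(n_trs):
        midpoint_s = (tr + 0.5) * tr_seconds   # midpoint of this TR in seconds
        idx = int(midpoint_s * fps)            # convert to a frame index
        indices.append(min(idx, n_frames - 1)) # clamp to the last available frame
    return indices


# Hypothetical example: 4 TRs of 1 s each, 24 fps video with 96 frames.
print(frame_indices_per_tr(n_trs=4, tr_seconds=1.0, fps=24, n_frames=96))
# -> [12, 36, 60, 84]
```

      Each selected frame would then be passed through the image encoder, and the resulting embedding used as that TR's regressor.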

      (Discussion, On page 16)

      “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity. More broadly, this difference reflects distinct inductive biases in how the two models represent visual–linguistic information. CLIP is optimized for framewise image–text correspondence, encouraging representations that emphasize instantaneous visual semantics but remain agnostic to temporal structure. In contrast, VALOR is explicitly biased toward aggregating information over multiple consecutive frames and aligning representations at the level of temporally extended events. These inductive biases favor context maintenance, semantic stabilization, and narrative coherence over time, which are known to be critical for driving responses in association cortex during continuous movie perception.”

      (3) A broader question remains insufficiently addressed: what is the purpose of visual-text alignment in the human brain? One hypothesis is that it supports the formation of abstract semantic representations that rely on no specific input modality. While VALOR performs well in voxel-wise encoding, it is unclear whether this necessarily indicates the emergence of such abstract semantics. The authors are encouraged to discuss how the computational architecture of VALOR may reflect this alignment mechanism and what implications it has for understanding brain function.

      Thank you for this important conceptual question. We agree that improved voxel-wise encoding performance does not, by itself, imply the emergence of fully amodal or modality-independent semantic representations in the brain. In the revision, we therefore avoid framing our findings as evidence for abstract amodal semantics and instead clarify a more constrained interpretation.

      Specifically, we suggest that visual–text alignment may support the stabilization and coordination of scene-level meaning across modalities and over time, rather than the formation of modality-free semantic codes. From this perspective, VALOR’s advantage reflects inductive biases that promote (i) integration of visual information over multi-second windows and (ii) alignment of temporally extended visual events with linguistic descriptions, yielding representations that are more temporally stable, context-sensitive, and constrained by language.

      We therefore interpret VALOR’s superior encoding performance as identifying cortical regions whose responses are better captured by temporally stabilized, cross-modal representations, rather than as evidence that these regions encode fully abstract semantics independent of input modality. We have expanded the Discussion (p. 16) to articulate this interpretation and to clarify the implications of video–text alignment for understanding how the brain integrates perception and language during naturalistic cognition.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time. At the same time, these findings do not imply that visual–text alignment in the brain gives rise to fully amodal, modality-independent semantic representations. Instead, we suggest that alignment between visual and linguistic signals may serve to stabilize and coordinate scene-level meaning across modalities and over time. From this perspective, VALOR’s architecture—by integrating visual information over multi-second windows and aligning temporally extended video segments with language—provides a computational proxy for how the brain may use linguistic constraints to organize, disambiguate, and maintain coherent representations of unfolding events. The observed encoding gains therefore highlight regions engaged in temporally stabilized, cross-modal integration during naturalistic perception, rather than providing evidence for abstract semantic codes divorced from sensory input.”

      (4) The current methods section does not provide enough details about the network architectures, parameter settings, or whether pretrained models were used. If so, please provide links to the pretrained models to facilitate reproducible science.

      We appreciate this comment and agree that our original description of model sources and implementation details was not sufficiently explicit. These details are essential for both reproducibility and interpretability. We have now made these specifications explicit in the revised Methods.

      In particular, we now state for each model:

      VALOR. We use the publicly released pretrained VALOR-large checkpoint. For each movie segment, we extract the joint video–text projection head output (512-D) that encodes the aligned segment-level audiovisual semantics. We report the checkpoint source, the segment duration (in frames/seconds), and how these segment-level embeddings are temporally aligned to TRs for voxel-wise encoding.

      CLIP (ViT-B/32). We use the standard pretrained CLIP weights. For each video frame, we extract the final pooled image representation after projection into CLIP’s shared image–text embedding space (512-D). We also clarify that one representative frame is sampled and aligned to each TR, and that these projected embeddings are used as regressors in the encoding model.

      AlexNet. We use the ImageNet-pretrained AlexNet. We take activations from conv5, and then apply PCA to reduce them to 512 dimensions before mapping them to the fMRI time series.

      For each model, the revised Methods now specify: the pretrained source/checkpoint, the layer or head from which features were taken, output dimensionality, any preprocessing or dimensionality reduction, and the temporal alignment procedure used to generate TR-level regressors. These revisions appear in the updated Methods (page 21).

      (On page 21) “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23,51,52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit [53].

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
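
      The binary coding of category presence described for the WordNet features can be sketched as follows. This is a minimal illustration only: the function name `binary_feature_matrix` and the example labels are hypothetical, and the sketch does not implement the WordNet hierarchy expansion or the annotation guidelines.

```python
def binary_feature_matrix(annotations):
    """Build a time-by-category binary matrix (hypothetical helper).

    annotations: list with one set of category labels per time point.
    Returns the sorted category list and a matrix in which entry
    [t][j] is 1 if category j was labeled at time point t, else 0.
    """
    categories = sorted(set().union(*annotations))
    column = {name: j for j, name in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in annotations]
    for t, labels in enumerate(annotations):
        for label in labels:
            matrix[t][column[label]] = 1
    return categories, matrix


# Hypothetical labels for four one-second time points.
cats, mat = binary_feature_matrix([{"person", "walk"}, {"person"}, set(), {"car"}])
print(cats)  # -> ['car', 'person', 'walk']
print(mat)   # -> [[0, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
```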

      Reviewer #2 (Public review):

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, better predicts brain responses compared to unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the models' practical benefit by showing that semantic mappings were comparable to another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms of the brain, which held a meaningful relationship with individuals' fluid intelligence measures.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.

      (1) The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli compared to other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript if the significance of their encoding model findings were better explained.

      We thank the reviewer for this thoughtful comment and agree with the premise that a model preserving multimodal and temporal structure might a priori be expected to better predict brain responses to naturalistic stimuli. Our intent is not to claim that higher accuracy alone explains brain function, but rather that where and how VALOR improves prediction provides diagnostic insight into cortical processing. We have revised the Discussion to make this distinction explicit.

      Specifically, we clarify three ways in which VALOR’s gains are scientifically informative rather than merely unsurprising:

      (1) Anatomical specificity of improvement. VALOR’s advantage is not uniform across the cortex; gains are largest in regions implicated in multi-second, cross-modal integration. This spatial pattern constrains where the brain accumulates information over time and stabilizes visual representations using linguistic context.

      (2) Model as a computational probe. Beyond prediction accuracy, VALOR’s feature space recovers large-scale semantic organization without manual annotation and enables targeted tests of predictive processing. Features reflecting upcoming content selectively improve fits in specific regions, consistent with anticipatory coding during continuous narrative perception.

      (3) Link to individual differences. Individuals whose neural responses are better captured by anticipatory features show higher fluid intelligence, suggesting that VALOR indexes meaningful variability in forward-looking representations rather than merely tracking stimulus complexity.

      Accordingly, we have revised the Discussion (p. 16) to frame VALOR as a tool for mapping cortical integration profiles, probing semantic and predictive structure, and linking representational dynamics to cognition, rather than asserting that higher encoding accuracy alone explains brain function.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time.”

      (2) In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the method together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features to Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment is higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Method section (line 508) because this analysis was not easy to understand.

      Thank you for this helpful comment and for pointing out a potential source of confusion. We apologize that the original Methods description was not sufficiently clear. Importantly, VALOR features were never projected into the Huth et al. PC space, and no optimization or rotation toward the WordNet basis occurred at any stage.

      The analysis proceeded as follows:

      (1) VALOR PCs. We first fit voxel-wise encoding models using VALOR features on the Huth et al. dataset. We then applied PCA to the resulting cortical weight maps, yielding spatial components (‘VALOR PCs’) that summarize shared patterns of VALOR feature weights across the cortex.

      (2) WordNet PCs. We used the semantic principal components reported by Huth et al. (2012) directly as published, with no refitting, projection, or modification using VALOR.

      (3) Correspondence analysis. Only after obtaining these two independent sets of cortical maps did we threshold each to their top-loading vertices and compute Jaccard overlap between VALOR PCs and WordNet PCs.
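
      The correspondence step (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names, the use of absolute loadings, the 50% threshold, and the toy loading values are all hypothetical and are not taken from the manuscript.

```python
def top_loading_vertices(loadings, fraction):
    """Indices of the vertices with the largest absolute loadings.

    Assumption: "top-loading" is taken by absolute value and as a fixed
    fraction of vertices; the actual thresholding rule may differ.
    """
    k = max(1, int(len(loadings) * fraction))
    order = sorted(range(len(loadings)), key=lambda i: abs(loadings[i]), reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two vertex sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy loadings for two independently derived cortical maps (6 vertices each).
valor_pc = [0.9, 0.1, -0.8, 0.2, 0.05, 0.7]
wordnet_pc = [0.85, 0.6, -0.1, 0.3, 0.0, 0.75]

a = top_loading_vertices(valor_pc, fraction=0.5)    # {0, 2, 5}
b = top_loading_vertices(wordnet_pc, fraction=0.5)  # {0, 1, 5}
print(jaccard(a, b))  # -> 0.5
```

      Because each map is thresholded separately before the overlap is computed, neither decomposition influences the other in this step.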

      Although a permutation that shuffles VALOR features prior to projection addresses a scenario that does not apply here, we agree that the Methods description should more clearly convey the independence of the two decompositions. We have therefore revised the Methods (p. 24) to describe the procedure step-by-step and explicitly state that no projection, refitting, or optimization toward the WordNet basis was performed.

      (On page 24) “We first fit voxel-wise encoding models using VALOR features for each of the five participants in the Huth et al. dataset. For each participant, this yielded a weight map linking each VALOR feature to each voxel. We then stacked these weight maps across participants to form a single voxel-by-feature weight matrix and applied principal component analysis (PCA). The top four principal components from this analysis (“VALOR PCs”) captured shared spatial patterns of VALOR feature weights across cortex. To interpret these components, we projected VALOR feature vectors from >20,000 video segments in the VALOR training set onto each VALOR PC, which revealed dominant semantic axes (e.g., mobility, sociality, civilization). For comparison, we used the semantic principal components reported by Huth et al. (2012) from their WordNet-based encoding model; these “WordNet PCs” were taken directly from the published work and were not refit or reweighted using VALOR.”

      (3) In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis that "SPG and PCu have been strongly linked to fluid intelligence (line 304)". Did the authors conduct only two analyses (SPG-fluid intelligence and PCu-fluid intelligence) without relating other brain regions to other individual differences measures? Even if so, the authors should have reported the same r-value and p-value for PCu-fluid intelligence. If SPG-fluid intelligence indeed holds specificity in terms of statistical significance compared to all possible scenarios that were tested, is this rationally an expected result, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be the proxy of one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated predictive scores for all brain regions that held significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

      We thank the reviewer for this careful and constructive comment and agree that greater transparency about analytic intent, specificity, and rationale is needed. We have revised the manuscript accordingly.

      (1) Analytic scope and a priori restriction. The analysis in Fig. 4C was hypothesis-driven and restricted a priori to two regions — superior parietal gyrus (SPG) and precuneus (PCu) — based on convergent evidence linking frontoparietal and medial parietal systems to fluid reasoning, relational integration, and domain-general cognitive control. Importantly, we did not conduct a whole-brain search across regions or behaviors to identify the strongest correlation post hoc.

      (2) Specificity and reporting. In response to the reviewer’s request, we now report the full results for both hypothesized regions. Prediction horizon in SPG showed a statistically reliable association with fluid intelligence, whereas PCu showed a positive but weaker trend that did not survive correction. Reporting both results makes the regional specificity explicit rather than implicit.

      (3) Why SPG over PCu? Although both regions are implicated in fluid cognition, SPG has been more consistently linked to active maintenance and manipulation of relational structure and top-down attentional control, whereas PCu is more often associated with internally oriented and mnemonic processes. We therefore interpret the stronger SPG association as consistent with a role for sustained, externally driven predictive processing during continuous perception, rather than as evidence of exclusivity.

      (4) Why fluid intelligence? We do not equate fluid intelligence with “anticipation” per se. Rather, we used gF as an a priori proxy for domain-general capacities — maintaining and updating relational context over multi-second windows, integrating multiple constraints, and exerting flexible control — that are plausibly recruited when anticipating upcoming events during naturalistic narratives. The reported relationship is associative and hypothesis-consistent, not causal.

      (5) Why not aggregate across regions? We agree that aggregation could reveal more global relationships; however, our goal in this analysis was to test whether predictive timescales in theoretically motivated control regions relate to individual differences, rather than to maximize correlation by pooling heterogeneous regions. We now clarify this rationale in the Results.

      These clarifications and additional statistics have been incorporated in the revised Results section (p. 14).

      (On page 14) “Finally, we examined whether prediction horizons were linked to individual differences in cognition. We focused on fluid intelligence (gF) because gF is widely taken to index domain-general capacities such as maintaining and updating relational context over several seconds, integrating multiple constraints, and exerting flexible top-down control — functions that should support anticipating what will happen next in a continuous narrative. We targeted two parietal regions, the SPG and the PCu, which have both been repeatedly linked to gF and high-level cognitive control in the individual-differences literature 36,37. For each participant, we correlated fluid cognition scores with that participant’s average prediction horizon in each region. As shown in Fig. 4c, individuals with longer prediction horizons in SPG showed higher fluid cognition scores (SPG: r = 0.172, FDR-corrected p = 0.047). PCu showed a similar positive trend (PCu: r = 0.111, FDR-corrected p = 0.146) but did not reach significance. These associations suggest that the ability to sustain a longer predictive timescale during naturalistic perception co-varies with broader fluid cognitive capacity. No additional brain regions or behavioral measures were examined in this analysis.”
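
      For readers who want to see the two-region test in code, the following minimal Python sketch computes a Pearson correlation and applies an FDR correction across two p-values. It assumes the Benjamini-Hochberg procedure, which the text does not name, and the raw p-values in the example are hypothetical placeholders rather than values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for two a priori regions (not the study's values).
print(benjamini_hochberg([0.01, 0.2]))  # -> [0.02, 0.2]
```

      With only two tests, the smaller raw p-value is at most doubled and the larger one is unchanged, so the correction is mild when the analysis is restricted a priori to two regions.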

      Reviewer #3 (Public review):

      In this work, the authors aim to improve neural encoding models for naturalistic video stimuli by integrating temporally aligned multimodal features derived from a deep learning model (VALOR) to predict fMRI responses during movie viewing.

      Strengths:

      The major strength of the study lies in its systematic comparison across unimodal and multimodal models using large-scale, high-resolution fMRI datasets. The VALOR model demonstrates improved predictive accuracy and cross-dataset generalization. The model also reveals inherent semantic dimensions of cortical organization and can be used to evaluate the integration timescale of predictive coding.

      This study demonstrates the utility of modern multimodal pretrained models for improving brain encoding in naturalistic contexts. While not conceptually novel, the application is technically sound, and the data and modeling pipeline may serve as a valuable benchmark for future studies.

      (1) Lines 95-96: The authors claim that "cortical areas share a common space," citing references [22-24]. However, these references primarily support the notion that different modalities or representations can be aligned in a common embedding space from a modeling perspective, rather than providing direct evidence that cortical areas themselves are aligned in a shared neural representational space.

      We thank the reviewer for this important clarification. We agree that the cited works do not provide direct evidence that cortical areas themselves are aligned in a single neural representational space. Rather, they demonstrate that representations derived from different modalities can be mapped into a shared embedding space from a modeling and computational perspective.

      We have therefore revised the text to avoid overstatement and to more precisely reflect what these studies support. In the revised manuscript (p. 4), we now frame the claim in terms of a shared representational framework or feature space used for modeling, rather than implying that cortical areas themselves intrinsically share a unified neural space. This clarification aligns the conceptual claim with the scope of the cited literature.

      (On page 4) “As a result, researchers are turning to multimodal deep learning, which learns from visual, linguistic, and auditory streams to model complex brain functions. This trend is supported by neuroscience evidence that cortical responses across regions can be jointly modeled within a common representational space.”

      (2) The authors discuss semantic annotation as if it is still a critical component of encoding models. However, recent advances in AI-based encoding methods rely on features derived from large-scale pretrained models (e.g., CLIP, GPT), which automatically capture semantic structure without requiring explicit annotation. While the manuscript does not systematically address this transition, it is important to clarify that the use of such pretrained models is now standard in the field and should not be positioned as an innovation of the present work. Additionally, the citation of Huth et al. (2012, Neuron) to justify the use of WordNet-based annotation omits the important methodological shift in Huth et al. (2016, Nature), which moved away from manual semantic labeling altogether. Since the 2012 dataset is used primarily to enable comparison in study 3, the emphasis should not be placed on reiterating the disadvantages of semantic annotation, which have already been addressed in prior work. Instead, the manuscript's strength lies in its direct comparison between data-driven feature representations and semantic annotation based on WordNet categories. The authors should place greater emphasis on analyzing and discussing the differences revealed by these two approaches, rather than focusing mainly on the general advantage of automated semantic mapping.

      Thank you for this thoughtful and constructive comment. We agree with the reviewer that the field has largely transitioned away from manual semantic annotation toward features derived from large-scale pretrained models (e.g., CLIP, GPT-style architectures), and that this shift is now standard rather than a novelty of the present work.

      We have revised the manuscript to clarify this positioning. Our goal is not to claim automated semantic extraction as an innovation, but rather to demonstrate how a multimodal, temporally informed video–text model can be used as a direct feature space for voxel-wise encoding of naturalistic movie fMRI data. VALOR is used as a representative example of this broader class of pretrained models, and our emphasis is on the general modeling approach rather than on promoting a specific architecture.

      We also agree that our original discussion underemphasized the important methodological shift introduced in Huth et al. (2016, Nature), which moved away from manual semantic labeling in the context of continuous spoken narratives. We now explicitly acknowledge this work and clarify that our use of WordNet-based annotations from Huth et al. (2012) serves a different purpose: it provides an interpretable, historically grounded benchmark for comparison in Study 3, rather than a claim that semantic annotation remains necessary or state-of-the-art.

      In response to the reviewer’s suggestion, we have revised the Results (p. 10) and Discussion (p. 18) to place greater emphasis on what is revealed by directly comparing data-driven multimodal features with category-based semantic annotation under matched conditions. Specifically, we focus on how these two approaches converge at the level of large-scale semantic organization while differing in their flexibility, temporal resolution, and dependence on human-defined categories. These revisions better reflect the current state of the field and sharpen the manuscript’s central contribution as a principled comparison between modeling approaches, rather than a general argument for automated semantic mapping.

      (On page 10) “Study 3: Comparing data-driven multimodal representations with category-based semantic annotation

      A central question in naturalistic encoding is how data-driven feature representations derived from pretrained models relate to more interpretable, category-based semantic annotations that have historically been used to study cortical semantic organization. Although recent work has shown that pretrained language and vision–language models can capture semantic structure without explicit annotation, category-based approaches such as WordNet remain valuable as interpretable reference frameworks. Here, we leverage the WordNet-based semantic components reported by Huth et al. (2012) 5 not as a state-of-the-art alternative, but as a historically grounded benchmark, allowing a controlled comparison between data-driven multimodal representations and manually defined semantic categories under matched naturalistic movie stimuli.”

      (On page 18) “Study 3 demonstrates the utility of video–text alignment models for probing higher-order semantic representations during naturalistic perception. Our comparison between VALOR-derived representations and WordNet-based semantic components highlights an important distinction between data-driven and category-based approaches to modeling meaning in the brain. While multimodal pretrained models offer flexible, high-dimensional representations that capture semantic structure without explicit annotation, category-based frameworks provide interpretability and theoretical anchoring 4,48. Using WordNet-based labeling from prior work as an interpretable reference point, we show that VALOR automatically extracts semantic dimensions—including mobility, sociality, and civilization—that closely mirror those identified using manual semantic categories (Fig. 3). The observed alignment between VALOR PCs and WordNet semantic components suggests that large-scale semantic organization emerges consistently across these approaches, even though they differ in how semantic structure is defined and learned. This convergence supports the use of pretrained multimodal models as practical encoding tools for naturalistic stimuli, while also underscoring the continued value of interpretable semantic benchmarks for understanding which aspects of meaning are represented across cortex. We do not argue that semantic annotation is required for modern encoding models; rather, WordNet-based features serve here as a historically grounded and interpretable reference for contextualizing data-driven multimodal representations.”

      (3) The authors use subject-specific encoding models trained on the HCP dataset to predict group-level mean responses in an independent in-house dataset. While this analysis is framed as testing model generalization, it is important to clarify that it is not assessing traditional out-of-distribution (OOD) generalization, where the same subject is tested on novel stimuli, but rather evaluating which encoding model's feature space contains more stimulus-specific and cross-subject-consistent information that can transfer across datasets.

      We thank the reviewer for this helpful clarification and agree that the type of generalization tested here should be described more precisely. Our analysis does not assess classical within-subject out-of-distribution (OOD) generalization, in which the same individual is tested on novel stimuli.

      Instead, for each HCP participant we train a subject-specific encoding model and transfer it to predict group-mean responses in an independent in-house dataset collected at a different site, with different participants, different movies, and different acquisition conditions. This design evaluates which encoding model’s feature space contains stimulus-locked representations that are consistent across individuals and robust to changes in dataset and experimental context, rather than within-subject stimulus novelty per se.

      We have revised the Results (p. 10) and Discussion section (p. 17) to explicitly describe this analysis as a test of cross-subject and cross-dataset transferability of stimulus representations, and to clarify the distinction from traditional OOD generalization.

      (On page 10) “Although this analysis is not a classical within-subject out-of-distribution generalization test, it evaluates the extent to which different feature spaces capture stimulus-locked representations that are consistent across subjects and transferable across datasets, stimuli, and acquisition environments.”

      (On page 17) “By contrast, VALOR exhibited stronger generalization in a cross-cohort, cross-stimulus, and cross-site transfer evaluation.”

      (4) Within this setup, the finding that VALOR outperforms CLIP, AlexNet, and WordNet is somewhat expected. VALOR encodes rich spatiotemporal information from videos, making it more aligned with movie-based neural responses. CLIP and AlexNet are static image-based models and thus lack temporal context, while WordNet only provides coarse categorical labels with no stimulus-specific detail. Therefore, the results primarily reflect the advantage of temporally-aware features in capturing shared neural dynamics, rather than revealing surprising model generalization. A direct comparison to pure video-based models, such as Video Swin Transformers or other more recent video models, would help strengthen the argument.

      We thank the reviewer for this baseline-focused comment and agree that, in naturalistic movie paradigms, a temporally structured audiovisual model would be expected to outperform static or unimodal feature spaces. Our intent in this comparison is therefore not to claim a surprising advantage, but to isolate which inductive biases matter for cross-dataset transfer of movie-evoked neural responses.

      The baseline models were chosen deliberately to span feature spaces that are widely used and interpretable in cognitive neuroscience: AlexNet (vision-only, frame-based), WordNet (human-defined semantic categories without learned visual features), and CLIP (static image–text alignment without temporal context). Comparing VALOR against these established baselines under matched preprocessing, TR alignment, and dimensionality control allows us to attribute performance differences specifically to temporal integration and audiovisual alignment, rather than to generic model capacity.

      We agree that a direct comparison with purely visual spatiotemporal encoders (e.g., Video Swin or TimeSformer-style models) would further dissociate the contribution of temporal visual processing from cross-modal video–text alignment. We now explicitly note this as an important direction for future work and frame VALOR as one representative of a broader class of multimodal video models, rather than as a uniquely optimal solution (Discussion, p. 16).

      (On page 16) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (5) Moreover, while WordNet-based encoding models perform reasonably well within-subject in the HCP dataset, their generalization to group-level responses in the Short Fun Movies (SFM) dataset is markedly poorer. This could indicate that these models capture a considerable amount of subject-specific variance, which fails to translate to consistent group-level activity. This observation highlights the importance of distinguishing between encoding models that capture stimulus-driven representations and those that overfit to individual heterogeneities.

      Thank you for this thoughtful observation. We agree with the reviewer’s interpretation. In our analyses, WordNet-based models perform reasonably well when fit and evaluated within individual HCP participants, but their performance degrades substantially when transferred to predict group-averaged responses in the independent SFM dataset. This dissociation suggests that, while WordNet annotations capture meaningful variance at the individual level, a larger fraction of that variance may be subject-specific or idiosyncratic, and therefore does not translate into consistent, stimulus-locked responses at the group level.

      One motivation for our cross-dataset, cross-subject evaluation is precisely to distinguish encoding models that primarily capture shared stimulus-driven structure from those whose apparent performance depends more strongly on individual heterogeneity. In this context, the reduced transferability of WordNet-based models highlights a potential limitation of category-based semantic features for capturing population-consistent neural dynamics during naturalistic viewing.

      We note that this effect likely reflects multiple factors rather than a single failure mode, including differences in annotation schemes, labeling granularity, and semantic coverage across datasets. By contrast, video–text models provide time-aligned linguistic features directly from the stimulus itself, reducing reliance on dataset-specific human annotation and exhibiting stronger transfer across cohorts. We have clarified this interpretation in the revised Discussion (p. 17).

      (On page 17) “Together, these findings underscore the importance of distinguishing encoding models that primarily capture shared, stimulus-driven neural structure from those whose performance relies more heavily on subject-specific heterogeneity, particularly when evaluating generalization across participants and datasets.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Methods section, please clarify which specific layer of VALOR the 512-dimensional feature vector was extracted from.

      Thank you for this suggestion. We have revised the Methods to state explicitly that the 512-dimensional feature vector is extracted from VALOR’s joint video–text projection head, i.e., the final projection layer of the contrastive alignment module that maps video and text representations into a shared embedding space. We also clarify that these 512-D embeddings are computed at the segment/TR level and then time-aligned to the BOLD signal (Methods, p. 21).

      (On page 21) “We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.”

      (2) It would be helpful to include more detailed descriptions of the network architectures and parameters for all models used.

      Thank you for the suggestion. We have revised the Methods to include model-specific subsections for all feature spaces used (VALOR, CLIP, AlexNet, and WordNet). For each model, we now explicitly report (i) the backbone architecture and training objective, (ii) the exact feature source (layer or projection head) and output dimensionality, and (iii) how features were temporally aligned to the BOLD signal. All models were used with their publicly released pretrained parameters, without additional fine-tuning. These additions are intended to improve transparency and reproducibility (Methods, p. 21).

      (On page 21) “Movie Feature Extraction

      (1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model 24. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features 23,51,52. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit 53.

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models. Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
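
      The binary coding described in item (4) can be sketched as follows. This is a minimal illustration only: the category names, the per-time-point label sets, and the helper `build_semantic_matrix` are hypothetical, and the hierarchy-expansion step that yields the 859 features is not reproduced here.

```python
def build_semantic_matrix(annotations, categories):
    """Rows are time points, columns are categories, entries are 0/1 presence."""
    index = {cat: j for j, cat in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in annotations]
    for t, labels in enumerate(annotations):
        for label in labels:
            matrix[t][index[label]] = 1
    return matrix


# Hypothetical toy annotation: one set of labels per one-second segment.
categories = ["person", "walk", "car", "building"]
annotations = [{"person", "walk"}, {"car"}, set(), {"person", "building"}]

features = build_semantic_matrix(annotations, categories)
for row in features:
    print(row)
```

      Each row can then serve directly as a regressor vector for the corresponding time point.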

      (3) In Figure 3, consider following Huth et al.'s approach by using 3-4 distinct colors to visualize semantic representations across the cortical surface more clearly.

      Thank you for this excellent suggestion. We have generated an alternative visualization using a discrete 3–4 color scheme following Huth et al. to display the semantic components on the cortical surface. This version makes the spatial correspondence between components and the boundaries between cortical territories easier to see. We now include this visualization in the Supplement (Fig. S3).

      (4) In Figure 2, the brain renderings are too small. Please consider creating a separate, enlarged figure with clearer delineation of relevant ROIs.

      We appreciate this suggestion and agree that clear delineation of ROIs is important. We evaluated larger brain renderings; however, within the multi-panel layout of Fig. 2, enlarging them compressed accompanying plots/legends and introduced visual crowding, which reduced overall readability. To preserve a balanced layout and consistent typography across panels, we have kept the current rendering size in the main text and added Fig. S4 with enlarged brain renderings showing clearer ROI boundaries for the same ROIs.

      Reviewer #2 (Recommendations for the authors):

      (1) From the introduction, I feel like naïve readers would have a hard time understanding what semantic models (e.g., WordNet) are, which the authors write are based on "labor-intensive and subjective manual annotation of semantic content". It would be straightforward to explain the process: how scientists have written descriptions or denoted categories of what's happening within a TR and transformed these into embedding vectors based on language models. This description would explain what the authors mean by "labor-intensive, time-consuming, and subjective". Related to this point, the authors seem to be using the words "semantic model/feature" and "linguistic model/feature" interchangeably, which may exacerbate the confusion.

      Thank you for this helpful suggestion. We agree that naïve readers would benefit from a clearer explanation of how “semantic” models such as WordNet are constructed and from a more precise distinction between semantic and linguistic features.

      In response, we expanded the Introduction (p. 3) to explicitly describe the process by which semantic features are generated via dense human annotation (i.e., raters label objects, actions, and events within each TR and map these labels onto a predefined ontology to form feature vectors), clarifying why this approach is labor-intensive, time-consuming, and subject to rater variability.

      To avoid disrupting the conceptual flow of the Introduction, we placed the explicit terminology clarification in the Methods section (p. 22), where feature extraction is described. There, we now define semantic features as human-annotated, category-based representations of scene content, and linguistic features as continuous language embeddings derived automatically from pretrained language or vision–language models. These revisions are intended to improve clarity and consistency for both expert and non-expert readers.

      (On page 3) “Critically, semantic models often rely on dense human annotation. In early naturalistic encoding studies, trained raters watched the stimulus and labeled what was happening within each TR or short time window—for example, identifying objects, actions, or events present in the scene. These labels were then mapped onto a predefined semantic ontology (such as WordNet), yielding high-dimensional categorical feature vectors that served as regressors in encoding models. While this approach provides interpretable semantic features, it is labor-intensive, time-consuming, and inherently subjective, as annotations depend on rater judgment, labeling guidelines, and dataset-specific conventions, limiting scalability and reproducibility.”

      (On page 22) “Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models.”

      (2) Figure 1A does not look like an accurate schematic of the encoding method. For example, shouldn't the "Train" give rise to weight matrices, and Movies come from moments at Test? I would appreciate it if this schematic figure would explain what the encoding model is to naïve readers.

      (3) Figure 1B emphasizes that VALOR is utilizing multimodal features, but does not emphasize that the model is trained on dynamic video. The current figure looks like the model extracted visual and linguistic features from a screenshot of the video, much like the CLIP model.

      Thank you for this helpful comment. We agree that the original Fig. 1A did not sufficiently clarify what is learned during training versus what is applied during testing, and that this distinction is particularly important for naïve readers unfamiliar with encoding models. We also agree that the original Fig. 1B did not sufficiently emphasize that VALOR is trained on dynamic video segments, and that the schematic could be misinterpreted as aligning a single video frame with text, similar to CLIP-style image–text models.

      We have revised Fig. 1A (p. 6) to make the encoding procedure explicit and pedagogical. Specifically, we now clearly depict that, during the training phase (HCP dataset), voxel-wise encoding models learn feature-to-voxel weight matrices from stimulus features and BOLD responses. These learned weights are explicitly labeled as voxel-wise weight matrices and visually associated with the training stage. In the testing/generalization phase (SFM dataset), we now indicate that these learned weights are held fixed and applied to features extracted from novel movies to generate predicted BOLD responses. Additional labels were added to distinguish “Training (learn weights)” from “Testing/Transfer (apply fixed weights)” and to clarify that the encoding model implements a linear mapping from stimulus features to voxel responses. We have also rewritten the Fig. 1 legend (p. 6) to explicitly explain the encoding workflow in words, including (i) the learning of voxel-specific weights during training, (ii) their reuse during cross-dataset transfer, and (iii) how generalization performance is evaluated. These changes are intended to ensure that Fig. 1A accurately reflects the encoding methodology and is understandable to readers without prior experience with encoding models.
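
      The train-then-transfer workflow described above can be sketched on synthetic data. This is an illustration under stated assumptions, not the authors' pipeline: ridge regression is assumed as the linear estimator (the response specifies only a linear mapping), the array sizes and the penalty `alpha` are arbitrary, and `fit_ridge` and `voxelwise_corr` are hypothetical helper names.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge solution W = (X'X + alpha*I)^-1 X'Y (features x voxels)."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_corr(Y_true, Y_pred):
    """Pearson r between observed and predicted time courses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 80, 16, 5
W_true = rng.standard_normal((n_feat, n_vox))  # synthetic ground-truth weights

# Training phase: learn feature-to-voxel weights from features and responses.
X_train = rng.standard_normal((n_train, n_feat))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_vox))
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)

# Transfer phase: weights stay fixed and are applied to features of new stimuli.
X_test = rng.standard_normal((n_test, n_feat))
Y_test_group_mean = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_vox))
Y_pred = X_test @ W_hat

scores = voxelwise_corr(Y_test_group_mean, Y_pred)
print(scores.round(3))
```

      Nothing is refit in the transfer phase, so the per-voxel correlations measure how well the learned mapping carries over to the new data.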

      We have revised Fig. 1B (p. 6) to explicitly highlight the temporal nature of the video input used by VALOR. In the updated schematic, the visual stream is depicted as a sequence of consecutive frames spanning multiple seconds, grouped into a video segment, rather than as a single static image. Additional labels indicate that VALOR encodes temporally extended video clips and aligns them with corresponding textual descriptions in a shared embedding space via contrastive learning. We have also updated the figure legend (p. 6) to clarify that VALOR operates on multi-frame video segments and explicitly models temporal structure, distinguishing it from static image–text models such as CLIP. These changes are intended to make clear that VALOR’s advantage derives not only from multimodality, but also from learning representations over time.

      (4) Regarding Figure 2, why were paired t-tests conducted in one-sided comparisons? Shouldn't this be two-sided, given that there is no reason to assume one is higher or lower than another?

      Thank you for raising this point. We agree that, in the absence of a preregistered directional hypothesis, paired comparisons should be evaluated using two-sided statistical tests.

      In response, we have re-run all paired comparisons reported in Figure 2 (p. 9) using two-sided paired t-tests, recomputed the corresponding p-values and false discovery rate (FDR) corrections, and updated the significance markers in the figure and captions accordingly. Importantly, this change does not alter the qualitative pattern of results or the main conclusions reported in the manuscript.
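
      For readers unfamiliar with the FDR step, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic illustration: the raw p-values are made up, the helper `benjamini_hochberg` is hypothetical, and the response does not state which FDR procedure or software was used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up two-sided p-values from four paired comparisons.
raw_p = [0.001, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw_p)
print(adj)
```

      A comparison is then marked significant when its adjusted p-value falls below the chosen threshold.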

      (5) Regarding Study 4, I am curious whether the results are specific to forward-looking representations (predictive coding) or whether the results broadly reveal regions that are sensitive to contexts. For example, if the authors were to incorporate nearby past scenes in the analysis rather than the nearby future scenes, would different brain regions light up?

      Thank you for this thoughtful question. We agree that it is important to distinguish forward-looking (predictive) representations from more general sensitivity to temporal context. In Study 4, we deliberately operationalized prediction using future-aligned features, such that only information from upcoming scenes was incorporated into the encoding model. Accordingly, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic context sensitivity.

      To make this interpretive scope explicit, we have added a clarifying sentence at the beginning of the Study 4 paragraph in the Discussion (p. 18), noting that our analysis incorporates only future-aligned features and that directly contrasting past- and future-aligned features will be an important direction for future work. This clarification is intended to clearly bound our claims while addressing the reviewer’s conceptual distinction.

      (On page 18) “In Study 4, we used a video-text alignment model to investigate predictive coding mechanisms. Because our analysis incorporates only future-aligned features, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic sensitivity to temporal context; directly contrasting past- and future-aligned features will be an important direction for future work.”
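
      One simple way to picture "future-aligned" features is to pair each response time point with the stimulus features from a later time point. The sketch below is an assumption about how such alignment could be built, not a description of the Study 4 implementation; `align_future` and the `offset` parameter are hypothetical.

```python
def align_future(features, responses, offset):
    """Pair the response at time t with the feature vector at time t + offset.

    Time points without a future partner (the last `offset` responses) are dropped.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    n = len(features) - offset
    return features[offset:], responses[:n]


# Toy series: five time points, one feature each, with labelled responses.
features = [[0], [1], [2], [3], [4]]
responses = ["a", "b", "c", "d", "e"]

X_future, Y_now = align_future(features, responses, offset=2)
print(X_future, Y_now)
```

      A past-aligned control of the kind the reviewer describes would pair each response with earlier features instead.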

      (6) In the paragraph starting in line 447, were WordNet feature time series also reduced to 512 dimensions like the rest of the model features?

      Thank you for the question. In the main analyses, WordNet feature time series were not reduced to 512 dimensions and were instead used at their full dimensionality (859 features).

      For comparability with the other feature spaces, we additionally conducted a control analysis in which WordNet features were reduced to 512 dimensions using PCA. The PCA was fit within each training fold to avoid information leakage, and the resulting 512-D features were evaluated using the same encoding pipeline. This PCA-reduced version performed slightly worse than the full 859-D WordNet representation. Accordingly, we report results from the full 859-D WordNet features in the main text. We have clarified this point in the Methods section (p. 22).

      (On page 22) “We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text.”
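
      Fitting the PCA within each training fold can be sketched as follows. This is a toy illustration with assumed sizes (40 features reduced to 8, four contiguous folds) rather than the 859-to-512 reduction reported above; `fit_pca` and `apply_pca` are hypothetical helpers built on NumPy's SVD.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Estimate the mean and principal axes from training data only."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X, mean, components):
    """Project data using a mean and axes estimated elsewhere."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X = rng.standard_normal((120, 40))  # time points x features (synthetic)
n_folds, n_components = 4, 8
folds = np.array_split(np.arange(X.shape[0]), n_folds)  # contiguous time blocks

reduced = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    mean, components = fit_pca(X[train_idx], n_components)
    # The held-out fold is projected with training-fold statistics only.
    Z_train = apply_pca(X[train_idx], mean, components)
    Z_test = apply_pca(X[test_idx], mean, components)
    reduced.append((Z_train, Z_test))

print(reduced[0][0].shape, reduced[0][1].shape)
```

      Because the mean and axes never see the held-out time points, the reduced test features carry no information from the test fold into model fitting.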

      (7) I don't think authors have written what VALOR stands for.

      Thank you for the reminder. We now define the VALOR acronym at its first mention in the Abstract and Introduction and use the abbreviation thereafter.

      (On page 2) “Using a state-of-the-art deep learning model (VALOR; Vision-Audio-Language Omni-peRception)”

      (On page 5) “To answer this, we apply a video-text alignment encoding framework, using VALOR (Vision-Audio-Language Omni-peRception)—a high-performing, open-source model that aligns visual and linguistic features over time—to predict brain responses during movie watching.”

      (8) When calculating equation (3), please make sure that the correlation values are Fisher's r-to-z transformed.

      Thank you for this reminder. We confirm that all correlation coefficients used in Equation (3) are now Fisher r-to-z transformed prior to any averaging, contrasts, or statistical testing, and this procedure is now explicitly stated in the Methods. We have also updated Fig. 4a (p. 15) to reflect this transformation. Importantly, applying the r-to-z transformation does not change the qualitative pattern of results or their statistical significance.
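
      The Fisher r-to-z step can be illustrated with a short sketch. Equation (3) is not reproduced in this response, so the example shows only the transform and an average taken in z-space; the correlation values and the helper names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transform: z = atanh(r)."""
    return math.atanh(r)


def mean_correlation(rs):
    """Average correlations in z-space, then map the mean back to r."""
    z_mean = sum(fisher_z(r) for r in rs) / len(rs)
    return math.tanh(z_mean)


# Made-up correlation coefficients from three runs.
rs = [0.2, 0.5, 0.8]
avg_r = mean_correlation(rs)
print(avg_r)
```

      Averaging in z-space gives more weight to large correlations than a plain arithmetic mean of r, which is why the transform is applied before averaging or testing.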

      (9) I wasn't able to check the OSF data/codes because it required permission.

      Thank you for flagging this, and we apologize for the inconvenience. We have removed the permission restriction and set the OSF repository to public read-only access, which should resolve the issue.

      Reviewer #3 (Recommendations for the authors):

      (1) The current approach extracts features from a single "best" layer of each model, which may be suboptimal for predicting neural responses. Prior work has shown that combining features across multiple layers through optimized fusion strategies (e.g., St-Yves et al., 2023) or using model ensembles (e.g., Li et al., 2024) can substantially improve encoding performance. The authors may consider these more comprehensive approaches either as additional baselines or as alternative directions to enhance model accuracy.

      Thank you for this constructive suggestion. We agree that combining features across multiple layers or using optimized fusion and ensemble strategies, as demonstrated in recent work (e.g., St-Yves et al., 2023; Li et al., 2024), can substantially improve absolute encoding performance.

      In the present study, however, we intentionally evaluated each model using its single best-performing layer within a matched encoding pipeline. This design choice was made to maintain model-agnostic comparability and interpretability, and to ensure that performance differences could be attributed primarily to the type of representation (e.g., temporally informed video–text features versus static or unimodal features), rather than to differences in model complexity, parameter count, or fusion strategy. Importantly, this constraint was applied uniformly across all models and therefore does not favor VALOR over the baselines.

      We now explicitly note in the Discussion (p. 19) that multilayer fusion and ensemble approaches represent a natural and promising extension of our framework and are likely to further improve absolute prediction accuracy. Our goal in the current work was to establish the practical utility and generalizability of temporally aligned video–text features for naturalistic movie fMRI under a controlled and comparable evaluation setting.

      (On page 19) “Third, for comparability across models we evaluated each model using its single best-performing layer within a matched encoding pipeline rather than using multilayer fusion or ensembling, which allowed us to attribute performance differences to representational format but likely underestimates the absolute performance ceiling.”

      (2) Given the naturalistic video-based task, the manuscript would benefit from including state-of-the-art video-only models (e.g., Video Swin Transformer, VideoMAE, and other more recent architectures) as explicit baselines. These models are designed to capture spatiotemporal structure without relying on language input and would provide a more targeted comparison to assess the specific contribution of temporal visual processing.

      Thank you for this thoughtful suggestion. We agree that state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE) are highly relevant baselines for naturalistic movie paradigms and would provide a more targeted comparison for isolating the contribution of temporal visual processing independent of language input.

      In the present study, our primary goal was not to exhaustively benchmark all possible video architectures, but to evaluate whether temporally informed video–text features can serve as a practical and general-purpose encoding framework that improves upon the models most commonly used in cognitive neuroscience for naturalistic fMRI (e.g., AlexNet for vision, WordNet for semantic annotation, and CLIP for static multimodal alignment). Using these established baselines allowed us to place our results in direct continuity with prior neuroimaging work and to attribute performance differences to representational format under a controlled encoding pipeline.

      We agree that incorporating modern video-only spatiotemporal encoders is an important next step, particularly for disentangling the relative contributions of temporal visual structure and cross-modal video–text alignment. We now explicitly note this point in the Discussion (p. 19) as a limitation and future direction, and view such comparisons as a natural extension of the current framework within the same TR-aligned encoding setup.

      (On page 19) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (3) An additional consideration is the scale of the AI models used for feature extraction. Previous studies (e.g., Matsuyama et al., 2023) have indicated that model size - particularly the number of parameters - can influence neural prediction performance, independently of architecture. A discussion or analysis of how model size contributes to the observed encoding gains would help clarify whether improvements are due to the representational quality of the model or simply its scale.

      Thank you for this important point. We agree that model scale—particularly parameter count—can influence neural prediction performance independently of architecture, as noted in prior work (e.g., Matsuyama et al., 2023).

      In the present study, our primary goal was to evaluate whether temporally informed video–text representations provide practical advantages over unimodal and static multimodal baselines that are widely used in cognitive neuroscience for naturalistic movie fMRI, under a matched encoding pipeline. We did not perform a systematic scale-controlled analysis in this revision because doing so would require training or evaluating multiple size-matched variants across video-only and video–text architectures, which is beyond the scope of the current work.

      We therefore agree that part of the observed performance gains may reflect model capacity in addition to representational format, and we caution against attributing all improvements solely to cross-modal alignment or temporal structure. We now explicitly acknowledge this limitation in the Discussion and note that comparing size-matched video-only and video–text models within the same pipeline is an important next step for disentangling model scale from representational content.

      (On page 19) “Finally, part of VALOR’s advantage may reflect model capacity: larger pretrained models often yield higher encoding accuracy, so repeating these analyses with size-matched image-only and image–text models will be critical for disentangling model scale from representational content.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the current study, Huang et al. examined ACC response during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. Specifically, they identified two subgroups of ACC neurons that responded to different aspects of the actions. This work represents admirable efforts to investigate the role of ACC in task-performing mice. However, in my opinion, alternative explanations of the data were not sufficiently explored, and some key findings were not well supported.

      Strengths:

      The development of the new discrimination-avoid task is applauded. Single-unit electrophysiology in task-performing animals represents admirable efforts and the datasets are valuable. The identification of different groups of encoding neurons in ACC can be potentially important.

      Weaknesses:

      One major conclusion is that ACC primarily encodes the so-called post-action variables (specifically shuttle crossing). However, only a single example session was included in Figure 2, while in Supplementary Figure 2 a considerable fraction of ACC neurons appears to respond to either the onset of movement or ramp up their activity prior to movement onset. How did the authors reach the conclusion that ACC preferentially respond to shuttle crossing?

      We now include more example sessions and the main results from individual animals (Fig. 3; Figs. S2–S3; Fig. 8). Overall, the results are consistent across recording sessions and animals.

      While shuttle crossings were the primary reference for most analyses, using shuttle initiation as a reference led to similar conclusions (Fig. 4). Namely, we found that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4), while only a subset exhibits ramping pre-shuttle activity (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      In Figure 4, it was concluded that ACC neurons respond to action independent of outcome. Since these neurons are active on both correct and incorrect shuttle but not stay trials, they seem to primarily respond to overt movement. If so, the rationale for linking ACC activity and adaptive behavior/ associative learning is not very clear to me. Further analyses are needed to test whether their firing rates correlated with locomotion speed or acceleration/deceleration. On a similar note, to what extent are the action state neurons actually responding to locomotion-related signals? And can ACC activity actually differentiate correct vs. incorrect stays?

      In this study, we highlight two distinct groups of ACC neurons: action-state and action-content neurons. Both groups of neurons tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors, suggesting that their activity is not directly driven by locomotion. Furthermore, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms A→B or B→A shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns. Finally, we analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of neurons (<15%) show speed-correlated activity (Fig.5), suggesting that most ACC neurons do not encode movement-related information. Taken together, these findings support the distinction between ACC activity and locomotion encoding.

      As for the small subset of speed-related neurons, it remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from the nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex (Fig. S2). Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Lastly, given that the ACC neurons display no or limited activity during stay trials, their activity generally does not differentiate correct vs. incorrect stays (Fig.S7). However, ACC activity does show moderate differentiation between room-A vs. room-B stays (Fig.S7).

      Given that a considerable amount of ACC neurons encode 'action content', it is not surprising that by including all neurons the model is able to make accurate predictions in Figure 6. How would the model performance change by removing the content neurons?

      We thank the reviewer for this thoughtful analysis idea. Excluding action-content neurons drastically reduces decoding accuracy (Fig.8), suggesting that they are the main drivers for differentiating rooms AB vs. BA shuttles.

      Moving on to Figure 7. Since Figure 4 showed that ACC neurons respond to movement regardless of outcome, it is somewhat puzzling how ACC activity can be linked to future performance.

      As discussed earlier (point #2), ACC activity does not simply reflect locomotion itself. We interpret the post-shuttle ACC activity as encoding both the preceding shuttle state (shuttle or stay) and shuttle content (rooms AB or BA). Regardless of the outcome (safety or shock), such encoding is essential for cue–action–outcome associative learning, because both positive and negative feedback can drive learning. The level of post-shuttle ACC activity may reflect task engagement, with greater engagement facilitating learning and improving future performance.

      Two mice contributed about 50% of all the recorded cells. How robust are the results when analyzing mouse by mouse?

      We have added further analyses highlighting the results from each mouse. Although the total number of recorded neurons varied across mice, the major findings were consistent. In every mouse, we observed sustained post-shuttle ACC activity (Fig. S2), and population-level ACC activity reliably decoded shuttle contents (rooms AB vs. BA; Fig. 8).

      Lastly, the development of the new discrimination-avoid task is applauded. However, a major missing piece here is to show the importance of ACC in this task and what aspects of this behavior require ACC.

      We appreciate this feedback. We are currently conducting additional experiments to determine whether inhibiting ACC activity during distinct time windows disrupts task learning. We hope to publish a follow-up paper on these findings in the near future.

      Reviewer #2 (Public review):

      Summary:

      The current dataset utilized a 2x2 factorial shuttle-escape task in combination with extracellular single-unit recording in the anterior cingulate cortex (ACC) of mice to determine ACC action coding. The contributions of neocortical signaling to action-outcome learning as assessed by behavioral tasks outside of the prototypical reward versus non-reward or punished vs non-punished is an important and relevant research topic, given that ACC plays a clear role in several human neurological and psychiatric conditions. The authors present useful findings regarding the role of ACC in action monitoring and learning. The core methods themselves - electrophysiology and behavior - are adequate; however, the analyses are incomplete since ruling out alternative explanations for neural activity, such as movement itself, requires substantial control analyses, and details on statistical methods are not clear.

      Strengths:

      (1) The factorial design nicely controls for sensory coding and value coding, since the same stimulus can signal different actions and values.

      (2) The figures are mostly well-presented, labeled, and easy to read.

      (3) Additional analyses, such as the 2.5/7.5s windows and place-field analysis, are nice to see and indicate that the authors were careful in their neural analyses.

      (4) The n-trial + 1 analysis where ACC activity was higher on trials that preceded correct responses is a nice addition, since it shows that ACC activity predicts future behavior, well before it happens.

      (5) The authors identified ACC neurons that fire to shuttle crossings in one direction or to crossings in both directions. This is very clear in the spike rasters and population-scaled color images. While other factors such as place fields, sensory input, and their integration can account for this activity, the authors discuss this and provide additional supplemental analyses.

      Weaknesses:

      (1) The behavioral data could use slightly more characterization, such as separating stay versus shuttle trials.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1). Additionally, we provide new data from extended training sessions (Fig.S2).

      (2) Some of the neural analyses could use the necessary and sufficient comparisons to strengthen the authors' claims.

      We have now used the necessary and sufficient comparisons where applicable. In the SVM decoding analysis, we show that population ACC activity is sufficient to decode AB or BA shuttles. We also show that excluding action-content, but not other ACC neurons, drastically reduces decoding accuracy, suggesting that these neurons are necessary for the decoding (Fig.8).
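
      The logic of this necessity/sufficiency comparison can be sketched with a toy ablation. The example below is not the authors' SVM analysis: it swaps in a simple nearest-centroid decoder, a fixed train/test split, and made-up firing rates for four hypothetical neurons (indices 0-1 standing in for action-content neurons, 2-3 for other neurons).

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decode_accuracy(train, test, keep):
    """Nearest-centroid accuracy using only the neuron indices in `keep`."""
    def pick(row):
        return [row[i] for i in keep]
    cents = {lab: centroid([pick(r) for r in rows]) for lab, rows in train.items()}
    correct = total = 0
    for lab, rows in test.items():
        for r in rows:
            pred = min(cents, key=lambda c: sq_dist(pick(r), cents[c]))
            correct += pred == lab
            total += 1
    return correct / total

# Hypothetical firing rates (Hz) per trial: [content0, content1, other0, other1].
train = {"AB": [[10, 2, 5, 4], [9, 3, 6, 5]],
         "BA": [[2, 10, 4, 5], [3, 9, 6, 4]]}
test = {"AB": [[10, 3, 6, 4], [9, 2, 5, 5]],
        "BA": [[3, 10, 6, 5], [2, 9, 5, 4]]}

acc_all = decode_accuracy(train, test, keep=[0, 1, 2, 3])
acc_without_content = decode_accuracy(train, test, keep=[2, 3])
acc_without_other = decode_accuracy(train, test, keep=[0, 1])
```

      In this toy data set, decoding stays perfect when the "other" neurons are removed but falls to chance (0.5) when the content neurons are removed, which is the pattern the ablation is designed to detect.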

      (3) Many of the neural analyses seem to utilize long time windows, not leveraging the very real strength of recording spike times. Specifics on the exact neural activity binning/averaging, tests, classifier validation, and methods for quantification are difficult to find.

      We chose to perform our neural analyses on a longer time scale, given the sustained activity we see in the data. To further justify that decision, we now provide additional results highlighting the sustained activity of ACC neurons in our task (Fig. 2; Fig. S2). Additionally, we now provide more specifics of the neural analyses in the Methods section.

      (4) The neural analyses seem to suggest that ACC neurons encode one variable or the other, but are there any that multiplex? Given the overwhelming evidence of multiplexing in the ACC a bit more discussion of its presence or absence is warranted.

      This is an interesting point of discussion, and we thank the reviewer for pointing this out. Overall, our results suggest that individual ACC neurons preferentially engage in only one of the proposed functions, rather than multiplexing across them. For example, action-state and action-content ACC neurons primarily engage in action monitoring, but not in decision-making, planning, or outcome tracking. Nevertheless, we cannot rule out the possibility that other ACC neurons, through their distinct connectivity or location in different ACC subregions, engage in other proposed functions. Thus, when considering the ACC as a whole, its function may still be multiplexed.

      Another possible reason we do not see clear multiplexing of neurons may be due to the dynamic nature of our task. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation. Since values are not fixed and change based on context, value-related responses may not be reflected in the ACC in our tasks.

      We have now incorporated the above discussions into our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors record from the ACC during a task in which animals must switch contexts to avoid shock as instructed by a cue. As expected, they find neurons that encode context, with some encoding of actions prior to the context, and encoding of neurons post-action. The primary novelty of the task seems to be dynamically encoding action-outcome in a discrimination-avoidance domain, while this is traditionally done using operant methods. While I'm not sure that this task is all that novel, I can't recall this being applied to the frontal cortex before, and this extends the well-known action/context/post-context encoding of ACC to the discrimination-avoidance domain.

      While the analysis is well done, there are several points that I believe should be elaborated upon. First, I had questions about several details (see point 3 below). Second, I wonder why the authors downplayed the clear action coding of ACC ensembles. Third, I wonder if the purported 'novelty' of the task (which I'm not sure of) and pseudo-debate on ACC's role undermines the real novelty - action/context/outcome encoding of ACC in discrimination-avoidance and early learning.

      Strengths:

      Recording frontal cortical ensembles during this task is particularly novel, and the analyses are sophisticated. The task has the potential to generate elegant comparisons of action and outcome, and the analyses are sophisticated.

      Weaknesses:

      I had some questions that might help me understand this work better.

      (1) I wonder if the field would agree that there is a true 'debate' and 'controversy' about the ACC and conflict monitoring, or if this is a pseudodebate (Line 34). They cite 2 very old papers to support this point. I might reframe this in terms of the frontal cortex studying action-outcome associations in discrimination-avoidance, as the bulk of evidence in rodents comes from overtrained operant behavior, and in humans comes from high-level tasks, and humans are unlikely to get aversive stimuli such as shocks.

      We appreciate this feedback. We have revised the Introduction and Discussion.

      (2) Does the purported novelty of the task undermine the argument? While I don't have an exhaustive knowledge of this behavior, the novelty involves applying this to the ACC. There are many paradigms where a shock triggers some action that could be antecedents to this task.

      We argue our newly designed discrimination–avoidance task is unique for several reasons. First, it requires animals to discriminate both sensory cues and environment contexts. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation, which reflects a conceptual advance over previous techniques. Furthermore, by removing valence from the cues, this design helps disentangle the ACC’s potential role in value encoding from other cognitive functions.

      Second, this task involves robust, ethologically relevant actions (i.e., shuttles), unlike many established paradigms that rely on less naturalistic behaviors such as saccades or lever presses. We view this as a key distinction from prior approaches, because even paradigms that use shuttling or other naturalistic responses fail to incorporate dynamic integration of cues and contexts.

      Finally, the clear temporal separation between actions and outcomes further helps disentangle the ACC’s roles in action monitoring vs. outcome tracking.

      (3) The lack of details was confusing to me:

      (a) How many total mice? Are the same mice in all analyses? Are the same neurons? Which training day? Is it 4 mice in Figure 3? Five mice in line 382? An accounting of mice should be in the methods. All data points and figures should have the number of neurons and mice clearly indicated, along with a table. Without these details, it is challenging to interpret the findings.

      We are sorry for the confusion. We now provide additional details and clear N numbers for each analysis to improve clarity.

      (b) How many neurons are from which stage of training? In some figures, I see 325, in some ~350, and in S5/S2B, 370. The number of neurons should be clearly indicated in each figure, and perhaps a table.

      All data were obtained from well-trained mice. For some analyses, the N is smaller because certain task sessions contained very few incorrect trials (≤3), which prevented us from examining ACC activity during those trials. We have modified the figure legends so that the neuron counts are clear.

      (c) Were the tetrodes driven deeper each day? The depth should be used as a regressor in all analyses?

      Yes, the tetrodes were driven slightly deeper across task sessions (~80 µm per step; 2–4 depths per mouse). Because these depth changes were small, our preliminary analyses indicate no clear differences in ACC activity across recording depths. However, we cannot rule out potential dorsal–ventral subregion differences if recordings were to span larger depth ranges.

      (d) Was it really ACC (Figure 2A)? Some shanks are in M2? All electrodes from all mice need to be plotted as a main figure with the drive length indicated.

      We have now included a supplementary figure showing all recording sites (Fig. S2). It is likely that a small subset of neurons was recorded at the ACC/M2 border area. Unfortunately, we are unable to separate them out because of the blind recording design of our tetrode arrays.

      (e) It's not clear which sessions and how many go into which analysis

      We have now specified the number of task sessions for each analysis (see Methods).

      (f) How many correct and incorrect trials (<7?) are there per session?

      We have now specified the number of correct and incorrect trials per session (see Methods).

      (g) Why 'up to 10 shocks' on line 358? What amplitudes were tried? What does scrambled mean?

      We decided to use up to 10 mild shocks per trial because mice do not necessarily shuttle to the safe room after one or even a few shocks during the early stages of training. This design allows mice to efficiently learn the concept of the task (i.e., one room is safe while the other delivers shocks). Each shock was specified in the Methods section as 0.5 mA, 0.1 s. A “scrambled shock” refers to an electric shock delivered through multiple floor bars in a randomized pattern, effectively preventing the animal from avoiding the stimulus.

      (4) Why do the authors downplay pre-action encoding? It is clearly evident in the PETHs, and the classifiers are above chance. It's not surprising that post-shuttle classification is so high because the behavior has occurred. This is most evident in Figure S2B, which likely should be a main figure.

      We did not intend to downplay pre-action encoding. Our analysis shows that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4). Although a subset of ACC neurons exhibits ramping pre-shuttle activity, they represent a much smaller fraction (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      (5) The statistics seem inappropriate. A linear mixed effects model accounting for between-mouse variance seems most appropriate. Statistical power or effect size is needed to interpret these results. This is important in analyses like Figure 7C or 6B.

      We appreciate this feedback. We now use appropriate statistics and report effect size.

      (6) Better behavioral details might help readers understand the task. These can be pulled from Figures S2 and S5. This is particularly important in a 'novel' task.

      We now provide more details to help readers better understand the task and have added new figures (Fig. 1; Figs. S1 & S2).

      (7) Can the authors put post-action encoding on the same classification accuracy axes as Figure 6B? It'd be useful to compare.

      We appreciate the comment, but we are unsure what clarification is being requested.

      (8) What limitations are there? I can think of several - number of animals, lack of causal manipulations, ACC in rodents and humans.

      We now include a discussion of the limitations of our study. One caveat is that the discrimination–avoidance task requires weeks of training in mice. By the time they master the task, ACC activity may reflect modified neural circuits. Investigating ACC activity during the early phase of learning, such as by introducing a new pair of cues or contexts, could provide further insights into the ACC’s role in learning and cognitive processes. Additionally, a limitation of the current study is the lack of evidence for the causal role of post-action ACC activity in complex associative learning. Future investigations using closed-loop strategies to selectively disrupt ACC activity during the post-action phase could help address this question.

      Minor:

      (1) Each PCA analysis needs a scree plot to understand the variance explained.

      We have added a scree plot for each PCA analysis.

      (2) Figure 4C - y and x-axes have the same label?

      We have corrected the y-axis label.

      (3) What bin size do the authors use for machine learning (Not clear from line 416)?

      The bin sizes used were 2.5, 5, 7.5, or 10 s, which are now described in the Methods section.
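
      To make the role of the bin size concrete, the sketch below computes a mean firing rate in a single window that starts at an event and lasts one bin. The spike times, the event time, and the choice of one window per bin size aligned to the shuttle crossing are assumptions made for illustration; the Methods describe the actual procedure.

```python
def post_event_rate(spike_times, event_time, window):
    """Mean firing rate (Hz) in [event_time, event_time + window)."""
    stop = event_time + window
    count = sum(1 for t in spike_times if event_time <= t < stop)
    return count / window

# Hypothetical spike times (s) for one neuron and one shuttle crossing at 20 s.
spikes = [19.0, 20.4, 21.1, 22.0, 22.6, 23.0, 23.3, 24.9, 26.2, 29.5]
crossing = 20.0

# One feature per candidate bin size; the pre-event spike at 19.0 s is excluded.
rates = {w: post_event_rate(spikes, crossing, w) for w in (2.5, 5.0, 7.5, 10.0)}
```

      Longer windows average over more of the sustained response, at the cost of temporal resolution, which is the trade-off raised in the reviewer's question.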

      (4) Why not just use PCA instead of 'dimension reduction' (of which there are many?)

      We have adjusted the phrasing where appropriate.

      (5) Would a video enhance understanding of the behavior?

      We appreciate this feedback. We now include a few videos to accompany our paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Is Figure 1C sufficiently powered?

      We have now included data from additional mice and updated the figure accordingly.

      (2) Task performance was not plateaued after 10 sessions in Figure 1B. How variable is task performance in the datasets with ephys recordings (session to session, mouse to mouse).

      We have now included additional data from extended training (15 sessions; Fig. S2). We observed moderate variation across both sessions and mice. Specifically, the numbers of correct/incorrect shuttles used for ephys analysis were 19/5, 19/4, 21/5, 20/4 (mouse #1; 4 sessions); 20/7, 23/7, 20/7 (mouse #2; 3 sessions); 19/4, 16/2 (mouse #3; 2 sessions); 26/4, 23/4, 17/6, 25/5 (mouse #4; 4 sessions); and 20/5, 17/4 (mouse #5; 2 sessions).
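
      For readers who want the totals implied by these counts, the short tally below sums the per-session numbers listed in the response. The counts are copied from the text; the dictionary keys and variable names are ours.

```python
# (correct, incorrect) shuttle counts per ephys session, copied from the response.
sessions = {
    "mouse1": [(19, 5), (19, 4), (21, 5), (20, 4)],
    "mouse2": [(20, 7), (23, 7), (20, 7)],
    "mouse3": [(19, 4), (16, 2)],
    "mouse4": [(26, 4), (23, 4), (17, 6), (25, 5)],
    "mouse5": [(20, 5), (17, 4)],
}

# Per-mouse totals as (correct, incorrect).
totals = {m: (sum(c for c, _ in s), sum(i for _, i in s))
          for m, s in sessions.items()}

n_sessions = sum(len(s) for s in sessions.values())
all_correct = sum(c for c, _ in totals.values())
all_incorrect = sum(i for _, i in totals.values())
fraction_correct = all_correct / (all_correct + all_incorrect)
incorrect_range = (min(i for s in sessions.values() for _, i in s),
                   max(i for s in sessions.values() for _, i in s))
```

      Across the 15 listed sessions this gives 305 correct and 73 incorrect shuttles (about 81% correct), with between 2 and 7 incorrect shuttles per session.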

      (3) Please quantify the results in Figure 3, for both within individual mice and across mice.

      We have calculated the maximum trajectory length within the 3-D space (Fig. 3C).

      (4) What is the effect size in Figure 7C?

      We now report the effect size.

      (5) Please provide more details for spike sorting.

      We have now included more details in the Methods section.

      (6) More detailed cell type or correlation analysis in Figures 4 and 5 may be helpful. For example, if putative regular and fast-spiking neurons were simultaneously recorded, did the FS directly inhibit the RS to give rise to the apparent encoding properties?

      We recorded a small number of putative interneurons (n = 13) from only three mice, which precludes drawing meaningful conclusions, particularly given their heterogeneous responses during discrimination–avoidance tasks. Accordingly, we include only an example interneuron demonstrating discrimination between AB vs. BA shuttles (Fig. S5). Nevertheless, it is evident that there are reciprocal monosynaptic connections between putative interneurons and certain pyramidal neurons, as indicated by short-latency (~2 ms) excitatory or inhibitory interactions (Fig. S5). That said, follow-up studies with larger sample sizes are needed to parse out these details.

      Reviewer #2 (Recommendations for the authors):

      (1) While I appreciate displaying the success rate for the sake of simplifying behavioral data in Figure 1B, it would be nice to also see these data broken out as correct vs incorrect for stay vs shuttle trials, since it is difficult to determine whether the performance increases are primarily driven by mice improving at stay vs shuttle responses

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1; Fig.S2).

      (2) In Figure 2 the comparison between shuttle and stay is not particularly convincing, since the comparison is also essentially movement vs no movement and place1-->place2 vs place1-->place1. A more appropriate comparison might be action state neurons vs action content neurons during A-->B, B-->A, or both crossings. If it is true that these populations contain this information, then action state neurons should traverse a large component space in both directions, action content neurons only one direction, and so on.

      We agree that the comparison is not ideal due to differences in locomotion. However, it provides valuable information suggesting that the ACC plays a limited role during stay trials, even though these trials involve mental and cognitive processes comparable to those in shuttle trials. While we appreciate the reviewer’s suggestion, the proposed analysis would not be reliable given the relatively small number of simultaneously recorded action-state or action-content neurons.

      (3) I would say the above point applies to Figure 3 as well. I would also note that this reviewer greatly appreciates the rigor of showing ensemble activity in each subject.

      We appreciate this comment. See our response above.

      (4) In Figure 5 do these neurons show the same A-->B vs B-->A firing patterns during correct vs incorrect shuttles? The text describing the data in Figure 4 suggests this should be the case but even from a quick glance it sort of seems like the population dynamics during correct vs incorrect shuttles are not the same. My concern is that averaging neural activity over 5s windows washes out all these dynamics

      Preliminary analysis suggests that these firing patterns apply to both correct and incorrect shuttles. However, the main reason we did not compare correct and incorrect trials is the limited amount of data. In many sessions, there are only a few (≤5) incorrect shuttles, which include both AB and BA shuttles (Fig.1C; Fig.S2), so these sessions lack the statistical power for a meaningful comparison.

      (5) Some information on classifier validation is required - was this leave-out validation and if so how many trials were left-out vs tested? K-fold, and if so, how many folds? Was the trial order shuffled for each simulation? Classifiers will pick up within-session temporal information. In addition to this classifier accuracy during the different time points should be compared by a non-parametric test, and compared to the 95th percentile of the label-shuffled distribution.

      Yes, we use standard 10-fold cross-validation. We appreciate the suggestion on trial-order shuffling, and implementing this procedure does not change our original conclusion. Additionally, we have applied a non-parametric test.
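      The validation scheme discussed in this exchange (10-fold cross-validation, shuffled trial order, comparison against the 95th percentile of a label-shuffled distribution) can be sketched as follows. This is an illustration only and not the authors' MATLAB code: the nearest-centroid classifier, the simulated trial-by-neuron matrix, and all names and parameter values are assumptions made for the example.

```python
import numpy as np

def nearest_centroid_cv(X, y, n_folds=10, rng=None):
    """Accuracy of a nearest-centroid classifier under shuffled k-fold CV.

    The classifier is a stand-in chosen for brevity (an assumption).
    """
    rng = np.random.default_rng(rng)
    order = rng.permutation(len(y))            # shuffle trial order before splitting
    folds = np.array_split(order, n_folds)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)
        classes = np.unique(y[train_idx])
        # One centroid per class, estimated from training trials only
        centroids = np.stack(
            [X[train_idx][y[train_idx] == c].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(
            X[test_idx][:, None, :] - centroids[None, :, :], axis=2
        )
        pred = classes[dists.argmin(axis=1)]
        correct += np.sum(pred == y[test_idx])
    return correct / len(y)

def label_shuffle_null(X, y, n_shuffles=200, n_folds=10, seed=0):
    """Accuracies obtained after randomly permuting the trial labels."""
    rng = np.random.default_rng(seed)
    return np.array(
        [nearest_centroid_cv(X, rng.permutation(y), n_folds, rng)
         for _ in range(n_shuffles)]
    )

# Hypothetical data: 60 trials x 20 neurons, two shuttle types (0 = AB, 1 = BA)
data_rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = data_rng.normal(size=(60, 20))
X[y == 1, :5] += 1.5                           # five simulated selective neurons

acc = nearest_centroid_cv(X, y, n_folds=10, rng=2)
null = label_shuffle_null(X, y)
threshold = np.percentile(null, 95)            # 95th percentile of the shuffled null
print(f"accuracy = {acc:.2f}, shuffle threshold = {threshold:.2f}")
```

      Decoding is treated as above chance only when the observed accuracy exceeds the shuffle threshold; repeating the comparison per time window would give the per-time-point test the reviewer requested.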

      (6) How exactly were neurons classified as content vs state? Was it the average activity during the 5s following the shuttle? If this is stated I could not really find it easily so I might suggest clarifying.

      We now use a new method for classification of the two neuron types (Fig.7). We have included detailed methods in the revised manuscript.

      (7) Movement drives cortical neuron activity more than anything else I have ever seen. Really, more than anything else, it would be nice to demonstrate that it is not movement alone or movement multiplexed with place/sensory information/direction driving these responses.

      We have analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of ACC neurons (<15%) show speed-correlated activity (Fig.5). It remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex. Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Furthermore, we identify two distinct groups of ACC neurons: action-state and action-content neurons, both of which tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors. This prolonged activation in the absence of movement suggests that their activity is not directly driven by locomotion. Moreover, action-content neurons are selectively engaged in only one of the two shuttle categories, either AB or BA shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns.

      (8) In addition to the above, the place-field analysis in Supplemental Figure 5 only shows 4 neurons. Was the whole population analyzed? Is it possible to decode place from the population during the ITI? The data in this figure sort of look exactly like place fields - many cortical neurons and also some hippocampal neurons have more than 1 place field

      We have now provided additional place-field analysis. A comparison with hippocampal CA1 neurons (recorded during the same task) suggests that ACC neurons encode limited spatial information.

      (9) "a simple Pavlovian association strategy is unlikely to be sufficient for learning the task" ... is Pavlovian occasion setting not a simple association? Tones and contexts both readily act as Pavlovian occasion setters. Similarly positive/negative patterning might also explain how the task is learned.

      We appreciate this comment and have revised the sentence accordingly. It is possible that animals use multiple strategies to learn and perform the task effectively. In the early stages, animals may rely more heavily on sensory–spatial integration, whereas in later stages, sensory- or location-related Pavlovian associative strategies may contribute to performance, particularly when animals begin to show place preferences during inter-trial intervals.

      (10) I might suggest softening this language and others like it. For example, 2x2 factorial designs are not really novel.

      We have revised the language used to describe the task.

      (11) Some of the color-scale bars and figures do not have labels. For example, Supplementary Figure 3, Supplementary Figure 5. Please add labels.

      We have added the missing labels to all color bars.

      Reviewer #3 (Recommendations for the authors):

      (1) Some relevant papers that should be cited:

      https://doi.org/10.1523/JNEUROSCI.4450-08.2008

      10.1016/j.neuron.2018.11.016

      https://doi.org/10.1016/j.jphysparis.2014.12.001

      We appreciate these suggestions.

      (2) Where can we download the data and code?

      We will upload the essential data and MATLAB code to GitHub to accompany the publication of the final version of this paper.

  4. Feb 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have followed the referee's advice, repeating the experiments with the dominant negative UAS-cyc<sup>DN</sup>. The results nicely confirm our conclusions: abolishing the cellular clock in LNd neurons eliminates the rhythmicity of oviposition. The results are presented in Fig. 3 of the new manuscript, panels H to N. We thank the reviewer for this suggestion, which has definitely improved our paper, since it allows us to confirm our result using both a different driver and a different UAS sequence. In addition, we included the required GAL4 controls, which can be found in panels E and L of the figure, as well as average egg-laying profiles for all genotypes involved (panels B, D, F, I, K and M). Regarding the MB122Bsplit-Gal4>UAS-per<sup>RNAi</sup> experiment, we moved it to a supplementary figure (Figure 3S1). The paragraph where the new Figure 3 is discussed has been modified accordingly.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artifacts introduced by the 24h moving average used.

      The method used for the assessment of rhythmicity is now more fully explained and tested in the Supplementary Material. In particular, the issue of trend removal is treated in the second section of the SM, and the absence of "artifacts" (interpreted as the possibility of deciding that a signal is rhythmic when it is not, or vice versa) is shown in Figs. S3 to S5.

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      The choice of sampling every 4 hours is not due to a limitation imposed by the device used. In fact, the device can be programmed to move at whatever times are desired. As mentioned in the Material and Methods section, "more frequent sampling gives rise to less consistent rhythmic patterns", because the number of eggs sampled at each time slot becomes too small. In particular, we have tested sampling at intervals of 2 hours, and we have observed that this doubles the work performed by the experimenter but does not lead to an improvement in the assessment of rhythmicity.
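      The argument that smaller bins give noisier counts can be made concrete with a short calculation. The sketch below assumes that egg counts in a bin are Poisson distributed, which is a modelling assumption introduced here and not something stated in the response; the mean of 10 eggs per 4-hour bin is likewise a hypothetical value.

```python
import math

def poisson_cv(mean_count):
    """Coefficient of variation (sd / mean) of a Poisson count."""
    return 1.0 / math.sqrt(mean_count)

eggs_per_4h = 10.0                      # hypothetical mean count per 4 h bin
cv_4h = poisson_cv(eggs_per_4h)         # relative noise with 4 h sampling
cv_2h = poisson_cv(eggs_per_4h / 2)     # halving the bin halves the mean count

print(f"CV at 4 h: {cv_4h:.3f}, CV at 2 h: {cv_2h:.3f}, ratio: {cv_2h / cv_4h:.3f}")
```

      Under this assumption, halving the sampling interval raises the relative noise of each bin by a factor of √2 regardless of the mean count, which is consistent with the observation that finer sampling did not improve the assessment of rhythmicity.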

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      As stressed in the paper, and in the new Supplementary Material, the individual egg records are very noisy, which in general precludes the extraction of any information about the underlying period and phase. The workaround we (and others, e.g. Howlader et al. 2006) have used is analyzing average egg records for each genotype. Even though this implies assuming the same period and phase for all individuals, we have observed, using experiments with synthetic data, that small variations in individual periods (of the same amount as those present in real experiments where the period of some flies can be assessed individually) still allow us to use our method to decide if the genotype is rhythmic or not. This issue is discussed at length in the new Supplementary Material. There we also discuss an experiment with real flies, showing the individual records, and the corresponding periodograms, for each fly, for a rhythmic (Fig. S14) and an arrhythmic genotype (Fig. S17).
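      The kind of synthetic experiment described here can be sketched in a few lines. The example below is not the authors' analysis: it simulates Poisson egg counts for flies whose individual periods vary slightly around 24 h, averages the records, and evaluates a classical Lomb-Scargle periodogram over 16 to 32 h. The number of flies, the period spread, the egg rates, and the scaling of the power to the range 0 to 1 are all assumptions made for illustration.

```python
import numpy as np

def ls_power(t, y, periods):
    """Classical Lomb-Scargle power, scaled by the total sum of squares.

    With this scaling (an assumption here) the power lies between 0 and 1.
    """
    y = y - y.mean()
    total = np.sum(y ** 2)
    power = np.empty(len(periods))
    for k, p in enumerate(periods):
        w = 2 * np.pi / p
        # Time offset that makes the sine and cosine terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = (np.dot(y, c) ** 2 / np.dot(c, c)
                    + np.dot(y, s) ** 2 / np.dot(s, s)) / total
    return power

rng = np.random.default_rng(0)
t = np.arange(0, 120, 4.0)                       # 5 days sampled every 4 h
n_flies = 20                                     # hypothetical group size
fly_periods = rng.normal(24.0, 0.5, n_flies)     # small inter-fly period spread
rates = 5.0 * (1 + 0.5 * np.cos(2 * np.pi * t[None, :] / fly_periods[:, None]))
eggs = rng.poisson(rates)                        # noisy individual egg records

periods = np.linspace(16, 32, 161)               # circadian range tested
individual_peaks = np.array(
    [ls_power(t, fly.astype(float), periods).max() for fly in eggs]
)
avg_power = ls_power(t, eggs.mean(axis=0), periods)
peak_period = periods[avg_power.argmax()]

print(f"mean individual peak power: {individual_peaks.mean():.2f}")
print(f"average-record peak power:  {avg_power.max():.2f} at {peak_period:.1f} h")
```

      Averaging suppresses the independent noise of each fly while the shared rhythm survives, so the periodogram of the average record shows a clearer circadian peak than the individual records do, provided the individual periods stay close to each other.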

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      We have added the individual periodograms of the arrhythmic lines to the Supplementary material (Figs. 3S2, 3S5 and panel G of Fig. 3S1), where they can be compared with their respective controls (Figs 3S3, 3S4, 3S6, 3S7 and panel F of Fig. 3S1).

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that the results may be biased toward 'the best egg layers'. We remark, however, that the flies that have been left out lay very few eggs, some of them even laying no eggs in a whole day. For these flies it is difficult to understand how one can even speak of egg-laying rhythmicity (let alone how one can experimentally assess it). Thus, we think it might be misleading to speak of results as "representative of the whole population". Furthermore, it is even possible that the very concept of egg-laying rhythmicity makes little sense if flies do not lay enough eggs.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      In general, we have checked that there are no "outliers", in the sense of flies that lay many more eggs than the others in the experiment. But maybe the reviewer is referring to the possibility that a few rhythmic flies make the average rhythmic. This issue is addressed in the supplementary material, at the end of section "Example of rhythmicity assessment for a synthetic experiment". In short, we found that eliminating some of the most rhythmic flies from a rhythmic population makes the average a bit less rhythmic, but still significantly so. Conversely, if these flies are transferred to an arrhythmic population, the average is still non rhythmic.

      Regarding "the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity", we stress that we have not performed a selection of flies for the averages. All of the flies tested are included in the average, independently of their individual rhythmicity, provided only that they lay enough eggs.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We are aware that in the studies of the rhythmicity of locomotor activity the presence of two significant peaks is usually interpreted as a “complex rhythm”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two non-significant peaks could also correspond to the periods of two different subpopulations of individuals. However, a close examination of the individual periodograms, now provided as Supplementary Figures 3S2 to 3S9, does not show any convincing evidence of any of these two possibilities.

      Another possibility could be that such peaks are simply an artifact of the method in the analysis of time series that consist of very few cycles and also few points per cycle. In the Supplementary Material we show that this can indeed happen. Consider, for example, periodograms 2 and 4 in Fig. S12 of the SM. Even though both of them display two non-significant peaks, these periodograms correspond to two synthetic time series that are completely arrhythmic.

      We have added to the manuscript a paragraph discussing the issue of possible bimodality (next to last paragraph in subsection "The molecular clock in Cry+ LNd neurons is necessary for rhythmic egg-laying").

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events do not involve isolated individuals, but rather groups of flies. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3) its propensity changes upon the pre-conditioning of the oviposition substrates, which is a complication in assays of oviposition rhythms that periodically move the flies to fresh substrate.

      We agree that social modulation can be important for oviposition, as has been shown in the paper cited by the reviewer. But we think that, in order to understand the contribution of social modulation to oviposition, it is important to know, as a reference for comparisons, what the flies do when they are isolated. Our aim in this work has been to provide such a reference.

      Recommendations for the authors:

      (1) The weaknesses identified in the Public review could be addressed as follows: etc.

      We have followed the suggestions of the editor and addressed each of the weaknesses mentioned (see details above).

      (2) Could the authors comment on their choice of using individual flies for their assay rather than (small) groups of flies? Is it possible that their assay would produce less noisy results with the latter?

      First we want to emphasize that our aim here was to assess the presence of individual rhythmicity, free from any external influences, whether arising from environmental external cues (such as light or temperature changes) or by social interactions (with other females or males). However, we were also curious about the behavior when males were put in the same chamber with each female. We performed a few tests and the results were very similar to what we obtained with single females.

      (3) Minor points:

      (a) Line 57-58 - "around 24 h and a peak near night onset (Manjunatha et al., 2008). Egg-laying rhythmicity is temperature-compensated and remains invariant despite the nutritional state": Rephrase to something simpler like temperature and nutrition compensated.

      Corrected.

      (b) Line 56-57 - "The circadian nature of this behavior was revealed by its persistence under DD with a period around 24 h and a peak near night onset (Manjunatha et al., 2008)." A better reference here would be to Sheeba et al, 2001 for preliminary investigations into the egg-laying rhythms of individual flies and McCabe and Birley, 1998 for groups of flies under LD12:12 and DD.

      Suggestion accepted.

      (c) Line 65-67 - "We determined..... molecular clock in the entire clock network reduced the LNv did not." This suggests that it was unknown until now that LNv does not have a role, whereas Howlader et al 2006 already suggested that. The reader becomes aware of this at a later part of the manuscript. Please revise.

      This has been revised, and the citation to Howlader et al 2006 added to the new sentence.

      (d) Line 67 - "impairing the molecular clock in the entire clock network reduced the circadian rhythm of.."; saying "Reduced the power of the circadian rhythm" might be better phrasing.

      Suggestion accepted.

      (e) Line 72 - using the Janelia hemibrain dataset.

      Corrected.

      (f) Line 72 typo "ussing", should be 'using'.

      Corrected.

      (g) Line 94: why is the periodic signal the same for all on the first day of DD?

      It is well known that in LD conditions activity is driven by the environmental light-dark cycle, which entrains the endogenous circadian clock of all flies. Even after the transition to DD, the effects of this entrainment persist for a few days, allowing the individual rhythmic patterns set by the light-dark cycle to remain synchronized for at least a few cycles. We are assuming that the same happens with oviposition. A sentence has been added explaining this (beginning of third paragraph of subsection "Egg-laying is rhythmic when registered with a semiautomated egg collection device").

      (h) Figure 1A-D, Were all flies included or only rhythmic flies? Please make this clear. How do you distinguish rhythmic and arrhythmic flies in Figure 1E? Their representative individual plots of egg number graphs are required. Why was the number of flies under DD decreased from 20 to 18?

      Throughout the paper, the analysis of average rhythmicity has been performed including all flies, since we postulate that even flies that individually can be classified as non rhythmic have a rhythm that is corrupted by noise, and that this noise can be partially subtracted by performing an average. The explanation of the characterization of rhythmic and arrhythmic individuals is in the Methods section, under the Data Analysis subsection. This is now fully developed in the Supplementary material, where the individual plots for some of the genotypes are included.

      Regarding the question of the number of flies having "decreased from 20 to 18?", there is a misunderstanding here. The results depicted in Figure 1, and in particular in panel E, correspond to two different experiments: one performed only in LD (7 days, n=20), and a second one performed for 5 days in DD, with one previous day in LD (n=18).

      (i) Figure E and K, Are n=20, 18, and n=30, 22 the total numbers of flies including both rhythmic and nonrhythmic? If so, it would be better to put them in the column, not in the rhythmic column.

      The figure has been corrected.

      (j) Line 107-108, please provide a citation for this statement.

      We have added two references: Shindey et al. 2016, and Deppisch et al. 2022.

      (k) Figure 1, 2, etc., please write a peak value inside the periodogram graph. This makes comparison easier.

      The peak values have been added in all Figures.

      (l) Line 184-185, Figure 2F, tau appears shorter in Clk4.1>perRNAi flies than in control, which suggests that DNp1 may play a role?

      As explained in the Supplementary Material, the particularities of oviposition records (discrete values, noise, few samples per period, etc.) preclude an accurate determination of the period if the record is considered as rhythmic. In particular, Fig. S4 shows that differences of 1 hour between the real and the estimated periods are not unusual.

      (m) Figure 4. Why are 2 controls shown? Please explain. Are they the same strains?

      The two controls shown are the UAS control and the GAL4 control. This information has now been added to the figure.

      (n) Line 314 'that' should be 'than'?

      Corrected.

      (o) Line 73-74 - Phrasing is not clear in: "LNds and oviposition neurons, consisting with, the essential role of LNds neurons in the control of this behavior."

      Corrected.

      (p) Line 81-84 - "the experiments particularly demanding and labor-intensive. In this approach, eggs are typically collected every 4 hours (sometimes also every 2 hours), which usually implies transferring the fly to a new vial or extracting the food with the eggs and replacing it with fresh food in the same vial (McCabe and Birley, 1998; Menon et al., 2014)." McCabe and Birley had an automated egg collection device designed for groups of flies, which sampled eggs laid every hour for 6 days. Please remove this reference in this context

      Reference removed.

      (q) Line 91-92 - "The assessment of oviposition rhythmicity is challenging because the decision of laying an egg relies on many different internal and external factors making this behavior very noisy." This sentence makes it appear that 'assessment' is the limitation. Even locomotor activity is governed by many internal and external factors, yet we can obtain very robust rhythms. The sentence that follows is also not easy to digest. Can the authors frame the idea better?

      We have rewritten the corresponding paragraph in order to make it clearer (second paragraph of the Results section). Additionally, the Supplementary Material now contains a more detailed explanation and analysis of the method used.

      (r) Line 104-107 - rhythmic (with a period close to 24 h, Figure 1F) although the average egg record is strongly rhythmic with a period around 24 h (Figure 1B). Under DD condition, individual rhythmicity percentages are the same as in LD (Figure 1E) and their average record is also very rhythmic with a period of 24 h (Figure 1D). 'Strongly rhythmic' and 'very rhythmic' are less indicative of what is happening with the oviposition rhythm and can be phrased as robust instead, with a focus on their power measured.

      We have accepted the suggestion.

      (s) Line 108-110 - "Thus, egg-laying displays a much larger variability than locomotor activity, compounding the difficulty of observing the influence of the circadian clock on this behavior." The section discussed here does not illustrate the variability in egg-laying as much as the lack of robustness of the rhythm. The variation in rhythmicity going from CS flies (~70% rhythmic) to yw flies (~50% rhythmic) showcases the variability in this rhythm and how it is difficult to observe when compared to locomotor rhythms, which are usually consistently >90% rhythmic across multiple genotypes. These lines can be placed after the discussion about yw and perS flies. Moreover, previous studies using individual flies have reported that egg-laying rhythm is more variable than others Figure 1, Sheeba et al 2001.

      We have accepted the suggestion, replacing "Thus, egg-laying displays a much larger variability than locomotor activity..." by "This shows that, at the individual level, egg-laying is much less robust than locomotor activity ..."

      (t) Figure 1. Genotype notation within the figure panels is not consistent with the accepted / conventional notation or with the main text or legend notations throughout the manuscript.

      We are sorry for this mistake. We have corrected the genotype names in Figures and text in order to make notation consistent across the paper.

      (u) Supplementary Figure 1 Legend. Error in upper right corner? Not left corner? The photo does not clearly show the apparatus. The authors may wish to consider clearer images and more details about the apparatus including details of the 3D printing of the device and perhaps even include a short video where the motor moves the flies to a new chamber (This is only a suggestion to advertise the apparatus, not related to the review of the manuscript). They could also provide information about what fraction of females survived till the end of each trial when 21 flies were examined with 4-hour sampling across 4-5 cycles.

      In general, more than 80% of the females are alive at the end of a one-week oviposition experiment. We have added this information in the Methods section at the end of the corresponding subsection ("Automated egg collection device"). Regarding the egg-collection device, we have replaced the photographs in what is now Supplementary Figure 1S1, and added a short supplementary movie showing its operation.

      (v) The results depicted in Figure 2B are that of averaged time series. Hence the reader does not know 'the fact' that knocked-down animals are not completely rhythmic. Is the "not completely arrhythmic" in reference to flies with a power > 0.2 (weakly rhythmic) in their egg-laying rhythm or to the presence of ~40% of male flies (Supplementary Table 1) with a locomotor rhythm after perRNAi silencing of most of their clock neurons? This is confusing because no intermediate category of flies is discussed in Figure 2. Please edit for clarity.

      We were referring to the rhythmicity of the genotype, not of the individuals. We have rewritten the corresponding paragraph in order to make it clearer (last paragraph of the first subsection of the Results section).

      (w) Line 173 - ablation or electrically silencing all PDF+ neurons (Howlader et al., 2006). There were no experiments carried out using electrical silencing of PDF+ neurons in the referenced paper.

      We are sorry for this mistake. This has been corrected (we have deleted the mention of electrical silencing).

      (x) Line 173 - Shortening of period by nearly 3 hours cannot be considered minor.

      We agree, and we have deleted the word "minor".

      (y) Line 332-333 - "We also disrupted the molecular clock (or electrically silenced) in PDF-expressing neurons as well as in the DN1p group with no apparent effect on egg-laying rhythms". There was period shortening observed for pdf GAL4 > perRNAi manipulation so there was an effect on the egg-laying rhythm. Additionally, perRNAi based silencing does not electrically silence PDF neurons as the kir 2.1 was expressed only using Clk4.1 GAL4 in the DN1ps. This line should be rewritten.

      We have rewritten the paragraph mentioned (third paragraph of the Discussion) in order to make it more accurate.

      (4) Page 22 - Data Analysis

      "Since the number of eggs laid by a mated female tend to show a downward trend, we proceeded as follows, in order to detrend the data (see the Supplementary Material for further details). First, a moving average of the data is performed, with a 6 point window, and a new time series T is obtained. In principle, T is a good approximation to the trend of the data. Then, a new, detrended, time series D is generated by pointwise dividing the two series (i.e. D(i)=E(i)/T(i), where i indexes the points of each series)." Can the authors provide a reference for this method of detrending? Smoothing can frequently introduce artifacts in the data and give incorrect period estimates. Additionally, the trend visible in the data, especially in Figure 1, suggests a linear decay that can be easily subtracted. Also, there is no discussion of detrending in the Supplementary material attached.

      We are sorry for the confusion with the Supplementary materials. The method used for subtracting both noise and trend from the data is now fully explained in the new Supplementary Material. All the issues raised by the reviewer in this comment have been addressed there.
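      The detrending step quoted in this comment (a 6-point moving average T followed by the pointwise ratio D(i)=E(i)/T(i)) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the alignment of the window, the handling of the series edges, the guard against a zero trend, and the synthetic decaying record are assumptions, since the quoted passage does not specify them.

```python
import numpy as np

def detrend_by_moving_average(eggs, window=6):
    """Return (T, D): moving-average trend T and ratio series D = E / T.

    Near the edges the average is taken over the points that are
    available (an assumption; the quoted method does not say how the
    edges are handled).
    """
    eggs = np.asarray(eggs, dtype=float)
    kernel = np.ones(window)
    sums = np.convolve(eggs, kernel, mode="same")
    counts = np.convolve(np.ones_like(eggs), kernel, mode="same")
    trend = sums / counts
    # Where the trend is zero (no eggs in the whole window) D is set to 0
    detrended = np.divide(eggs, trend, out=np.zeros_like(eggs), where=trend > 0)
    return trend, detrended

# Hypothetical record: 24 h rhythm on an exponentially decaying egg output,
# sampled every 4 h, so that 6 points span one full day
t = np.arange(0, 120, 4.0)
eggs = 20 * np.exp(-t / 80) * (1 + 0.5 * np.cos(2 * np.pi * t / 24))
trend, detrended = detrend_by_moving_average(eggs)

# A flat record has no trend to remove, so the ratio series is all ones
flat_trend, flat = detrend_by_moving_average(np.full(30, 5.0))
print(np.round(detrended[:8], 2))
```

      With 4-hour sampling a 6-point window covers exactly 24 h, so a rhythm of about that period largely averages out of T and is retained in D. Dividing instead of subtracting also rescales the oscillation by the local egg output, which is relevant to the reviewer's question about a simple linear subtraction.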

      (5) Figure by figure

      Page - Type (Figure or text) - Comment

      (a) Page 6 Figure 1C There is remarkable phase coherence seen in the average egg laying time series for CS flies 5 days into DD and as the authors note in Lines 94-95 in the text "Under light-dark (LD) conditions, or in the first days of DD, it can be that the periodic signal is the same for all flies". Since this observation is crucial to constructing the figures seen later in the paper, a note should be made about why this rhythm could persist across flies, so deep into DD.

      As mentioned above, we have added a couple of lines explaining why we think that the assumption of a synchronized periodic signal is reasonable, at least during the first cycles (second paragraph of the first subsection of section Results).

      (b) Figure 1 G The effect of period/phase decoherence seems to be showing up here in the average profile for yw flies as they seem to completely dampen out after 2 days in DD and yet have a 24-hour rhythm in the averaged periodogram. The authors should make a note here if the LS periodogram is over-representing the periodicity of the first few days in DD or if comparing the first 3 vs. the last 3 days in DD gives different results.

      The dampening observed in average oviposition records is a product of the dampening of the oviposition records, which is a well-known phenomenon, probably caused by the depletion of sperm in the female spermatheca. One of the aims of the method used in the paper was to avoid the bias introduced by this dampening, by means of a detrending procedure. This is explained in the Materials and Methods, and now full details are given in the new Supplementary Materials.

      (c) Figure 1E, K Is this data pooled across 2-3 experiments, as discussed in lines 500-01 under 'Statistical Analysis'? Also, what test is being performed to check for differences between proportions here, seeing as there are no error bars to denote error around a mean value and no other viable tests mentioned in Statistical Analysis?

      We are sorry for this omission. For the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.
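
      As an illustration of the test named above, the sketch below computes the 'N-1' Chi-squared statistic for a 2x2 table, i.e. the Pearson chi-squared statistic multiplied by (N-1)/N, with a p-value from the chi-squared distribution with one degree of freedom. The function name and the example counts are hypothetical and are not taken from the manuscript.

```python
import math


def n_minus_1_chi_squared(a, b, c, d):
    """'N-1' chi-squared test for a 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are, e.g., rhythmic / not rhythmic.
    Returns (statistic, p_value) using 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column needs a non-zero total")
    pearson = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    stat = pearson * (n - 1) / n
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 20 of 30 rhythmic flies in one genotype, 10 of 30 in another.
stat, p = n_minus_1_chi_squared(20, 10, 10, 20)
```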

      (d) Figure 1 F, L Can the total number of weakly and strongly rhythmic values be indicated in the scatter plot?

      Corrected.

      (e) Figure 1F, L (legend) Is the Chi-squared test being performed on the proportion values of Figure 1(E, K) or for Figure 1(F, L)?

      The chi-squared test mentioned was used for Figure 1F, L. As explained above, for the comparison of proportions we used the 'N-1' Chi-squared test. This has now been added to the legend of the figure.

      (f) Page 8 Figure 2B Seeing as individual flies with a LS periodogram power < 0.2 are considered weakly rhythmic in Figure 1 F, L can Clk856 > perRNAi flies on average also be considered weakly rhythmic, as the peak in the periodogram is above 0.3?

      We prefer to use the weakly rhythmic class only for individual flies. Nevertheless, we agree that this periodogram shows that the genotype analyzed is not completely arrhythmic, and that this might be due to some remaining individual rhythmicity. As mentioned above, we have rewritten the last paragraph of the first subsection of section Results in order to discuss this.

      (g) Figure 2D Can the authors comment on why there is a shorter period rhythm when PDF neurons have a dysfunctional clock, whereas previous evidence (Howlader et al., 2004) suggested that these neurons play no role in egg-laying rhythm? They should also refer to McCabe and Birley, 1998 to see if their results (where they observed a shorter period of ~19h with groups of per0 flies), might be of interest in their interpretations.

      We have added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion. In a nutshell, even though Howlader et al. did not find a shortening when PDF neurons are ablated, they did find it in pdf01 flies.

      (h) Figure 2 F, H As the authors mention in their Discussion on Page 16, lines 340-45, the manipulation of DN1p neurons might abolish the circadian rhythm in oogenesis as reported by Zhang et al, which is why they looked at this circuit driven by Clk4.1 neurons and comment that "The persistence of the rhythm of oviposition implies that it is not based on the availability of eggs but is instead an intrinsic property of the motor program". However, no change in fecundity is reported for either kir2.1 or perRNAi-based manipulations of these neurons, to help the reader understand if egg availability (at the level of egg formation) is playing any role in the downstream (and seemingly independent) act of egg laying. The authors should report if they see any change in total fecundity for either set of flies w.r.t their respective controls. Also, is the reduction in power seen with electrical silencing vs perRNAi expression of any relevance? Does the percentage of rhythmic flies change between these two manipulations?

      In the line mentioned by the reviewer, what we meant is that our results show that the rhythm of oviposition does not seem to be based on the rhythmic production of oocytes, which is not necessarily connected with the total number of eggs produced. We have modified the corresponding line in the paper, in order to avoid this misunderstanding. Regarding the "reduction in power" mentioned, it must be stressed that, in general, the height of the peak is correlated with the fraction of rhythmic individuals. The problem is that this fraction is a much noisier output, and that is the reason why we have chosen to work with periodograms of averages.

      (i) Figure 2 E and G, a loss of rhythmicity could also be due to a decrease in fecundity in the experimental lines. Since the number of eggs laid for each genotype is already known, can the authors show statistically relevant comparisons between the experimental lines and their respective controls? In this vein, can the averaged time series profiles also be provided for all the genotypes tested (as seen previously in Figure 1 A, C, G, I), perhaps in the supplementary?

      We did not focus on fecundity in the present work. However, our observations do not seem to show any definite relationship with rhythmicity. We plan to address the issue of fecundity more systematically in a future work. The averaged time series profiles have now been added to the figure.

      (j) Scatter plots showing the average period and SEM as seen in Figure 1 (F, L) would help in understanding if these manipulations have any effect on variation in the period of the egg-laying rhythm across flies. Particularly for pdf GAL4 > perRNAi flies which have a net shorter period, (but this might vary across the 34 flies tested).

      We have added a Supplementary Figure (2S1) that shows that the shortening of the oviposition period can also be observed at the individual level. We have also added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion.

      (k) Page 11 Figure 3B Does the presence of two peaks in the LS periodogram at a power > 0.2 indicate the presence of weakly rhythmic flies with both a short(20h) and a long(~27h) period component or either one? The short-period peak is nearly at p < 0.05 level of significance. So then, do most of the flies in MB122B GAL4 > perRNAi line show a weakly rhythmic shorter period?

      (l) Figure 3D A similar peak is observed again at 20h (LS power > 0.2 and nearly at p < 0.05 significance level again) and a different longer one at (~30h) though this one is almost near 0.2 on the power scale. Given the consistency of this feature in both LNd manipulations, the authors should comment on whether this is driven by variation in periods detected or the presence of complex rhythms (splitting or change in period) in the oviposition time series for these lines.

      (m) Figure 3 General scatter plots showing average period ± SEM could help explain the bimodality seen in the periodograms. Additionally indicating just how many flies are weakly rhythmic vs. strongly rhythmic can also help to illustrate how important the CRY+ LNds are to the oviposition rhythm's stability.

      For these three comments (k, l and m), we note that the issue of bimodality has been addressed above, in our response to Weakness 9.

      (o) Figure 4B Same as comments under Figure 1, what is the statistical test done to compare the proportions for these three genotypes?

      As mentioned above, for the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.

      (p) Figure 4C Are all flies significantly rhythmic? The authors should also provide an averaged LS periodogram measure for each genotype, to help illustrate the difference in power between activity-rest and egg-laying rhythms.

      Yes, the points represent periods of (significantly) rhythmic flies. This has been added to the caption, to avoid misunderstandings. The differences that arise when assessing rhythmicity in activity records vs. egg-laying records is addressed at length in the Supplementary Material (see e.g. Fig S1).

      (q) Page 15 Figure 5 - general As the authors discuss the possible contribution of DN1ps to evening activity and control over oogenesis rhythm, investigating the connections of the few that are characterized in the connectome (or lack thereof) with the Oviposition neurons, can help illustrate the distinct role they play in the female Drosophila's reproductive rhythm.

      This information was in the text and the Supplementary Tables. Lines 273-275 of the old manuscript read: "The full results are displayed in Supplementary Tables 2 and Table 3, but in short, we found that whereas there are no connections between LNv or DN1 neurons and oviposition neurons..."

      (r) Minor: The dark shading of the circles depicting some of the clusters makes it difficult to read. Consider changing the colors or moving the names outside the circles.

      Figure corrected.

      (s) Line 38: The estimated number of clock neurons has been revised recently (https://www.biorxiv.org/content/10.1101/2023.09.11.557222v2.article-info).

      Thank you for the reference. We have corrected the number of clock neurons in the Introduction of the new manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using single-unit recording in 4 regions of non-human primate brains, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables seem to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

      Strengths:

      The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

      Weaknesses:

      My comments here are quite minor.

      The correlation between transition and reward coefficients is interesting, but I'm a little worried that this might be an artifact. I suspect that reward probability is higher after common transitions, due to the fact that animals are choosing actions they think will lead to higher reward. This suggests that the coefficients might be inevitably correlated by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?

      We fully agree with the reviewer that the task design has in-built correlations between transition and reward, and thus the correlation between neural selectivity for feedback and transition (Figure 3E) may be due to the different reward expectation after common or rare transitions. We did try to make this point in the manuscript:

      This suggests that the brain treats being diverted away from your current objective equivalent to losing reward, which is sensible as the subject would normally expect lower rewards on rare trials if their reward-seeking behaviour was efficient.

      We’ve now updated the wording of this statement to try and better make this point and avoid confusion that any non-reward-related encoding is involved:

      “As the reward expectation will be higher on common compared to rare trials, this demonstrates that the brain encodes being diverted to an area with a lower reward expectation equivalent to actually receiving a low reward (and vice versa).”

      We have also adjusted the significance test of this correlation to use a circular permutation test that accounts for correlations between the regressors. This test still found there to be significant correlation in all areas.

      We have described this new permutation test in Methods:

      “For comparing correlations between weights for different features (i.e., between transition and reward coding, Figure 3E), the null distribution of correlations observed in circularly shifted data was compared to the correlation seen in the actual data. This accounts for any correlations between features that existed in the task by preserving the structure of the design matrices.”

      And updated the text in Results accordingly:

      “All regions, but particularly ACC, encoded a common transition (at the time of transition) similar to a high reward (at the time of feedback), as there was a positive correlation between the coefficients for reward and transition (the transition parameter was signed such that common and rare transitions were equivalent to high and low rewards, respectively) (ACC r=0.4963, DLPFC r=0.3273, caudate r=0.4712, putamen r=0.5052; all p<0.002 except DLPFC where p=0.006, circular permutation test; Figure 3E, S5).”
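
      The logic of a circular-shift null for a correlation between regression weights can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes one firing-rate value per trial per neuron, fits both regressors in a single ordinary least-squares model, and shifts each neuron's trial sequence by an independent random offset while leaving the design (and therefore any correlation between the regressors) untouched. All function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_two_regressors(y, x1, x2):
    """OLS slopes for y ~ 1 + x1 + x2, via the centred normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def weight_correlation(rates, x1, x2):
    """Correlation, across neurons, between the two fitted slopes."""
    betas = [fit_two_regressors(y, x1, x2) for y in rates]
    return pearson_r([b[0] for b in betas], [b[1] for b in betas])


def circular_shift_test(rates, x1, x2, n_perm=500, seed=0):
    """Two-sided p-value for the weight correlation under a circular-shift null."""
    rng = random.Random(seed)
    observed = weight_correlation(rates, x1, x2)
    n_trials = len(x1)
    exceed = 0
    for _ in range(n_perm):
        shifted = []
        for y in rates:
            k = rng.randrange(1, n_trials)   # non-zero circular offset
            shifted.append(y[k:] + y[:k])    # design matrix stays fixed
        if abs(weight_correlation(shifted, x1, x2)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

      Because only the neural data are shifted, any correlation between the regressors that is built into the task is present in every null sample as well, which is the property the quoted Methods text relies on.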

      The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

      We thank the reviewer for this comment. We agree that the motivation for the explore/exploit analysis was not sufficiently clear in the original version.

      Our aim was not to introduce this as a separate or tangential effect, but rather to highlight how the task’s reward structure (with outcome levels stable for 5–9 trials) naturally created alternating periods favoring exploitation of a known high-value option versus exploration when outcomes changed. This feature of the task is tightly linked to MB-RL computations, as it requires integration of state-transition knowledge and updating across trials.

      Importantly, we showed earlier in the manuscript that ACC encoded the state-transition structure (i.e., common versus rare transitions) and MB value estimates (at the choice epoch). Here, we aimed to highlight that the same region also modulated choice encoding as a function of whether the subject was in an exploratory or exploitative regime, a distinction that itself relies on knowledge of state transitions and outcomes. We have revised this section to better integrate it into the main logic of the paper:

      “In our task, the outcome level (high, medium, low) of each second-stage stimulus remained the same for 5-9 trials before potentially changing. This design naturally created periods where subjects could ‘exploit’ the same Choice 1 to maximize reward for several trials; and other periods where they had to ‘explore’ different second-stage stimuli to optimize reward (as contingencies shifted). In classical MB-RL, the transition between reward states can be learned by keeping counts of observed transitions from a current state-action pair to a subsequent state, yielding a maximum-likelihood estimate of the environment’s dynamics [42]. In fact, knowledge about the reward contingency schedule could support decision-making in both exploitation – by enabling efficient choice when rewards are stable; and exploration – by guiding alternative behaviour most likely to yield improved outcomes (this is different from MF learning, where exploration is more random since the agent lacks explicit state-transition knowledge).

      We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F). Therefore, exploit behaviour specifically upregulated relevant task parameters that were worth remembering across trials.”
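
      The count-based transition learning mentioned in the revised text can be illustrated with a short sketch: the maximum-likelihood estimate of P(next state | action) is the observed count for that pair divided by the total count for the action. The class name and the state and action labels below are hypothetical and are not taken from the task code.

```python
from collections import defaultdict


class TransitionModel:
    """Maximum-likelihood transition estimates from observed counts."""

    def __init__(self):
        # counts[action][next_state] -> number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, action, next_state):
        self.counts[action][next_state] += 1

    def prob(self, action, next_state):
        """Estimated P(next_state | action); None if the action was never observed."""
        total = sum(self.counts[action].values())
        if total == 0:
            return None
        return self.counts[action][next_state] / total


# Hypothetical example: a first-stage choice that led to one state 7 times and to another 3 times.
model = TransitionModel()
for _ in range(7):
    model.update("choice_A", "state_common")
for _ in range(3):
    model.update("choice_A", "state_rare")
p_common = model.prob("choice_A", "state_common")
```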

      Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used a two-step choice task (Miranda et al., 2020), which was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      (a) ACC as the strongest encoder of MB variables from the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed sustained representation of feedback that went into the next trial.

      (b) ACC was the only area to represent both MB and MF value representations.

      (c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      We agree with the reviewer that the present study does not establish causal relationships, and we do not claim otherwise in the manuscript. Our work was designed as a comprehensive characterization of neural activity across ACC, DLPFC, caudate, and putamen during reward-seeking decision-making. By systematically comparing MB- and MF- RL signals across these regions, we provide new insights into the division of labor and cooperative interactions within cortico-striatal networks.

      While causal manipulations (e.g., lesions, inactivations, stimulation) are indeed required to directly establish necessity or sufficiency, correlational studies such as ours play a crucial role in identifying where and how computationally relevant signals are represented. Importantly, our findings align with and extend prior causal work, for example showing that ACC and striatal lesions disrupt MB control. Thus, our study contributes a detailed functional mapping of MB and MF RL encoding across multiple nodes of this circuit, which serves as an important foundation for future causal investigations (e.g., using transcranial ultrasound stimulation).

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      We appreciate the reviewer's observation regarding this matter. Our intention was not to place a heavy emphasis on ACC, rather this came naturally from the data. The ACC demonstrated considerably more robust and enduring neural activity compared to other brain regions – for instance, reward-related signals in the ACC continued well beyond individual trials (Fig. 2A-B), and encoding of state transitions remained active from the initial transition through to the feedback phase (Fig. 3A-B). By comparison, distinctions among other regions were less pronounced, which naturally resulted in the ACC receiving greater attention in our analytical findings.

      We acknowledge that the caudate plays an essential and complementary role in driving behavior, and we believe that this is emphasized in the two key subsections of our “Results”. First, caudate neurons encoded model-based choice values (Fig. 4A, 4C) and uniquely remapped these values following rare transitions (Fig. 5), reflecting flexible adjustment of action values. Second, decoding analyses showed that both ACC and caudate populations predicted first-stage choices (Fig. 6C), linking their activity directly to behavioral decisions. In the Discussion section, we also highlight that “the distinctive caudate signal of updating (flipping) the value estimates of the currently experienced option on rare trials” goes beyond a “general temporal-difference RPE” and rather supports “the role of caudate in MB valuation”.

      (3) The role of the putamen is somewhat underexplored here.

      Our analyses were conducted in an identical manner across all four recorded regions (ACC, DLPFC, caudate, and putamen), and we consistently reported the results for putamen alongside the others. For example, in the Results section we describe how “both caudate and putamen encoded the reward from the previous trial negatively during the feedback period of the current trial” (Fig. 2F-G), and that “all regions had a significant population of neurons that encoded MB-, but not MF-, derived value” including putamen (Fig. 4F). Similarly, we show that putamen, like caudate, encoded a dopamine-like RPE signal at feedback (“both caudate and putamen neurons clearly responded at feedback with the parametric features of a dopamine-like RPE”; Discussion). These findings align with previous work linking the putamen to MF learning and are discussed explicitly in the context of MF-MB dissociations. We therefore believe that the putamen was not underexplored, but rather that its contribution was more circumscribed relative to ACC and caudate because the signals observed were quantitatively weaker and less distinctive for MB computations.

      (4) The authors mention the monkeys were overtrained before recording, which might have led to a bias in the MB versus MF strategy.

      We agree that extensive training can influence the balance between MB and MF in choice behaviour and neuronal responses.

      In a previous comprehensive behavioral analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology - ref. 36, Figure S6B) we showed that both MB and MF strategies contributed to behavior, with MB dominance stable across weeks of testing – supporting that overtraining did not eliminate MF influences (but rather stabilized a mixed strategy with robust MB contributions).

      In the same manuscript, we have also: i) cautioned the readers when comparing our results to data from the original human studies; ii) acknowledged that our extensive training cannot address earlier phases of learning in which sensitivity to the task structure is first acquired; and iii) also provided task-related reasons for such MB dominance – as training made the transition structure well learned (making MB computationally less costly and faster to implement) and the non-stationary outcomes favored the flexibility of MB strategies.

      In the present manuscript, we also have acknowledged that overtraining may have shifted neural signals toward stronger MB representations, or alternatively enabled more sophisticated task representations:

      “On the other hand, MF-based estimates were neither as striking nor as specific to striatal regions as expected and observed in previous studies [18]. The monkeys were extensively trained on the task before recordings commenced, which may have caused a shift towards both MB behaviour and MB value representation within the striatum. Alternatively, this training may have allowed more sophisticated representations to occur, such as using latent states to expand the task space [54].”

      Importantly, we strongly believe that this possibility does not detract from our main finding that both MB and MF signals were present across regions, with ACC showing the strongest multiplexing of the two.

      (5) The GLM3 model combines MB and MF value estimates but does not clearly mention how hyperparameters were optimized to prevent overfitting. While the hybrid model explains behavior well, it does not clarify whether MB/MF weighting changes dynamically over time.

      We appreciate this comment and would like to note that, for completeness, we have on several occasions directed the reader to our prior behavioural analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology, ref 36). In that work, we provide a full and detailed description of both the task and the computational modeling approach (see particularly the “Model fitting procedures” section). Furthermore, our model-fitting was grounded in the MF/MB RL framework used in the original human two-step study (Daw et al., 2011); and the fitting procedures also followed previous studies (Huys et al., 2011).

      Hyperparameters – including the MB/MF weighting parameter (ω) – were estimated using maximum likelihood under two complementary approaches and with priors providing regularization across sessions. First, we performed a fixed-effects analysis, in which parameters were estimated independently for each session by maximizing the likelihood separately; second, we conducted a mixed-effects analysis, treating parameters as random effects across sessions within each subject. The prior reduces the risk of overfitting by constraining parameters based on their empirical distributions, rather than allowing unconstrained session-by-session estimates. Finally, all model fitting procedures were verified on surrogate generated data.

      With regard to dynamic weighting, our approach – consistent with most two-step studies – assumed ω to be constant across trials within each session. This was a deliberate choice, both for comparability with prior work and because our subjects were extensively trained, making session-level stability of strategy weights a reasonable assumption. Indeed, our analyses showed no systematic drift in ω across sessions, suggesting that MB/MF balance was stable over sessions. While approaches that allow dynamic ω estimation are possible, we believe such extensions would likely have minimal impact in the current dataset.
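
      To make the role of a fixed ω concrete, the sketch below shows the weighted combination of MB and MF values used in hybrid two-step models in the style of Daw et al. (2011), a softmax choice rule, and the resulting negative log-likelihood for a session. It is a deliberately reduced illustration: the per-trial MB and MF values are taken as given rather than learned, and learning rates, perseveration terms, and the priors described above are omitted. The function names and the inverse-temperature parameter beta are assumptions of this sketch.

```python
import math


def hybrid_values(q_mb, q_mf, omega):
    """Weighted mix of MB and MF action values; omega = 1 is purely model-based."""
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(values, beta):
    """Choice probabilities with inverse temperature beta (numerically stabilised)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def neg_log_likelihood(trials, omega, beta):
    """Session NLL with omega held constant across trials.

    trials: iterable of (q_mb, q_mf, choice_index) tuples.
    """
    nll = 0.0
    for q_mb, q_mf, choice in trials:
        probs = softmax(hybrid_values(q_mb, q_mf, omega), beta)
        nll -= math.log(probs[choice])
    return nll
```

      Minimising this quantity over ω and beta separately for each session corresponds to the fixed-effects style of fit; the mixed-effects approach described above would additionally place a prior over the parameters.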

      (6) It was unclear from the task description whether the images used changed periodically or how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the pair of cues.

      All images were kept constant across sessions. Common/Rare transitions themselves were not explicitly cued, but rather each second-stage state was associated with a specific background colour, followed ~1s later by the presentation of two specific second-stage choice cues (Figure 1B). Hence the subject could infer whether they were transitioned down a Rare or Common path by the background colour, which can be disambiguated in time from the visual responses to the second-stage cues. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which could be inferred by a change in background colour immediately after choice indicating which second stage state they had just entered, Figure 1A).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 7 appears to be missing.

      We thank the reviewer for pointing this out. Figure 7 was inadvertently omitted in the previous version and has now been included in the revised manuscript.

      (2) No stats reported in the section on explore/exploit.

      We apologise for this oversight. This section now also reports the relevant statistics:

      “We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F).”
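
      The trial selection described in this passage can be sketched as a simple labelling rule. The interpretation below is an assumption: 'exploit' trials are those where both of the two preceding rewards were at the highest level, and 'explore' trials are those where neither of the two preceding rewards was at the highest level; all other trials, and the first two trials of a session, are left unlabelled. The numeric coding of reward levels is hypothetical.

```python
def label_trials(rewards, high=2):
    """Label each trial from the two preceding outcomes.

    rewards: per-trial outcome level (hypothetical coding: 0=low, 1=medium, 2=high).
    Returns a list of "exploit", "explore", or None (trial excluded).
    """
    labels = []
    for t in range(len(rewards)):
        if t < 2:
            labels.append(None)      # not enough history
            continue
        prev = rewards[t - 2:t]
        if all(r == high for r in prev):
            labels.append("exploit")
        elif all(r != high for r in prev):
            labels.append("explore")
        else:
            labels.append(None)      # mixed history, excluded in this sketch
    return labels
```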

      (3) Make sure that error bars are explained in all figure captions where appropriate.

      We apologise that this information was absent. Error bars always represent the standard error of the mean. This has now been added to all relevant figure legends.

      Reviewer #2 (Recommendations for the authors):

      Overall, I think this is a great manuscript and was presented clearly and succinctly. I have some minor suggestions:

      (1) Typo: Abstract "ACC, DLPFC, caudate and striatum" I think should be "caudate and putamen".

      We have amended this incorrect reference in the introduction:

      “One such task that does enable the dissociation of MB and MF computations is Daw et al. (2011)’s ‘two-step’ task [18]. It contains a probabilistic transition between task states to uncouple MF learners (who would assign credit to which state was rewarded regardless of the transition) from MB learners (who would appropriately assign credit based on the reward and transition that occurred). Rodents [19], monkeys [36], and humans [18] all use MB-like behaviour to solve the task. Evidence in rodents suggests dorsal anterior cingulate cortex (ACC) tracks rewards, states, and the probabilistic transition structure, and that ACC is essential in implementing a MB-strategy [37]. Here, we compare primate single neuron activity of 4 different subregions implicated in reward-based learning and choice (ACC, dorsolateral PFC (DLPFC), caudate, and putamen) during performance of the classic two-step task, and demonstrate signatures of MB-RL primarily in ACC, and MF-RL signatures most notably in putamen.”

      (2) Could the authors provide a rationale for why they did the single-level encoding the way they did, instead of running an ANOVA?

      We thank the reviewer for this point. We are not entirely certain which specific ANOVA approach is being suggested, but our rationale for using a GLM-based encoding analysis is that such an approach allows us to model continuous, trial-by-trial variables (e.g., value signals, prediction errors, transitions) while simultaneously controlling for multiple correlated predictors. This approach is widely used in systems neuroscience (particularly in decision-making research), offering analytical flexibility and comparability with prior work.

      (3) How were the 20 iterations for decoding decided? That seems low.

      We do not agree that 20 repetitions of 5-fold cross-validation is low. The error bars in panels 6C-E demonstrate how little variance occurred across these 20 repetitions. We performed statistics on the average of these low-variance repetitions, using a permutation test in which the 20 repetitions were repeated a further 500 times.
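
      The procedure described in this response can be sketched as follows. This is a hedged illustration only: the nearest-class-mean decoder, the synthetic one-dimensional data and all function names are assumptions, and the p-value uses the common (exceedances + 1) / (permutations + 1) convention, which may differ from the authors' implementation. Only the counts (20 repetitions, 5 folds, 500 permutations) come from the response.

```python
import random
from statistics import mean


def cv_accuracy(x, y, n_folds, rng):
    """One pass of k-fold cross-validation with a nearest-class-mean decoder."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        # Class means are estimated from the training trials only.
        centroids = {}
        for label in set(y[i] for i in train):
            centroids[label] = mean(x[i] for i in train if y[i] == label)
        for i in test_idx:
            predicted = min(centroids, key=lambda c: abs(x[i] - centroids[c]))
            correct += predicted == y[i]
    return correct / len(x)


def repeated_cv(x, y, n_reps, n_folds, rng):
    """Average accuracy over n_reps independent k-fold splits."""
    return mean(cv_accuracy(x, y, n_folds, rng) for _ in range(n_reps))


def permutation_test(x, y, n_perm=500, n_reps=20, n_folds=5, seed=0):
    """Compare the repeated-CV accuracy against a label-shuffled null."""
    rng = random.Random(seed)
    observed = repeated_cv(x, y, n_reps, n_folds, rng)
    exceed = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        # Each permutation reruns the full set of repetitions.
        if repeated_cv(x, shuffled, n_reps, n_folds, rng) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic, well-separated two-class data (hypothetical, for illustration).
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(20)] + [data_rng.gauss(4, 1) for _ in range(20)]
y = [0] * 20 + [1] * 20

# Fewer permutations than the default to keep the demonstration fast.
accuracy, p_value = permutation_test(x, y, n_perm=99)
```

      Under this convention, 500 permutations give a smallest attainable p-value of 1/501, which is just below 0.002.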

      (4) It was unclear to me how the authors reached the conclusion "Thus, caudate activity appeared to represent the value of the state the subject was currently in." when the state value wasn't computed directly. I don't see how encoding the chosen and unchosen option is the same as the state the animal is in, which should also incorporate where the animal is in a block of trials or session, and the knowledge regarding the chosen and unchosen option.

      We agree with this point and have tempered this statement:

      “Thus, caudate’s encoding of an option’s value also reflected the availability of the option.”

      (5) Figures 1C, D, and E were not legible to me even at 200% zoom.

      We apologise for this oversight. We’ve now updated panels 1C-E to a more readable size.

      (6) There is a Figure 2H in the figure legend, but the panel appears to be missing from Figure 2.

      This text has been removed.

      (7) Figure 2: It would've been nice to see F and G for all areas.

      We have now added this data as additional panels in Figure 2.

      (8) Figure 3: How is the transition disambiguated from a visual response to the set of images?

      This was indicated by the background changing colour to that of the learned second stage state before the actual choices were presented. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which was indicated by a change in background colour before the second stage choices were presented, Figure 1A).”

      (9) Figure 4F: Is this collapsed across time points? So neurons that were significant at any time? I'm confused how Figure 4A relates to 4F, as 4A shows much lower percentages of significant neurons.

      Figure 4F counts the total number of neurons that had a significant period of encoding at any timepoint over the epoch (as assessed with a length-based permutation test), whereas 4A shows the percentage of neurons with significant encoding at any one time point. Investigating this further, we found that the encoding was dynamic, with different neurons encoding different parts of the epoch. We have now added a new supplementary figure to highlight this and refer to it in Results:

      “Examination of the strongest signal observed, ACC’s encoding of MB Q-values, showed a dynamic pattern with different neurons encoding the signal at different parts of the epoch (Figure S6). When aggregating the number of significant coders throughout the epoch, and examining the specificity of MB versus MF coding, we found that all regions had a significant population of neurons that encoded MB-, but not MF-, derived value (30, 18.72, 23 and 24% of neurons in ACC, DLPFC, caudate and putamen respectively; all p<0.0014 binomial test against 10% (as the strongest response to either of the two options was used); Figure 4F).”
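
      The difference between the per-timepoint view (Figure 4A) and the aggregated view (Figure 4F), and the binomial test against a 10% chance level, can be illustrated with a small sketch. The significance mask, the neuron counts and the function binomial_sf are hypothetical; the response does not report the numbers of recorded neurons, and the one-sided exact test shown here is an assumption about the form of the test.

```python
from math import comb


def binomial_sf(k, n, p):
    """Exact one-sided binomial probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical significance mask: rows are neurons, columns are time bins.
# Four of ten neurons encode the signal, each in a different time bin.
significant = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
n_neurons = len(significant)
n_bins = len(significant[0])

# Per-timepoint view: fraction of neurons significant in each bin.
fraction_per_bin = [sum(row[t] for row in significant) / n_neurons for t in range(n_bins)]

# Aggregated view: fraction of neurons significant in at least one bin.
fraction_any_bin = sum(1 for row in significant if any(row)) / n_neurons

# Hypothetical population: 30 of 100 neurons significant, tested against 10%.
p_value = binomial_sf(30, 100, 0.10)
```

      With dynamic coding of this kind, the aggregated fraction exceeds every per-timepoint fraction, which is why the two panels can show different percentages for the same population.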

      (10) Data/ code could be made publicly available instead of upon request.

      All data and code to reproduce figures are now available at https://github.com/jamesbutler01/TwoStepExperiment. The manuscript has been updated to reflect this:

      Data and materials availability:

      All data and code to reproduce figures are available at https://github.com/jamesbutler01/TwoStepExperiment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth. Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled the screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in the patterns that underscore the complexity of metabolic systems.

      We agree with reviewer #1 that metabolic fingerprints are complex to interpret and we did try to approach this problem by including mock treatment and non-metabolic inhibitors as controls. We address specific concerns below.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. The authors claim that MMV1028806 targets the bc1 complex of the mitochondrial electron transport chain of the parasite, although the evidence for this is indirect and speculative. Nevertheless, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors, although falls short of identifying the actual targets.

      Weaknesses:

      (1) The authors claim to have identified a compound in their screen (MMV1028806) that targets the bc1 complex of the mitochondrial electron transport chain (ETC). The evidence they present for this claim is indirect (metabolomic signatures and changes in mitochondrial membrane potential) and could be explained by the compound targeting other components of the ETC or affecting mitochondrial biology or metabolism in other ways. In order to make the conclusion that MMV1028806 targets the bc1 complex, the authors should test specifically whether MMV1028806 inhibits bc1-complex activity (i.e. in a direct enzymatic assay for bc1 complex activity). Testing the activity of MMV1028806 against other mitochondrial dehydrogenases (e.g. dihydroorotate dehydrogenase) that feed electrons into the ETC might also provide valuable insights. The experiments the authors perform also do not directly measure whether MMV1028806 impairs ETC activity, and the authors could also test whether this compound inhibits mitochondrial O2 consumption (as would be expected for a bc1 inhibitor).

      We thank the reviewer for highlighting this important aspect. To further investigate the effect of MMV1028806 on the mETC, we adapted a commercial oxygen consumption assay and demonstrated that MMV1028806, like Atovaquone and Buparvaquone, inhibits the ETC, leading to reduced oxygen consumption similar to Antimycin A, which inhibits the bc1-complex. These results are now included in the revised manuscript (Methods, lines 210–233; Results, lines 460–468).

      (2) The authors claim that compounds targeting bradyzoites have greater lipophilicity than other compounds in the library (and imply that these compounds also have greater gastrointestinal absorbability and permeability across the blood-brain barrier). While it is an attractive idea that lipophilicity influences drug targeting against bradyzoites, the effect seems pretty small and is complicated by the fact that the comparison is being made to compounds that are not active against parasites. If the authors are correct in their assertion that lipophilicity is a major determinant of bradyzoicidal compounds compared to compounds that target tachyzoites alone, you would expect that compounds that target tachyzoites alone would have lower lipophilicity than those that target bradyzoites. It would therefore make more sense to (statistically) compare the bradyzoicidal and dual-acting compounds to those that are only active in tachyzoites (visually the differences seem small in Figure S2B). This hypothesis would be better tested through a structure-activity relationship study of select compounds (which is beyond the scope of the study). Overall, the evidence the authors present that high lipophilicity is a determinant of bradyzoite targeting is not very convincing, and the authors should present their conclusions in a more cautious manner.

      Thank you for raising this excellent point. We statistically compared the compounds active against tachyzoites alone with the bradyzoicidal and dually active compounds and indeed found no significant difference (P = 0.06). We altered the results text (lines 367-368) and the Figure S2B caption to state this explicitly.

      (3) Page 11 and Figure 7. The authors claim that their data indicate that ATP is produced by the mitochondria of bradyzoites "independently of exogenous glucose and HDQ-target enzymes." The authors cite their previous study (Christiansen et al, 2022) as evidence that HDQ can enter bradyzoites, since HDQ causes a decrease in mitochondrial membrane potential. Membrane potential is linked to the synthesis of ATP via oxidative phosphorylation. If HDQ is really causing a depletion of membrane potential, is it surprising that the authors observe no decrease in ATP levels in these parasites? Testing the importance of HDQ-target enzymes using genetic approaches (e.g. gene knockout approaches) would provide better insights than the ATP measurements presented in the manuscript, although would require considerable extra work that may be beyond the scope of the study. Given that the authors' assay can't distinguish between ATP synthesized in the mitochondrion vs glycolysis, they may wish to interpret their data with greater caution.

      We thank the reviewer for addressing this important point. The enzymatic assay used in our study cannot distinguish whether ATP is produced via glycolysis or mitochondrial respiration. However, we minimized glycolytic ATP production in bradyzoites by starving them for one week without glucose. After this period, amylopectin stores are depleted, forcing the parasites to utilize glutamine via the GABA shunt to fuel the TCA cycle and generate ATP predominantly through respiration. While minor ATP production via gluconeogenic fluxes cannot be excluded, the main ATP supply under these conditions is expected to originate from the mitochondrial electron transport chain. Indeed, ATP levels are lower in HDQ-treated bradyzoites, which we attribute to the compound’s impact on electron-supplying enzymes upstream of the bc1 complex, although this inhibition is not sufficient to fully abolish ATP production as observed with Atovaquone treatment.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an exciting 400-drug screen using the MMV pathogen box to select compounds that effectively affect the medically important Toxoplasma parasite bradyzoite stage. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affected the mitochondrial electron transport chain (mETC) bc1-complex and compared them with other bc1 inhibitors described in the literature such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites which serves to provide support for the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, that is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in tachyzoite and bradyzoite stages and under atovaquone/HDQ treatment.

      Weaknesses:

      Although the authors did experiments to identify the metabolomic profile of the compounds and suggested bc-1 complex as the main target of MMV1028806, they did not provide experimental validation for that.

      In our updated manuscript we performed additional experiments, such as an oxygen consumption assay, to further qualify the bc1 complex as the target. We also toned down some of our statements to make sure that no false claims are made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction: It would be helpful to briefly describe what the pathogen Box is, what compounds are in it, and the rationale for using a drug screen to better understand mitochondrial function in cysts.

      Thank you for this suggestion. We added an introduction to the MMV pathogen box and outlined the rationale for our experimental approach in lines 90 to 99.

      Please explain why dual-active drugs were useful for understanding differences, rather than just seeking drugs that might target bradyzoites alone.

      We focused on dually active compounds for two reasons. First, these are the most promising and potent targets to develop drugs against. Both stages might occur simultaneously and these dually active drugs may eliminate the need for treatment with a drug combination. Second, we speculated that monitoring the responses to inhibition of the same process in both parasite stages would reveal its functional consequences. Dually active compounds enable this direct comparison. Bradyzoite-specific compounds may be interesting from a developmental perspective but may require a reverse genetic follow-up to compare differences between stages. The lack of a well-established inducible expression system in bradyzoites that allows short term and synchronized knock-down makes metabolomic approaches difficult. We added these two points in brief to the results section (line 378 – 381).

      Figure 4: this is a very important figure in understanding the significance of the work, but it is not well described in the legend. Even if these graphics have been used in other manuscripts, it would be helpful to provide better annotation in the figure legend.

      Thank you for pointing this out. We expanded the figure legend to explain the isotopologue data in more detail (lines 793 to 802).

      B,D: Explain what the three columns for each drug category represent.

      Addressed

      C,E: Explain what isotopologues are, what the M+ notation means, and what the pie charts represent. Other main figures have suitable legends.

      Addressed

      Discussion: there are several places where the reasoning is a bit hard to follow, and rearrangement to provide a clear logical flow would be helpful. In particular, the reasoning for why HDQ impairs active but non-essential processes could be laid out more clearly.

      We added additional clarifications to the discussion section and re-wrote the HDQ paragraph. We hope that our reasoning is now easier to follow.

      Abbreviations: A list of abbreviations for the entire manuscript would be helpful.

      This is a good idea and we now provide an abbreviations list.

      Minor typos:

      P12, 2d paragraph: sentence beginning with: Consistent with this hypothesis... "cysts" is used twice

      Corrected

      P15, top of the second paragraph: "nano" and "molar" should be one word

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Major comments (not already covered in the weaknesses section of the public review)

      (1) Figure 2 and the related description of these experiments in the methods section (page 3). The approach for calculating IC50 values for the compounds against tachyzoites is unclear. How did the authors determine the time point for calculating IC50 values? Was this when the DMSO control wells reached maximum fluorescence? This could be described in a clearer manner. A concern with calculating IC50 values on different days is that parasites will have undergone more lytic cycles after 7 days compared to 4 days, which means that the IC50 values for fast- vs slow-acting compounds might be quite different between these days. As a more minor comment on these experiments, the methods section does not describe whether the test compound was removed after 7 days, as the experimental scheme in Figure S1A seems to imply. Please clarify in the methods section.

      This is a very good point and we clarified this in the methods section (lines 157–160). In brief, we chose the latest time point at which exponential growth could be observed in the fastest-growing cultures; generally this was in mock-treated cultures at day 4 post infection. We also clarified that we changed media and removed treatment after 7 days.

      Minor Comments

      (2) Page 2. "we employed a recently developed human myotube-based culture system to generate mature T. gondii drug-tolerant bradyzoites". What makes these bradyzoites 'drug-tolerant' or to which drugs are they tolerant? This isn't clear from the description.

      We added these details in the introduction (lines 94 to 96) and state that these cysts develop resistance against anti-folates, bumped kinase inhibitors and HDQ, a coenzyme Q analog.

      (3) Figure 1E. The number of compounds in this pie chart adds up to 384, whereas the methods describe that 371 compounds were tested. What explains this discrepancy in numbers?

      We understand the confusion. We have now updated the pie chart to reflect only compounds that were included in the primary screen (371), as reflected in Supplementary Table S1. We separately analysed 29 compounds that were previously tested against tachyzoites by Spalenka et al. and found an additional 13 compounds, which were originally included in the pie chart. In a secondary test, the activity of 10 of these 13 compounds could be confirmed. All in all, we found the 16 compounds shown in Fig. 2 E-G.

      (4) Page 3. The resazurin assays for measuring host cell viability could be explained in a clearer manner. What host cells were used? Were the host cells confluent when the drug was added (and the assay conducted) or was the drug added when the host cells were first seeded? How long were the host cells cultured in the candidate inhibitors before the assays were performed? What concentration (or concentration range) were the compounds tested? The host inhibition data are not easily accessible to the reader - the authors might consider including these data as part of Table S2D.

      The necessary information was added to the methods section (lines 145 to 153). We tested for host toxicity in both HFF and KD3 myotubes during the primary screen at 10 µM in triplicate. The colorimetric assay was performed after tachyzoite growth assays in HFFs 7 days post infection and after completion of the 4-week re-growth phase of bradyzoites in myotubes. The resulting data are already part of Supplementary File 1. In addition, we performed concentration-dependent resazurin assays after the secondary concentration-dependent growth inhibition assays and also included these data in Supplementary File 1. For the bradyzoite growth assay, we visually inspected the cultures after one week of drug exposure and before tachyzoite re-growth to detect a missing or damaged monolayer. These data are also included in Supplementary File 1. As suggested, we also included the cytotoxicity data in Table S2D.

      (5) Page 7. "Except for four compounds (MMV021013, MMV022478, MMV658988, MMV659004), minimal lethal concentrations were higher in bradyzoites". The variation in these data seems quite large to be making this claim. Consider a statistical analysis of these data to compare potencies in tachyzoites vs bradyzoites.

      With this sentence we aimed to describe the results and not to make a statement. We toned down the sentence to “… minimal lethal concentrations appear generally higher in bradyzoites…” (lines 344 to 347). We also added a line at 1 µM to the charts to facilitate comparison of compound efficacies.

      (6) It would be helpful to readers to include the structures of hit compounds in the figures (perhaps as part of Figure 3).

      This is a good idea and would improve the manuscript. To not overburden figure 3 we added structures to Fig S3.

      (7) Page 8. "Infected monolayers were treated for three hours with a 3-fold of respective IC50 concentrations". 3-fold higher than IC50 concentrations? This isn't clear.

      Thank you for noticing this. We clarified the sentence and also corrected the concentration, which corresponds to five times the IC50 as stated in the methods section: “Infected monolayers were treated for three hours with compound concentrations five times their respective IC50 values or the solvent DMSO.” (lines 374-376)

      (8) Page 9. "buparvaquone, which we found to be dually active against T. gondii tachyzoites and bradyzoites, targets the bc1-complex in Theileria annulata (McHardy et al. 1985) and Neospora caninum (Müller et al. 2015) and was recently found active against T. gondii tachyzoites (Hayward et al. 2023)." The latter paper showed that buparvaquone targets the bc1 complex in T. gondii tachyzoites as well.

      Yes, it was found to inhibit O2 consumption rate in tachyzoites. We changed the sentence accordingly (lines 407 to 411).

      (9) Page 9. "Anaplerotic substrates were also affected by all three treatments, most notably a strong accumulation of aspartic acid." It is interesting that the M+3 isotopologue of aspartate (presumably synthesised from pyruvate) is the predominant form (rather than the M+2 and M+4 isotopologues that would derive from the TCA cycle, and as the diagram in Figure 4A seems to suggest). Given that aspartate is a precursor of pyrimidine biosynthesis that is upstream of the DHODH reaction, it is conceivable that its accumulation is related to the depletion of pyrimidine biosynthesis (so would tie into the point about the accumulation of DHO and CarbAsp noted earlier in the paragraph).

      Yes, we assume the same. We altered the text and summarized the changes in Asp as a result of DHODH inhibition, as we already do in the next paragraph using 15N-glutamine labelling (lines 416-418).

      (10) Figure 6 and Page 10. Regarding the metabolomic experiments that show increased levels of acyl-carnitines. The authors note that "Since [beta-oxidation] is thought to be absent in T. gondii, we attribute these changes to inhibition of host mitochondria". This is conceivable, although the T. gondii genome does encode homologs of the proteins necessary for beta-oxidation (e.g. see PMID 35298557). If the carnitine is coming from host mitochondria, is host contamination a concern for interpreting the metabolomic data? Or do the authors think that parasites are scavenging carnitine from host cells? It is curious that the carnitine accumulation is observed in parasites treated with buparvaquone (and MMV1028806) but not atovaquone, even though buparvaquone and atovaquone (and possibly MMV1028806) target the same enzyme. Do the authors have any thoughts on why that might be the case?

      Yes, thank you for raising this point. We changed the discussion to elaborate on this and included the debated presence of beta-oxidation (line 640): “We also detect elevated levels of acyl-carnitines in BPQ and MMV1028806 treated bradyzoites. These molecules act as shuttles for the mitochondrial import of fatty acids for β-oxidation. However, this pathway has not been shown to be active and is deemed absent in T. gondii (PMID 35298557, 18775675). The presence of acyl-carnitines in bradyzoites might reflect import from the host. It is conceivable that their elevation in response to buparvaquone and MMV1028806 indicates compromised functionality of the host bc1-complex and subsequently accumulating β-oxidation substrates. Indeed, BPQ has a very broad activity across Apicomplexa (Hudson et al. 1985) and kinetoplastids (Croft et al. 1992).” Regarding the existence of beta-oxidation: some potential enzymes might be conserved, but those could in part take part in branched-chain amino acid degradation pathways. On a separate note, we looked extensively at beta-oxidation using stable isotope labelling and became convinced that any activity occurred in the host cell only and not in the parasite (unpublished).

      (11) Page 11. "the mitochondrial [electron] transport chain in bradyzoites".

      Corrected.

      (12) Figure S6B. Were these optimization experiments performed in tachyzoites or bradyzoites? If the former, and given that bradyzoites have apparently smaller amounts of ATP per parasite (Figure 7C), are these values in the linear range for 10^5 bradyzoites?

      Yes, we do think that the assay remains linear at these lower concentrations. Tachyzoites give a linear response starting from 10^3 parasites per sample. In the actual experiment we used 10^5 parasites, both tachyzoites and bradyzoites. Under the tested conditions bradyzoites maintain 10% of the ATP pools of tachyzoites, which should be well within the linear range of the assay. Even in Atovaquone-treated bradyzoites, the ATP concentration could drop to 10% of this level and still remain in the linear range of the assay. For practical reasons, we simply acknowledge this limitation and consider it acceptable within the scope of this study.
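
      The arithmetic behind this argument can be written out as a short check. It is a sketch under two assumptions that the response does not state explicitly: that the assay signal scales with the total ATP in the sample (parasite number times ATP per parasite), and that the Atovaquone case refers to a further drop to 10% of the untreated bradyzoite level. The constant and function names are illustrative.

```python
# Values taken from the response.
LINEAR_RANGE_MIN = 10**3       # tachyzoites per sample at which the response is linear
PARASITES_PER_SAMPLE = 10**5   # parasites used per sample in the experiment
BRADYZOITE_ATP_PERCENT = 10    # bradyzoite ATP pool relative to tachyzoites, in percent


def tachyzoite_equivalents(n_parasites, atp_percent):
    """ATP content expressed as the number of tachyzoites holding the same ATP.

    Assumes the signal scales with total ATP; integer arithmetic avoids
    floating-point rounding at the range boundary.
    """
    return n_parasites * atp_percent // 100


# Untreated bradyzoites: 10^5 parasites at 10% of the tachyzoite ATP pool.
untreated = tachyzoite_equivalents(PARASITES_PER_SAMPLE, BRADYZOITE_ATP_PERCENT)

# Assumed reading of the Atovaquone case: a further drop to 10% of that level.
treated_floor = tachyzoite_equivalents(untreated, 10)

untreated_in_range = untreated >= LINEAR_RANGE_MIN
treated_in_range = treated_floor >= LINEAR_RANGE_MIN
```

      On these assumptions, untreated bradyzoites correspond to 10^4 tachyzoite equivalents, and a tenfold further reduction lands exactly on the lower bound of 10^3, so any larger depletion would fall outside the stated linear range.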

      Reviewer #3 (Recommendations for the authors):

      Major comments

      (1) The authors should provide a negative control for the experiment on Figure 5. I would suggest doing the same experiment with an inhibitor that has no effect on mitochondrial potential.

      We addressed this criticism by repeating the assay on tachyzoites and additionally including inhibitors that do not have the mitochondrial electron transport chain as their primary target (Pyrimethamine, Clindamycin, 6-Diazo-5-oxo-L-norleucine). The results are summarized in supplementary Fig. S5 (lines 445–449) and show that these inhibitors have no effect on the mitochondrial membrane potential. This supports the specificity of the assay and suggests that MMV1028806 and BPQ indeed target a mitochondrial process in this stage. In this repetition, ATQ, BPQ and MMV1028806 again significantly depleted the Mitotracker signal.

      (2) Figure 5 - Did the authors perform this experiment in 3 biological replicates? This requires clarification of the figure legend.

      No, we did not perform the experiment in 3 biological replicates. After establishing the assay thoroughly, we performed it once on tachyzoites and bradyzoites. The sampling was done on every vacuole we encountered during microscopy going through the slide from left to right. That is the reason the sample size varies from treatment to treatment. The sample size is mentioned in the caption of figure 5. However, we repeated the experiment with additional controls (see Fig. S5), which showed that the Mitotracker signals were significantly depleted in a very similar manner in ATQ, BPQ and MMV1028806 treated parasites.

      (3) The authors identify that MMV1028806 has bc1-complex as the main target. I suggest that they should perform a complex III activity assay to affirm this. Also, it would be good to test if other mETC complexes are affected by this compound to prove its specificity. There is only one paper showing complex III activity in tachyzoites (PMID:37471441) and no papers in bradyzoites. So if the authors cannot do this assay, I suggest that they should change the text indicating that bc-1 complex could be the main target of the compound but more experimental validation is needed.

      We hope to have satisfied the reviewer’s request by performing an oxygen consumption assay on tachyzoites. Together with metabolic profiling and labelling data, this shows that both upstream and downstream processes are impacted by MMV1028806 and strongly suggests the bc1-complex as a target (Fig 5E).

      (4) Figure S5 - Are the differences shown in the EM experiment statistically supported?

      We analyzed 28 images and measured the areas in 12 to 26 images. We replaced the table of means in Fig S6B with a graph showing individual values. These areas are indeed statistically different between DMSO and ATQ / MMV treated parasites. We changed the wording in the results section accordingly: “Analysis by thin section electron microscopy revealed a largely unaffected sub-mitochondrial ultrastructure but the areas of mitochondrial profiles were changed in comparison to control after exposure with ATQ and MMV1028806 but not with BPQ (Fig. S6)”. The description of Fig S6B was changed to “(B) Measured areas of mitochondrial profiles from 21, 12, 15 and 26 images showing DMSO, ATQ, BPQ and MMV1028806 treated parasites (* denotes p < 0.05 in Mann-Whitney tests)”.
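
      For readers unfamiliar with the test named in the legend, a minimal sketch of a two-sided Mann-Whitney comparison follows. It is not the analysis used for Fig S6B: the function `mann_whitney_u` is a hypothetical helper, the p-value comes from a plain normal approximation without tie or continuity correction, and the area values are made-up placeholders rather than measurements.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, approximate p-value). No tie or
    continuity correction is applied, so treat the p-value as rough
    for small or heavily tied samples.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Placeholder values, NOT measurements from the study.
control_areas = [1.0, 1.2, 1.1, 1.4, 1.3]
treated_areas = [0.6, 0.8, 0.7, 0.9, 0.5]

u_stat, p_value = mann_whitney_u(control_areas, treated_areas)
print(f"U = {u_stat}, approximate p = {p_value:.4f}")
```

      For real data an established implementation with exact p-values and tie handling would be preferable to this sketch.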

      Minor comments:

      (1) What were the criteria to choose the example compounds in Figure 1B and 1D? The authors should clarify this in the text.

      These graphs are shown for illustrative purposes and were chosen based on their display of different drug efficacies. We considered this helpful for interpreting the screening data.

      (2) Figure 2G - add statistical analysis.

      We added Mann-Whitney tests and updated the figure legend and results text accordingly in lines 344–347.

      (3) The authors should provide more insights in the discussion about why this new compound is the next step in drug discovery compared to atovaquone or buparvaquone - for example, do you expect better availability in the brain, etc.

      We used MMV1028806 and the other hits ATQ and BPQ to make the point that the bc1-complex is a good target in bradyzoites that allows curative treatment. We do not suggest that the compound itself is a good starting point. We point to other actively developed candidates such as the ELQ series in the discussion, line 719.

      (4) Scale bars in Figure 5 should be aligned and have equal thickness.

      We re-formatted the scale bars and aligned them when not obscuring parasites.

      (5) The authors should be consistent with font sizes and styles in all the figures.

      We adjusted the font styles to match each other.

    1. These developments make the evidentiary gap salient: funders, editors, and policymakers need to know when AI evaluation outputs are trustworthy enough to use, and when they are unstable, biased, or manipulable. Recent work highlights all three concerns. First, reproducibility can be “jagged”: repeated runs of the same models on the same corpus over time can be highly consistent for some tasks and models, but much less so for others (Thomas, Romasanta, and Pujol Priego 2026); robustness may require separating scientific judgment from computational execution (Xu and Yang 2026); and even without overt adversarial intent, subtle reframings of the same task can induce systematic shifts in outputs—a form of LLM “specification search”—raising concerns about frame-sensitive biases when models serve as measurement instruments (Asher et al. 2026). Second, adversarial manipulation is not hypothetical: invisible-text “prompt injection” can substantially inflate LLM-assigned review scores and acceptance recommendations in simulated peer review (Choi et al. 2026), and prompt-injection vulnerabilities are also documented in other high-stakes advice settings (Lee et al. 2025). Third, even when outputs look fluent and plausible, it remains unclear whether AI models approximate expert judgment: AI-generated reviews tend to cover more surface-level sections while being less thematically diverse and less focused on interpretation, originality, and applicability than human reviews (Rajakumar et al. 2026); LLMs used as manuscript quality checkers identify only a small fraction of confirmed critical errors even with the strongest reasoning models (Zhang and Abernethy 2025); and LLM scoring exhibits systematic range restriction and halo effects that can distort agreement metrics (Wang et al. 2025).

      This seems too long. This isn't really coming from us, so we might mention some of these things, but I tend to make this a lot shorter. Perhaps some things can be put in footnotes. Obviously we need to check these carefully to see if we agree with them.

      I think I mentioned before I'm not sure our work really speaks to the prompt injection issue. The set of work we're putting the LLMs and humans to evaluate would seem to be rather unlikely to have such prompt injection, so we can't really test that (unless we modified the work being fed in, but I don't think that's in our wheelhouse right now).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity (Required)

      Ali et al investigate the composition of putative kinetochore subcomplexes in the unicellular eukaryote Tetrahymena thermophila. Up to the point of this study, only a CENP-A ortholog and two subunits of the microtubule-binding Ndc80 complex had been clearly identified. This left open the question of whether Tetrahymena kinetochores follow the conventional organization found in common model systems such as yeast or human cells, or contain many unconventional proteins. The authors combine proximity biotinylation coupled to mass spectrometry with deep homology searches and structure predictions.

      Extensive bioinformatic analysis of the T. thermophila genome allows the authors to annotate 16 genes as kinetochore genes (KiTT). Using sequence comparisons with known kinetochore proteins, they were able to relate their novel KiTT proteins to the conserved kinetochore components Cenp A, Cenp C, the KMN network, as well as auxiliary proteins. In particular, the authors were able to complete the organization of the Ndc80 complex and identify subunits of the Mtw1/Mis12 complex and a Knl1 ortholog. This characterizes a KMN network as the centerpiece of the Tetrahymena kinetochore architecture.

      The CCAN seems to be represented solely by CENP-C, with key binding interfaces to KMN and CENP-A being preserved. An interesting aspect is that neither a Dam1-, nor a Ska homolog seems to support the Ndc80 complex. Instead, the authors identify a Kinesin-6 homolog that may potentially compensate for the absence of these factors.

      The study is well-designed, the results are thoughtfully discussed and the expertly conducted experiments highlight the power of combining experimental identification (BioID) with bioinformatic analyses.

      We appreciate the favorable assessment of our manuscript and would like to extend our thanks for the reviewers’ constructive criticisms and insightful comments. Where possible we aim to incorporate them (see below).

      Major comments

      The functional validation of the newly identified subunits using RNAi feels somewhat limited in this study. I understand there are technical limitations in this system, but whenever possible, I would at least expect the authors to explore differential effects on different parts of the kinetochore using the reagents they have at hand. In particular, the authors show the effects of depleting KiTT12 (the kinesin-6 homolog) on Ndc80 kinetochore localization. It would be important to check effects also on CENP-A (using the anti-CNA1 antibody), or on other subunits. Given the available reagents, this should be readily possible.

      We agree that examining the effect of KiTT12 depletion on inner kinetochore components will strengthen the functional interpretation. While we do not expect, based on KiTT12’s relative location, a direct impact of KiTT12 RNAi on CNA1 (CENP-A) or CENP-C, we will perform immunofluorescence analyses using anti-CNA1 and anti-CENP-C antibodies in KiTT12 RNAi cells (and KiTT2 (NUF2) RNAi as control). These experiments will allow us to determine whether KiTT12 depletion specifically affects outer kinetochore integrity (as suggested by Ndc80 mislocalization) or more broadly perturbs kinetochore architecture (CNA1/CENP-C). We will include quantitative analyses of signal intensity and kinetochore organization to clarify potential hierarchical dependencies.

      The organization of the Knl1 ortholog and the question of whether a mitotic checkpoint is present deserve some additional discussion. Interestingly, the positional organization of a PP1 binding motif at the N-terminus of a long disordered domain seems conserved. On the other hand, MELT motifs appear to be absent. The authors should discuss the implications of this some more. Is there an Mps1 homolog? What about the error correction machinery including Aurora B and the CPC? The putative MadBub homolog does not seem to localize to kinetochores, but maybe this is not detectable, unless the respective conditions (unattached kinetochores) are generated. Is it known how the system reacts to spindle depolymerization?

      Tetrahymena does not appear to have a spindle checkpoint, given prior reports that chromosome segregation is not halted by microtubule depolymerization [Kaczanowski et al. 1985]. In line with this, the SAC protein orthologs that are present lack the motifs to mount a sufficient response and halt the cell cycle. We thus agree that the architecture of the Tetrahymena KNL1 ortholog and possible other SAC-related proteins raises important evolutionary questions. We will expand the discussion to address:

      • The absence of canonical MELT motifs in the Tetrahymena KNL1 ortholog.
      • The absence of a detectable Mps1 ortholog in our homology searches.
      • The divergence of the Tetrahymena MadBub protein and its lack of conserved KEN–ABBA–KEN motifs typically required for APC/C inhibition.
      • The absence of Mad2 and Mad2-binding motif in Cdc20.

      Relevant REFs:

      • Kaczanowski et al. 1985, Experimental Cell Research

      • Loidl et al. 2009, Molecular Biology of the Cell

      • Kops et al. 2020, Current Biology

      Minor comments

      • Introduction: When introducing the Tetrahymena kinetochore, please add some sentences on microtubule/spindle organization in the MIC. What is known about the kinetochore-microtubule attachment site in Tetrahymena?

      We will expand the introduction to include a concise description of spindle organization in the micronucleus (MIC), including known features of centromere clustering, spindle assembly, and microtubule attachment sites during MIC mitosis.

      Relevant REFs:

      • Davidson et al. 1975, Biosystems

      • Lafountain Jr et al. 1979, Chromosoma

      • Lafountain Jr et al. 1980, Cell Motility

      • Line 128: putative homology to Spc24 (E=13), comment on why this was considered, what cutoffs were applied etc.

      We will clarify the homology detection criteria, including E-value thresholds, domain architecture considerations, reciprocal searches, and structure-based validation. We will explain why this candidate was retained despite weak sequence similarity and how structural prediction strengthened confidence. In (very) short, we used the ‘top hit’ principle: the E=13 match to Spc24 was simply the first hit, and in the AlphaFold-predicted structure the protein was clearly similar to Spc24.

      • Line 135: briefly mention and discuss conservation of the RWD folds in the Spc24-25 orthologs.

      We will expand this section to explicitly describe conservation of the RWD fold and how structural modeling supports ortholog assignment despite sequence divergence. The E-values mentioned in line 128, for instance, are for the RWD domain only, not the full-length protein; we will indicate this in the text.

      • Line 194: Maybe replace "show" with "suggest", given there is no experimental data behind the CENP-C identification

      We agree and will revise wording to “suggest” to avoid overstatement. However, we do want to point out that CENP-C/KiTT8 was identified experimentally as well through the BioID pipeline, and also an antibody was raised against KiTT8 that places this protein at the inner kinetochore.

      • Figure 7B: please add the information for the RNAi target directly to the Figure

      We will add the requested information directly to the figure.

      • Figures in the combined pdf: please add the respective Figure number or Supplementary Figure number directly on the Figure.

      We will add the figure numbers to the supplementary figure files.

      Significance (Required)

      While functional studies are often conducted in very few model organisms, exploring the evolutionary variations of kinetochore architecture can help to understand the design principles of kinetochores. It also helps to assign functions to specific subcomplexes and can reveal how adaptations of a core machinery occur. Tetrahymena is historically an important experimental system that has had a great impact on the understanding of multiple aspects of nuclear biology. Deciphering the organization of the chromosome segregation machinery in this organism is therefore of great interest to researchers interested in mitosis and genome stability.


      Reviewer #2

      Evidence, reproducibility and clarity (Required)

      Summary

      Ali, Raas et al. provide a comprehensive molecular characterization of the kinetochore in the ciliate Tetrahymena thermophila. By integrating proximity proteomics (TurboID) with structure-based "deep" homology detection, they identify 16 kinetochore proteins (KiTT1-16), including nine highly diverged "cryptic" orthologs of conserved LECA components and four lineage-specific proteins. Their results demonstrate that while the Tetrahymena kinetochore lacks a conventional CCAN complex, it maintains a recognizable outer kinetochore structure supplemented by novel proteins essential for faithful chromosome segregation.

      Major comments

      1. Representation of known kinetochore diversity

      • Since this manuscript wants to highlight that it is important to characterize kinetochore components in different eukaryotic clades, it would be good to highlight the known diversity from the literature in Figure 1, e.g. indicating species/clades for which components have been experimentally validated vs. only computationally inferred.
      • It would be good to specifically highlight this on the figure for the clade closest to Tetrahymena in which KT components have been experimentally validated (Apicomplexa?).
      • L58-64: the sentences 'we have a limited understanding about kinetochore composition and function from other branches of the eukaryotic tree of life' and 'these surveys also uncovered a surprisingly extensive diversity of kinetochore composition across eukaryotes' seem to contradict each other. Instead of/in addition to the literature described in the introduction, as suggested above, having known diversity indicated on a figure would therefore be helpful. This could be done quite roughly, just mentioning the number of verified KT components and the number of species for which this was done.

      We will add a more elaborate version of Figure 1a (or include an extended version in the supplement), summarizing the information requested in the three points above. Indeed, the diversity across lineages that we mention is inferred, not directly tested. We will amend the text to clarify this.

      • L46-L56: when explaining the structure of the KT, it would be good to already refer to a figure, like the diagram of a human KT in 1B. As it is now, the introduction first explains the general structure, and then goes into diversity. This is fine, but it would be easier to understand if the figure panels followed this order.

      We will include additional references to figure 1b at the appropriate places in the introduction.

      The data can sometimes be represented in a more straightforward manner:

      • L120-...: After reading through the whole text, I understand why the authors choose to talk about Spc24 and Spc25 first (since Spc25 is also used in the TurboID experiment). However, the presented pipeline for these two proteins is much less convincing than for the other proteins. Spc24/25: 'Some homology > slight structure similarity > right localization in immunostaining' vs. the pipeline for the other proteins: 'TurboID > confirmation using homology + immunostaining' (what is depicted in Fig. 2C). The latter is very convincing, but by starting off with the less convincing pipeline, the reader starts off on the wrong track. Since Spc24 is not used in the end for the first TurboID results, is Spc25 necessary at this point or can this come later?

      We used this ‘story line’ because it was the way it happened. It felt wrong to us to pretend we hadn’t already found Spc24 and Spc25 by bioinformatic means before doing the TurboID, which might also have caused concerns with some readers as to our ability to detect orthologs for these and other proteins. Of note: a re-analysis of the Spc24-BioID experiment revealed that it was previously wrongly considered unusable, hence we now include it in our NDC80-C based TurboID discovery pipeline in Figure 2. Where possible, we will revise the narrative structure to more clearly explain the logic of the discovery pipeline, while maintaining transparency about the historical order in which candidates were identified. We will streamline the Spc24/25 section and more prominently introduce the TurboID-driven identification pipeline (Figure 2C) to guide the reader.

      • It is very good and thorough that the authors noticed that some of the KT proteins were simply missed because they were not part of the original predicted proteome. However, why weren't the TurboID analyses simply redone with the new proteome? The authors could still note that it was important to use the most recent version, but it would be much more straightforward for readers to immediately have the most up to date analysis.

      We thank the reviewer for pointing this out. We agree that remapping to the most recent proteome annotation will improve clarity. We will remap the TurboID datasets to the updated Tetrahymena proteome, which includes Nnf1 and Csm1, and report whether additional components are identified. Of note: in a preliminary analysis with the newest version of the proteome we do not find any new proteins in the NDC80-C-TurboID experiments. We will also clarify in the manuscript what “not in original proteome” refers to and revise Figure 2C accordingly.

      Figure 4 and accompanying paragraph: this is an interesting analysis, but impossible to interpret without comparing with the branch length of other Tetrahymena proteins or Tetrahymena as a species (if I interpreted the analysis correctly). L251: 'this underscores the high rates of evolution of kinetochore proteins'. This could be true, but this isn't proven here because there is no comparison with the evolutionary rate of other proteins in Tetrahymena.

      The reviewer is correct in arguing that without comparisons to other proteins, the statement that kinetochore proteins in Tetrahymena evolved at high rates is incorrect, or at least not supported by the present data. What we meant to say was that they evolved at high/increased rates compared to kinetochore proteins of other species. This in our view explains why we have missed them in past searches, regardless of whether this is specific to the kinetochore in Tetrahymena or to Tetrahymena proteins in general. We will amend the text to reflect this more clearly. We will explicitly acknowledge analytical limitations and remove claims regarding lineage-specific acceleration.

      Figure 5: For further validation and to better show the layered structure of the Tetrahymena kinetochore it would be nice to have a couple of images here with increased resolution by using expansion microscopy.

      We agree that improved spatial resolution would strengthen the layered organization model. We will attempt to perform expansion microscopy (ExM) on selected tagged kinetochore components and incorporate representative images into the revised manuscript (main or supplementary figures).

      Minor comments

      • Abstract: if you are going to call out individual components, maybe also point out the few that were already known (KiTT1-2 and 14). Otherwise the reader might be confused about the missing numbers.

      We will revise this in the abstract.

      • L37: is 'cryptic ortholog' an official term? Doesn't this just depend on the starting point of the homology search and the number of experimentally verified hits you have in certain parts of the tree? Just wondering.

      This is a valid question. Indeed, ‘cryptic’ refers to the starting point of our study (based on our previous analyses) and the process towards identifying them as being canonical. We chose this term because we feel it signifies to the reader that identifying these orthologs required approaches beyond conventional orthology searches.

      • For future submissions, it would be useful to have the figure numbers indicated on the figures, because now it was sometimes difficult to keep track.

      As mentioned above, we will add the figure numbers to the revision.

      • L51: mentioning the SAC might make it a bit too complicated for people not 100% familiar with all the complexes. Either leave it out until later, or have a short sentence explaining what the SAC is.

      We will leave out the spindle assembly checkpoint (SAC) in the beginning and will bring it up at a later point, also explaining its explicit function.

      • Figure 1A: the identity of the black 'nuclei' is not explained for the Ciliophora and Apicomplexa in the figure or figure legend.

      We apologise for the confusing black organelles in apicomplexans, these are actually the micronemes and apical complex, characteristic features of these parasites. We will change the color to that of the clade so that it is clear that only ciliates have two types of nuclei (nuclear dimorphism).

      • In Figure 1B, instead of saying 'absent', wouldn't it be more correct to say something like 'not found/detected/identified'?

      We agree and will replace ‘absent’ by ‘not detected’.

      • Figure 1C. During interphase, sometimes homologous chromosomes seem to cluster at the centromeres (5 foci - example on the left), but sometimes they don't (10 foci - example on the right). Is this something you observe a lot? Is it strain-dependent?

      We thank the reviewer for making a very good point. In principle we take the cells showing 5 foci to be interphase cells. We interpret the cells with 10 foci to be cells just prior to mitosis. So these would be G2 cells where the homologous chromosomes have been replicated and the sister pairs are still seen together here. However, if this were the case one would expect to see 20 centromeres/kinetochores in metaphase and this is not always observed. To prevent confusion on this point, we will replace the right panel in 1C with one that contains 5 foci and will make it clearer that these foci indeed represent homologous chromosomes. In addition, we will make panels to clearly show the behaviour of chromosomes over the different stages of mitosis.

      • Figure 1C (and later in Fig. 5): centromeres don't seem to align during metaphase. Is this true, or are these examples showing late metaphase/early anaphase?

      Indeed, a true metaphase similar to classic textbook images does not seem to be present. In 3D reconstructions we do see that kinetochores sit close to the nuclear envelope forming a sphere on the outside of the spindle, but almost never exactly in the same plane. Whether this means we simply have not caught true metaphase state, or there is none (like for instance in apicomplexans, which also do not appear to have a spindle checkpoint), is unclear at this point. We will further review our images and will use consistent stages for these images, and will revise terminology on metaphase state if warranted.

      • Why was STU2 included in the kinetochore? Wouldn't it be better classified as a MAP as in Fig. 3A? I saw this is actually discussed in the discussion, but maybe this explanation should come earlier.

      We thank the reviewer for pointing this out and will add a short sentence about the MAP function of STU2, and kinetochore localization in other lineages in the introduction.

      • Figure 2A: 'strong similarity'. For a TM score of 0.4 and 0.54, I am not sure I would say 'strong similarity'. Visually, they also look different. TM is also not explained in the legend.

      What we meant to say with ‘strong similarity’ is that a domain is predicted with a matching set of secondary structure elements to the RWD domains in yeast Spc24/Spc25. As for the TM score, a score of ≥ 0.5 has been shown to be a robust metric for fold similarity significance, which is the case for the comparison of the putative T. thermophila Spc25 ortholog and the yeast ortholog. However, we acknowledge that the T. thermophila Spc24 ortholog shows additional beta sheets compared to its yeast counterpart and has a TM score below 0.5, and so we will tone down this statement and remove ‘strong similarity’. We nonetheless maintain that this protein is a Spc24 ortholog with derived properties in its RWD domain.

      Relevant reference on TM score interpretation:

      Xu & Zhang 2010 Bioinformatics (https://pmc.ncbi.nlm.nih.gov/articles/PMC2913670/)
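
      The threshold logic in this response can be sketched as a simple check. This is an illustration only: `TM_FOLD_THRESHOLD` and `supports_same_fold` are hypothetical names, the two scores are those quoted by the reviewer for Figure 2A, and their assignment to Spc24 and Spc25 follows the statement above that only the Spc24 comparison falls below 0.5.

```python
# Threshold cited in the response for significant fold similarity.
TM_FOLD_THRESHOLD = 0.5


def supports_same_fold(tm_score: float, threshold: float = TM_FOLD_THRESHOLD) -> bool:
    """Return True when a TM-score meets the fold-similarity threshold."""
    if not 0.0 <= tm_score <= 1.0:
        raise ValueError("TM-scores lie between 0 and 1")
    return tm_score >= threshold


# Scores quoted by the reviewer; the mapping to proteins is inferred
# from the authors' response, not taken from the figure itself.
tm_scores = {"Spc24": 0.40, "Spc25": 0.54}

for protein, score in tm_scores.items():
    verdict = "meets" if supports_same_fold(score) else "falls below"
    print(f"{protein}: TM-score {score:.2f} {verdict} the {TM_FOLD_THRESHOLD} threshold")
```

      A score below the threshold does not rule out homology on its own; it only means the structural comparison cannot carry the claim without the other lines of evidence.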

      • Fig. 2D: why not PC2? Please explain this somewhere.

      We thank the reviewer for this question. We shall add an elaborate explanation of the PC selection in the method section. In short, PC2 (together with PC1 or PC3) did not reveal any separate cluster/cloud of points surrounding the NDC80-C components (KiTT1-4). Since PC3 did reveal such a cluster, we opted to select PC3.

      • Fig. 3C-D: 'striking similarity', again, it is hard to evaluate whether this is true from the figures and TM values alone (all are >0.5). Either change the phrasing, or explain how much similarity one would expect between homologs.

      Please see our response to the previous question regarding the significance of a TM score of ≥ 0.5.

      • How certain are you that these are all diverged homologs? For example, for KNL1, could another RWD domain-containing protein have evolved to become a kinetochore protein?

      In most cases we consider multiple lines of evidence: AF2/3, HHsearch and overall protein topology, in the case of RWD KT proteins, a coiled-coil followed by a single or double RWD. In the case of SPC24, SPC25 and CSM1 we have clear best hits for both structure and sequence searches. For KNL1 (double RWD), we have a newer version of our eukaryote-wide ortholog alignment now usable for HHsearches, which reveals KiTT7 (KNL1) to be the best hit also. As such, the RWD domain proteins that we uncover are not merely some RWD, but are specifically those of the kinetochore that are found in other lineages. In addition, there are only very few double RWD proteins present amongst eukaryotes, which makes the proposed scenario of homolog replacement for KNL1 unlikely.

      • Fig. 5: why wasn't CNA1 used as a marker of the inner kinetochore or tested?

      The CNA1 antibody gave quite some background (see figure 1C), we therefore favored the use of the CENPC/KiTT8 antibody.

      • Fig. 8: There is a time axis below, but I'm not sure what is indicated on this axis. Are the events above mapped on this axis?

      We agree this axis may be confusing. The idea was to show a number of ancestral nodes relevant for the evolutionary events noted in this figure. We will add clear references in the figure to each of these ancestors.

      • L347-349: 'convergent evolution'. Is the loss of the CCAN convergent evolution, or was it already lost in the SAR common ancestor?

      This was indeed convergent evolution. Amongst Stramenopila most CCAN subunits can be detected (see for instance van Hooff et al. 2017). In addition, the alveolate ancestor already had the CCAN as we can clearly detect orthologs in Colponemida. We will add this piece of information to the presence/absence plot in either Figure 1 or in the supplemental (see comment above to Reviewer 1).

      Significance (Required)

      General Assessment: The study is robust, thorough, and well-written. The analyses are technically sound, and the authors avoid overstating their conclusions. Key strengths include the successful identification of diverged components using a "deep homology" pipeline and the functional validation of novel subunits. To improve the study, the data representation could be made more straightforward, and the manuscript structure could be condensed to better highlight the most convincing results. Finally, the claims on the speed of evolution of the kinetochore components need to be better supported.

      Advance: The study provides the first molecular map of a ciliate kinetochore. By uncovering "cryptic" orthologs that escaped previous detection, the work demonstrates that many "missing" complexes in diverse eukaryotes are likely present but highly diverged.

      Audience: This work will interest evolutionary cell biologists studying mitosis and kinetochores (especially those interested in eukaryotic diversity), as well as the ciliate research community. It also serves as a methodological roadmap for researchers using structural homology to identify divergent proteins in other non-model organisms.

      Expertise: My field of expertise includes evolutionary cell biology, kinetochores, centromeres, microbiology, microscopy and phylogenetics.


      Reviewer #3

      Evidence, reproducibility and clarity (Required)

      Kinetochores are protein complexes essential for chromosome segregation in all eukaryotes. Unexpectedly, despite their crucial function, many kinetochore components evolve rapidly, which can hinder their identification based solely on sequence comparisons. In this study, the authors combine experimental and computational analyses to provide insights into the composition of the kinetochore protein complex in the ciliate Tetrahymena thermophila. This study makes an important contribution because kinetochore components in Tetrahymena have not previously been investigated experimentally, and the composition of the Tetrahymena complex was largely unknown.

      Starting with previously identified orthologs of the outer kinetochore proteins Ndc80 and Nuf2, the authors computationally identified the two additional members of the Ndc80 complex, Spc24 and Spc25. All four components were subjected to BioID analyses, leading to the identification of 23 additional candidates, some of which are factors known to be associated with centromeric chromatin in other eukaryotes (condensin, etc.). Focusing on a subset of unknown components, the authors provide experimental support for their kinetochore participation using microscopy and confirm distant homology with several known kinetochore components in other eukaryotes. Four components referred to as KiTT10-13, however, lack detectable orthology to known kinetochore components.

      Relative localization analyses using super-resolution microscopy revealed that KiTT10, 11, and 13 are more proximal to the inner kinetochore component CENP-C, while KiTT12 localizes closer to outer kinetochore components. Remote homology and phylogenetic analyses identify divergent WD40 or SANT domains in KiTT10 and 11, as well as a kinesin motor domain for the outer-kinetochore proximal KiTT12. Finally, RNAi-mediated depletion of KiTT12 demonstrated its requirement for accurate chromosome segregation and Ndc80 localization.

      Overall, I think this manuscript is interesting and makes an important contribution to the field of kinetochore biology. The results of this study, particularly regarding the novel kinetochore components identified, will likely also spark follow-up studies. My major comment concerns the discussion and presentation of the data:

      Major Comments At times, the explanation of homology search appears very technical and would not be accessible to non-experts.

      We thank the reviewer for raising this point. Given that the homology detection approach is an important part of the message of our manuscript, we do think that it is warranted to keep some technicalities in the results section. However, we agree that much of the detail could easily be moved to a dedicated supplementary section on our homology detection approach. We will rewrite the results section to better suit non-experts.

      Moreover, the authors could include more details about their analysis of TurboID data to improve clarity.

      I was initially confused about what "not in original proteome" means in the figure before understanding that two different proteome versions were used. I think it would be less confusing for the reader if the authors simply map their bioID data to the most recent version of the Tetrahymena proteome, which includes both Nnf1 and Csm1. Is it possible that this might also reveal the presence of other components in addition to the two that were specifically targeted?

      We agree that mapping to the most recent proteome annotation will eliminate confusion. We will remap all TurboID datasets to the updated proteome and report whether additional candidates are detected. We will revise the figure legends to clearly explain enrichment categories and annotation differences between proteome versions (or in a supplementary section). So far we have not detected any new proteins in a re-analysis of the MS data for components of the NDC80-C.

      The data presentation in Figure 2 is confusing and requires clarification of the analyses performed. The Figure legend for panel 2C is incomplete. For example, there is no mapping for the character "*" in the legend. The legend can be revised for better clarity. Also, more than 23 proteins are shown in the 2D inset; were those not enriched in the other BioID experiments? It would be helpful to include a legend for these hits as well.

      We will revise Figure 2 and its legend to:

      • Clearly define all symbols (including “*”).
      • Provide a complete legend for enriched hits.
      • Clarify PCA interpretation.
      • Explicitly state how many proteins are included and how they were categorized.

      I would be cautious about using the word comprehensive, as the identification depends on many aspects, including the completeness of the annotated proteome used to map the MassSpec spectra against. Even if their bioID experiments always converge on the same set of proteins, factors can still be missing due to annotation issues. In addition, certain components might be refractory for detection by MassSpec due to their amino acid composition. Other digestion methods, other than trypsin, could, however, identify those.

      We agree that this term overstates completeness. We will revise wording to reflect that our identification is extensive but dependent on proteome annotation and mass spectrometry detectability.

      Figure 4: I guess the result is somewhat expected given the previous inability to identify these components computationally. I guess the distribution of the non-tetrahymena components might be skewed towards lower sequence divergence, since they do not include orthologs that require experimental approaches for identification. If the authors agree, this could be added as a discussion.

      We agree with the reviewer that Figure 4 merely showcases why we could not detect these kinetochore orthologs in the first place. In our present analysis we did not include orthologs of species with previously shown ‘difficult-to-detect’ orthologs. We will add discussion acknowledging that detectable homologs in other species may be biased toward less divergent sequences and that experimental identification may reveal additional highly diverged components elsewhere.

      The telophase-specific localization of TTHERM_00932010 is interesting. Although the paper focuses on the structural composition of kinetochores, it would be useful if the authors included more details about this protein.

      We will expand the description of TTHERM_00932010 to provide additional contextual information regarding domain architecture, expression timing, and potential functional implications. Of note, for this protein we cannot detect any orthologs outside Tetrahymena spp.

      What is the function of kinesin-6, known roles with respect to chromosome segregation in other species?

      We already discuss the role of kinesin-6 in chromosome segregation in other species in the discussion section at L355-356 (bioRxiv v1). We will expand this section and add two more sentences on diverse functions of this family in eukaryotes.

      Perhaps MadBub localization is more apparent in the presence of unattached kinetochores? In that scenario, it would be useful if the authors knock down KiTT12 and test whether they can localize MadBub.

      We agree this is an interesting possibility. However, systematic spindle perturbation experiments fall outside the primary scope of this structural study. We will clarify this limitation and discuss it as a direction for future work.

      Minor comments It would be useful if the authors added either the expression of all genes or known constitutive genes as a background profile to Figure 2E, in order for the reader to be able to evaluate the G2/M specific increase in expression of bioID hits.

      The data has been taken from [Bertagna et al. 2025, Bioinformatics], and the expression profiles of all the other proteins are provided for inspection by the reviewer/reader (see Table S3). The data representation asked for by the reviewer can thus be found in Bertagna et al. 2025. To provide further overview, we will add a supplementary figure including expression profiles for proteins with peaks in each of the cell cycle phases, including an overview of those peaking in G2/M.

      What is TTHERM_0046753? One of the identified unknown hits? It is also not part of Figure 2E unless this is a typo and the correct identifier should be 00467535?

      The reviewer is correct that this is a typo on our end, for which we apologise. The correct identifier should be 00467535.

      Why are 29 expressions shown in 2E but only 27 mentioned in the text (23 bioID hits as well as the four Ndc80 complex components)? Or did the authors instead identify 25 specific bioID hits that were further classified into the different categories? A rewrite on this section would likely help the reader to better understand the analyses of the PCA data.

      We agree this section can do with some optimization. We will clarify the number of proteins included in PCA and expression analyses and revise the relevant section for clarity.

      Significance (Required)

      This study highlights the importance of non-model organisms, such as ciliates, in understanding the evolution of the chromosome segregation machinery. Studies on such organisms would shed light on the evolutionary aspects of kinetochore biology.

    1. When looking at who contributes in crowdsourcing systems, or with social media in general, we almost always find that we can split the users into a small group of power users who do the majority of the contributions, and a very large group of lurkers who contribute little to nothing. For example, Nearly All of Wikipedia Is Written By Just 1 Percent of Its Editors, and on StackOverflow “A 2013 study has found that 75% of users only ask one question, 65% only answer one question, and only 8% of users answer more than 5 questions.” We see the same phenomenon on Twitter: [Fig. 16.3: Summary of Twitter use by Pew Research Center] This small percentage of people doing most of the work in some areas is not a new phenomenon. In many aspects of our lives, some tasks have been done by a small group of people with specialization or resources. Their work is then shared with others. This goes back many thousands of years with activities such as collecting obsidian and making jewelry, to more modern activities like writing books, building cars, reporting on news, and making movies.

      When it comes to lurkers, I think the best approach is not to credit them if it's a project or something else that requires crediting the people who have worked on it. It may not be as effective, but I think, as of now, that is the best way to avoid lurkers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      An interesting manuscript from the Carrington lab is presented investigating the behavior of single vs double GPI-anchored nutrient receptors in bloodstream form (BSF) T. brucei. These include the transferrin receptor (TfR), the HpHb receptor (HpHbR), and the factor H receptor (FHR). The central question is why these critical proteins are not targeted by host-acquired immunity. It has generally been thought that they are sequestered in the flagellar pocket (FP), where they are subject to rapid endocytosis - any Ab:receptor complexes would be rapidly removed from the cell surface. This manuscript challenges that assumption by showing that these receptors can be found all over the outer cell body and flagella surfaces, if one looks in an appropriate manner (rapid direct fixation in culture media).

      The main part of the manuscript focuses on TfR, typically a GPI1 heterodimer of very similar E6 (GPI anchored) and E7 (truncated, no GPI) subunits. These are expressed coordinately from 15 telomeric expression sites (BES), of which only one can be transcribed at a time. The authors identify a native E6:E7 pair in BES7 in which E7 is not truncated and therefore forms a GPI2 heterodimer. By in situ genetic manipulation, they generate two different sets of GPI1:GPI2 TfR combinations expressed from two different BESs (BES1 and BES7). Comparative analyses of these receptors form the bulk of the data.

      The main findings are:

      (1) Both GPI1 and GPI2 TfR can be found on the cell body/flagellar surface.

      (2) Both are functional for Tf binding and uptake.

      (3) GPI2 TfR is expressed at ~1.5x relative to GPI1 TfR

      (4) Ultimate TfR expression level (protein) is dependent on the BES from which it is expressed.

      Most of these results are quite reasonably explained in light of the hydrodynamic flow model of the Engstler lab and the GPI valence model of the Bangs lab. Additional experiments, again by rapid fixation, with HpHbR and FHR, show that these GPI1 receptors can also be seen on the cell surface, in contrast to published localizations.

      It is quite interesting that the authors have identified a native GPI2 TfR. However, essentially all of the data with GPI2 TfR are confirmatory for the prior, more detailed studies of Tiengwe et al. (2017). That said, the suggestion that GPI2 was the ancestral state makes good evolutionary sense, and begs the question of why trypanosomes prefer GPI1 TfR in 14 of 15 ESs (i.e., what is the selection pressure?)

      Strengths and weaknesses:

      (1) BES7 TfR subunit genes (BES7_Tb427v10): There are actually three (in order 5' to 3'): E7gpi, E6.1 and E6.2. E6.1 and E6.2 have a single nucleotide difference. This raises the issue of coordinate expression. If overall levels of E6 (2 genes) are not down-regulated to match E7 (1 gene), this will result in a 2x excess of E6 subunits. The most likely fate of these is the formation of non-functional GPI2 homodimers on the cell surface, as shown in Tiengwe et al. (2017), which will contribute to the elevated TfR expression seen in BES7.

      We would like to thank the reviewer for pointing out that there are two ESAG6 genes in BES7, we had relied on the publicly available annotation and should have known better.

      For transferrin expression levels, see the discussion in response to Reviewer 1, point 3 below.

      (2) Surface binding studies: This is the most puzzling aspect of the entire manuscript. That surface GPI2 TfR should be functional for Tf binding and uptake is not surprising, as this has already been shown by Tiengwe et al. (2017), but the methodology for this assay raises important questions. First, labeled Tf is added at 500 nM to live cells in complete media containing 2.5 µM unlabeled Tf - a 5x excess. It is difficult to see how significant binding of labeled TfR could occur in as little as 15 seconds under these conditions.

      The k<sub>on</sub> for transferrin is very rapid (BES1 TfR / bovine transferrin at pH7.4 = 4.5 x 10<sup>5</sup> M<sup>-1</sup>s<sup>-1</sup>; Trevor et al., 2019) and binding would occur to unoccupied receptors within 15 sec. The k<sub>off</sub> is also fast (BES1 TfR / bovine transferrin at pH7.4 = 3.6 x 10<sup>-2</sup> s<sup>-1</sup>; Trevor et al., 2019) and there would be exchange of transferrin within the time taken for endocytosis. These values are in vitro with purified proteins; the in vivo values may be affected by the VSG coat.
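
      As a rough illustration of the kinetic argument above, the quoted rate constants can be turned into characteristic times. This is a minimal sketch, not the authors' analysis: it assumes simple 1:1 binding, transferrin in large excess over receptor (pseudo-first-order kinetics), and a total transferrin concentration of 3 µM (2.5 µM unlabelled plus 0.5 µM labelled, taken from the reviewer's description of the assay). The function names are hypothetical.

```python
import math

# Rate constants quoted in the response (Trevor et al., 2019; in vitro, pH 7.4).
K_ON = 4.5e5    # association rate constant, M^-1 s^-1
K_OFF = 3.6e-2  # dissociation rate constant, s^-1

# Assumption: 2.5 uM unlabelled + 0.5 uM labelled transferrin, in large
# excess over receptor, so the free ligand concentration stays constant.
TF_TOTAL = 3.0e-6  # M


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D = k_off / k_on (in M)."""
    return k_off / k_on


def half_time(rate):
    """Half-time (s) of a first-order process with the given rate (s^-1)."""
    return math.log(2) / rate


def fraction_bound(t, k_on, k_off, ligand):
    """Fraction of initially empty receptors occupied after t seconds."""
    k_obs = k_on * ligand + k_off      # observed relaxation rate, s^-1
    plateau = k_on * ligand / k_obs    # equilibrium occupancy
    return plateau * (1.0 - math.exp(-k_obs * t))


if __name__ == "__main__":
    k_obs = K_ON * TF_TOTAL + K_OFF
    print(f"K_D                  = {dissociation_constant(K_ON, K_OFF):.1e} M")
    print(f"binding half-time    = {half_time(k_obs):.2f} s")
    print(f"complex half-life    = {half_time(K_OFF):.1f} s")
    print(f"occupancy after 15 s = {fraction_bound(15, K_ON, K_OFF, TF_TOTAL):.3f}")
```

      Under these assumptions an empty receptor is occupied in well under 15 s, and a bound transferrin dissociates with a half-life of roughly 20 s. As the response notes, the in vivo values may differ because of the VSG coat.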

      The failure to bind canine transferrin (Supp. Figure 4B) acts as a control for specificity of the interaction.

      We have now performed a competition experiment as an additional control; cells in culture were supplemented with: A, 0.5 µM labelled transferrin; B, 0.5 µM labelled and 2.5 µM unlabelled transferrin; C, 0.5 µM labelled and 5 µM unlabelled transferrin, fixed after 60 s and visualised by fluorescence microscopy (Figure S4C). There was effective competition and greatly reduced binding of transferrin was seen in the presence of a 10-fold excess of unlabelled. We would like to thank the reviewer for suggesting this experiment.

      Second, Tiengwe et al. (2017) found that trypanosomes taken directly from culture could not bind labeled Tf in direct surface labelling experiments. To achieve binding, it was necessary to first culture cells in serum-free media for a sufficient time to allow new unligated TfR to be synthesized and transported to the surface. This result suggests that essentially all surface TfR is normally ligated and unavailable to the added probe.

      As part of the preliminary experiments for this paper we found that centrifugation followed by resuspension in either complete or serum-free (but 1% BSA) medium resulted in a reduction in total cellular TfR, as determined by western blotting. We have now included this experiment (Figure S4D). The inference from this experiment is that centrifugation and subsequent incubation will have an effect on receptor detection and endocytosis rates for a discrete time period.

      The amount of binding of labelled transferrin to cells in culture will depend on the specific activity of the labelled transferrin. This reasoning was behind the use of 0.5 µM labelled transferrin when roughly 1 in 6 molecules in the culture medium are labelled and there was only a small effect on the overall concentration of transferrin.
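
      The "1 in 6" figure follows directly from the concentrations involved. The short sketch below reproduces the arithmetic, assuming the medium contains 2.5 µM unlabelled transferrin (the value given in the reviewer's comment) to which 0.5 µM labelled transferrin is added; the function name is hypothetical.

```python
from fractions import Fraction


def labelled_fraction(labelled_uM, unlabelled_uM):
    """Fraction of transferrin molecules carrying the label.

    Concentrations are passed as strings so they are represented exactly.
    """
    labelled = Fraction(labelled_uM)
    unlabelled = Fraction(unlabelled_uM)
    return labelled / (labelled + unlabelled)


if __name__ == "__main__":
    # Assumed concentrations: 0.5 uM labelled added to 2.5 uM unlabelled.
    print(labelled_fraction("0.5", "2.5"))  # 1/6
```

      With these values the total transferrin concentration rises from 2.5 µM to 3 µM.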

      Third, the authors have themselves argued previously, based on binding affinities, that all surface-exposed TfR is likely ligated in a natural setting (DOI:10.1002/bies.202400053). Could the observed binding actually be non-specific due to the high levels of fixative used?

      The absence of binding/uptake of canine transferrin argues against a non-specific interaction. In our previous publication, we did not pay enough attention to the on and off rates which allow for a degree of exchange and, here, TfR newly appearing on the cell surface has a 1 in 6 chance of binding a labelled transferrin.

      (3) Variable TfR expression in different BESs: It appears that native TfR is expressed at higher levels from BES7 compared to BES1, and even more so when compared to BES3. This raises the possibility that the anti-TfR used in these experiments has differential reactivity with the three sets of TfRs. The authors discount this possibility due to the overall high sequence similarities of E6s and E7s from the various ESs. However, their own analyses show that the BES1, BES3, and BES7 TfRs are relatively distal to each other in the phylogenetic trees, and this Reviewer strongly suspects that the apparent difference in expression is due to differential reactivity with the anti-TfR used in this work. In the grand scheme, this is a minor issue that does not impact the other major conclusions concerning TfR localization and function, nor the behavior of HpHbR and FHR. However, the authors make very strong conclusions about the role of BESs in TfR expression levels, even claiming that it is the 'dominant determinant' (line 189).

      This point is valid but exceptionally difficult to address at the protein level. As an orthogonal approach, we performed RNAseq analysis of the ‘wild type’ BES1, BES3, and BES7 cell lines to determine whether differences in receptor mRNA levels were consistent with the proposed difference in protein levels (Table S1). The analysis showed total ESAG6/7 mRNA levels to vary in a similar manner to the protein estimates with BES3 < BES1 < BES7 providing support for the differences in protein levels.

      The strongest evidence for the expression site determining the TfR level is the comparison of the cell lines in which the VSG were exchanged. This had no effect on TfR levels and so there is no evidence that the identity of the VSG alters TfR expression.

      (4) Surface immuno-localization of receptors: These experiments are compelling and useful to the field. To explain the difference with essentially all prior studies, the authors suggest that typical fixation procedures allow for clearance of receptor:ligand complexes by hydrodynamic flow due to extended manipulation prior to fixation (washing steps). Despite the fact that these protocols typically involve ice-cold physiological buffers that minimize membrane mobility, this is a reasonable possibility. Have the authors challenged their hypothesis by testing more typical protocols themselves? Other contributing factors that could play a role are the use of deconvolution, which tends to minimize weak signals, and also the fact that investigators tend to discount weak surface signals as background relative to stronger internal signals.

      We have added preliminary experiments that compared fixation protocols in two parts. First, the effect on TfR levels of washing and resuspending cells discussed above (Figure S4D), and second, how different fixation protocols alter apparent TfR immunolocalisation (Supp Figure S5A-B). The comparison shows that both the absence of glutaraldehyde and the use of washing alter the outcome.

      (5) Shedding: A central aspect of the GPI valence model (Schwartz et al., 2005, Tiengwe et al., 2017) is that GPI1 reporters that reach the cell body surface are shed into the media because a single dimyristoylglycerol-containing GPI anchor does not stably associate with biological membranes. As the authors point out, this is a major factor contributing to higher steady-state levels of cell-associated GPI2 TfR relative to GPI1 TfR. Those studies also found that the size/complexity of the attached protein correlated inversely with shedding, suggesting exit from the flagellar pocket as a restricting factor in cell body surface localization. The amount of newly synthesized TfR shed into the media was ~5%, indicating that very little actually exits the FP to the outer surface. In this regard, is it possible to know the overall ratio of cell surface:FP:endosomal localized receptors? Could these data not be 'harvested' from the 3D structural illumination imaging?

      A ratio could be determined, but we did not do this as it would only be valid if the antibody has equal access to the internal TfR in a diluted VSG environment and the external VSG embedded in a densely packed and cross-linked VSG layer. As such, we would have no confidence in the accuracy of any estimate.

      Reviewer #2 (Public review):

      The work has significant implications for understanding immune evasion and nutrient uptake mechanisms in trypanosomes.

      While the experimental rigor is commendable, revisions are needed to clarify methodological limitations and to broaden the discussion of functional consequences.

      The authors argue that prior studies missed surface-localized TfR due to harsh washing/fixation (e.g., methanol). While this is plausible, additional evidence would strengthen the claim.

      Preliminary experiments that compared fixation protocols are now included to show that method affects outcome.

      It remains unclear how centrifugation steps of various lengths (as in previous publications) can equally and quantitatively redistribute TfR into the flagellar pocket. If this were the case, it should be straightforward for the authors to test this experimentally.

      We are not aware of previous studies that demonstrate equal and quantitative redistribution to the flagellar pocket. In previous reports, there is variation in cell surface/flagellar pocket localisation depending on expression levels (for example, Mussmann et al., 2003; Mussmann et al., 2004); it is worth noting that the increase in TfR expression in these papers is similar to the difference in the cell lines used here. In addition, most report the presence of TfR in endosomal compartments. In the experiments here, there are cells where the majority of signal from labelled transferrin is present in the flagellar pocket, and the argument is that this is a stage of a continuous process in which the receptor picks up a transferrin on the cell surface and is swept towards the pocket.

      If TfR is distributed over the cell surface, live-cell imaging with fluorescent transferrin should be performed as a control. Modern detection limits now reach the single-molecule level, and transient immobilization of live trypanosomes has been established, which would exclude hydrodynamic surface clearance as a confounding factor.

      This is non-trivial and is a longer-term aim. The immobilisation involves significant manipulation of the cells prior to restraining.

      In most images, TfR is not evenly distributed on the surface but rather appears punctate. Could this reflect localization to membrane domains? Immuno-EM with high-pressure frozen parasites could resolve this question and is relatively straightforward.

      There is a non-uniform appearance in the super-resolution images for both TfR and FHR. We cannot distinguish whether this represents random variation in receptor density over the cell surface or results from a biological phenomenon. Whatever the cause, the experiments showed unambiguous cell surface localisation.

      The authors might consider discussing whether differences in parasite life cycle stages (procyclic versus bloodstream forms) or culture conditions (e.g., cell density) affect localization. The developmentally regulated retention of GPI-anchored procyclin in the flagellar pocket might be worth mentioning.

      The aim of this paper was to determine the localisation of receptors in proliferating bloodstream form trypanosomes in culture. TfR and HpHbR are not expressed in insect stages in culture. FHR is expressed in insect stages and is present all over the cell surface (Macleod et al., 2020). A procyclin-based reporter was distributed over the whole cell surface in one report (Schwartz et al. 2005). In other reports, the retention of procyclin in the flagellar pocket of proliferating bloodstream forms is probably dependent on structure/sequence as other single GPI-anchored proteins, such as FHR (Macleod et al., 2020) and GPI-anchored sfGFP (Martos-Esteban et al., 2022) can access the surface.

      References:

      MacGregor, P., Gonzalez-Munoz, A. L., Jobe, F., Taylor, M. C., Rust, S., Sandercock, A. M., Macleod, O. J. S., Van Bocxlaer, K., Francisco, A. F., D’Hooge, F., Tiberghien, A., Barry, C. S., Howard, P., Higgins, M. K., Vaughan, T. J., Minter, R., & Carrington, M. (2019). A single dose of antibody-drug conjugate cures a stage 1 model of African trypanosomiasis. PLoS Neglected Tropical Diseases, 13(5), e0007373. https://doi.org/10.1371/journal.pntd.0007373

      Macleod, O. J. S., Bart, J.-M., MacGregor, P., Peacock, L., Savill, N. J., Hester, S., Ravel, S., Sunter, J. D., Trevor, C., Rust, S., Vaughan, T. J., Minter, R., Mohammed, S., Gibson, W., Taylor, M. C., Higgins, M. K., & Carrington, M. (2020). A receptor for the complement regulator factor H increases transmission of trypanosomes to tsetse flies. Nature Communications, 11(1), 1326. https://doi.org/10.1038/s41467-020-15125-y

      Martos-Esteban, A., Macleod, O. J. S., Maudlin, I., Kalogeropoulos, K., Jürgensen, J. A., Carrington, M., & Laustsen, A. H. (2022). Black-necked spitting cobra (Naja nigricollis) phospholipases A2 may cause Trypanosoma brucei death by blocking endocytosis through the flagellar pocket. Scientific Reports, 12(1), 6394. https://doi.org/10.1038/s41598-022-10091-5

      Mussmann, R., Engstler, M., Gerrits, H., Kieft, R., Toaldo, C. B., Onderwater, J., Koerten, H., van Luenen, H. G. A. M., & Borst, P. (2004). Factors affecting the level and localization of the transferrin receptor in Trypanosoma brucei. The Journal of Biological Chemistry, 279(39), 40690–40698. https://doi.org/10.1074/jbc.M404697200

      Mussmann, R., Janssen, H., Calafat, J., Engstler, M., Ansorge, I., Clayton, C., & Borst, P. (2003). The expression level determines the surface distribution of the transferrin receptor in Trypanosoma brucei. Molecular Microbiology, 47(1), 23–35. https://doi.org/10.1046/j.1365-2958.2003.03245.x

      Schwartz, K. J., Peck, R. F., Tazeh, N. N., & Bangs, J. D. (2005). GPI valence and the fate of secretory membrane proteins in African trypanosomes. Journal of Cell Science, 118(Pt 23), 5499–5511. https://doi.org/10.1242/jcs.02667

      Trevor, C. E., Gonzalez-Munoz, A. L., Macleod, O. J. S., Woodcock, P. G., Rust, S., Vaughan, T. J., Garman, E. F., Minter, R., Carrington, M., & Higgins, M. K. (2019). Structure of the trypanosome transferrin receptor reveals mechanisms of ligand recognition and immune evasion. Nature Microbiology, 4(12), 2074–2081. https://doi.org/10.1038/s41564-019-0589-0

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Recommendations:

      (1) Two E6 genes in BES7: This does not affect the overall conclusions, but the text should be modified to reflect the existence of the second gene, and to discuss the ramifications.

      This has been corrected

      (2) Surface binding studies: To clarify this issue, two experimental approaches are strongly recommended. First: additional excess unlabelled Tf should be added. If binding is truly receptor-mediated, it must by definition be saturable at some experimentally achievable level. Second: TfR expression should be abrogated by RNAi silencing to show that binding is TfR-dependent. Without some validation of specific binding by one or both of these approaches, these counter-intuitive results must be questioned.

      The excess unlabelled transferrin experiment is now included (we would like to thank the reviewer for this suggestion). The absence of binding of canine transferrin provides strong evidence for the specificity.

      (3) Variable TfR expression in different BESs: To make such claims, quantitative RT-PCR should be performed with conserved primers to assess the actual relative expression at the transcriptional level. Absent this, the claims should be eliminated, or at the very least greatly tempered.

      This has been done using an RNAseq analysis.

      (4) Surface immuno-localization of receptors: An example of discounting weak signals as background can be seen in Figure 8 of Duncan et al. (2024). It has also been shown that at least one other GPI1 reporter (procyclin) is readily detected on the outer cell surface under ectopic expression in BSF trypanosomes (Schwartz et al., 2005) using typical fixation procedures. This could be cited, and the authors could discuss the fact that procyclin is not a receptor and may not be susceptible to hydrodynamic drag.

      Yes

      Minor issues:

      (1) Fully appreciating the data presented requires an understanding of the hydrodynamic flow and GPI valence models of the Engstler and Bangs labs, respectively. For the uninitiated, it might perhaps be useful to include brief summaries of each in the Introduction.

      Added to the introduction

      (2) Lines 110-112: ISG65 and ISG75 both have strong localizations in endosomal compartments. This should be noted with citation of any of the work from the Field lab.

      Added

      (3) Lines 121-132: This passage presents the role of GPI anchors (1 vs 2) in a rather digital manner (in or out). Schwartz et al (2005) present a much more nuanced view of what is likely taking place. This is one reason summaries of hydrodynamic flow and GPI valence would be helpful.

      Modified

      (4) Lines 182-184: The increased size of GPI-anchored E7 is in part due to the presence of the GPI itself, as the authors state, but there are also 24 additional amino acid residues in this protein that contribute.

      Modified

      (5) Lines 212-214: Do p>0.95 and p>0.99 indicate statistical significance? This must be a typo.

      Thank you, corrected

      (6) Lines 218-219: The better references documenting GPI number in regard to turnover/shedding are Schwartz et al. 2005 and Tiengwe et al. 2017.

      Changed

      (7) Line 241 and Figures 3, 4, and 6: The transverse sections add little to the presentation. That there is signal variation in all dimensions is readily apparent from the images themselves, and similar profiles would be obtained regardless of the transect. Was there some process/rationale in the selection of the individual transects intended to make a broader point? If so, a description of the process should be provided.

      The point was to show that the signal had a pattern consistent with plasma membrane (two distal peaks) as opposed to cytoplasm (single central peak). As such, we think it is important.
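
      The distinction drawn here, two peaks for a membrane signal versus one central peak for a cytoplasmic signal, can be illustrated with a toy intensity profile. The sketch below is only an illustration and not the authors' quantitation method: the profiles, the intensity threshold, and the function names are invented for the example.

```python
def count_peaks(profile, min_height):
    """Count local maxima at or above min_height in a 1-D intensity profile."""
    peaks = 0
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i - 1] < profile[i] >= profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks += 1
    return peaks


def classify_transect(profile, min_height):
    """Label a transect by its number of peaks (hypothetical rule of thumb)."""
    n = count_peaks(profile, min_height)
    if n == 2:
        return "membrane-like"
    if n == 1:
        return "cytoplasm-like"
    return "ambiguous"


if __name__ == "__main__":
    # Invented example profiles (arbitrary intensity units across the cell).
    membrane = [0, 1, 8, 3, 1, 1, 2, 9, 1, 0]
    cytoplasm = [0, 1, 2, 5, 9, 5, 2, 1, 0, 0]
    print(classify_transect(membrane, min_height=4))
    print(classify_transect(cytoplasm, min_height=4))
```

      A real analysis would also need to account for noise and for where along the transect the peaks fall, which this sketch ignores.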

      (8) Lines 582-596: Methodology for quantitation of cellular fluorescent signals should be provided.

      Has been expanded

      Reviewer #2 (Recommendations for the authors):

      (1) As a less critical but still useful control, antibody accessibility assays on live versus fixed parasites could test whether VSG coats limit detection.

      This could only be quantified by using a range of monoclonal antibodies which are not available.

      (2) The rapid transferrin uptake (15-60 seconds) could reflect fast endocytic recycling rather than stable surface residency. A pulse-chase experiment tracking receptor movement would clarify this (though I acknowledge that this is technically challenging).

      We agree that endocytic recycling is probably the main source of unoccupied TfR on the cell surface. It is hard to see how the pulse chase experiment could be performed without centrifugation which will affect the outcome – see above.

      (3) Statistical and quantitative reporting

      Added as Tables S2-S4

      (4) Report confidence intervals (e.g., for fluorescence intensity comparisons in Figure 3B) to contextualize claims of "no significant difference."

      We do not claim ‘no significant difference’; the SDs overlap because of the high level of variation in the population.

      (5) Specify the number of biological replicates and cells analyzed per condition in the figure legends.

      Added

      (6) The study notes that surface-exposed receptors avoid antibody detection, but does not explore how.

      We don’t claim that receptors avoid detection and have published evidence to the contrary. The cell has evolved mechanisms to reduce/minimise the effect of antibody binding.

      (7) Comparing antibody binding to TfR in VSG221 versus VSG224 coats.

      This is already present in Figure 3D

      (8) Testing whether receptor shedding or conformational masking contributes to immune evasion.

      A lifetime’s work

      (9) Evolutionary trade-offs: Discuss why T. brucei maintains ~15 TfR variants if the GPI-anchor number has minimal impact on function (Figure 3).

      The possible reason for the evolution of ~15 TfR variants was discussed in a previous publication.

      (10) How do their findings align with recent studies on ISG75 surface exposure?

      If this refers to the finding that ISG75 is an Ig Fc receptor, this has been included

      (11) Add scale bars to 3D reconstructions (Figure 5).

      Added

      (12) Include a schematic summarizing key findings in the main text.

      Chosen not to do

      (13) Explicitly state where raw microscopy images, flow cytometry data, and analysis scripts are deposited.

      Microscope images have been deposited in the BioImage Archive repository at EMBL-EBI. No flow cytometry was used.

      (14) Correct inconsistent GPI-anchor terminology (e.g., "glycosylphosphoinositol" to "glycosylphosphatidylinositol").

      Our typo, corrected

      (15) Clarify ambiguous phrases (e.g., "subtle mechanisms" in the Discussion).

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely appreciate your constructive feedback. Based on the comments from the three reviewers, we were able to substantially improve the manuscript. Below, we provide our point-by-point responses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examined the functional organization of the mouse posterior parietal cortex (PPC) using meso-scale two-photon calcium imaging during visually-guided and history-guided tasks. The researchers found distinct functional modules within the medial PPC: area A, which integrates somatosensory and choice information, and area AM, which integrates visual and choice information. Area A also showed a robust representation of choice history and posture. The study further revealed distinct patterns of inter-area correlations for A and AM, suggesting different roles in cortical communication. These findings shed light on the functional architecture of the mouse PPC and its involvement in various sensorimotor and cognitive functions.

      Strengths:

      Overall, I find this manuscript excellent. It is very clearly written and built up logically. The subject is important, and the data supports the conclusions without overstating implications. Where the manuscript shines the most is the exceptionally thorough analysis of the data. The authors set a high bar for identifying the boundaries of the PPC subareas, where they combine both somatosensory and visual intrinsic imaging. There are many things to compliment the authors on, but one thing that should be applauded in particular is the analysis of the body movements of the mice in the tube. Anyone working with head-fixed mice knows that mice don't sit still, but this almost invariably remains unanalyzed. Here the authors show that this indeed explained some of the variance in the data.

      Weaknesses:

      I see no major weaknesses and I only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The posterior parietal cortex (PPC) has been identified as an integrator of multiple sensory streams and guides decision-making. Hira et al observe that dissection of the functional specialization of PPC subregions requires simultaneous measurement of neuronal activity throughout these areas. To this end, they use wide-field calcium imaging to capture the activity of thousands of neurons across the PPC and surrounding areas. They begin by delineating the boundaries between the primary sensory and higher visual areas using intrinsic imaging and validate their mapping using calcium imaging. They then conduct imaging during a visually guided task to identify neurons that respond selectively to visual stimuli or choices. They find that vision and choice neurons intermingle primarily in the anterior medial (AM) area, and that AM uniquely encodes information regarding both the visual stimulus and the previous choice, positioning AM as the main site of integration of behavioral and visual information for this task.

      Strengths:

      There is an enormous amount of data and results reveal very interesting relationships between stimulus and choice coding across areas and how network dynamics relate to task coding.

      Weaknesses:

      The enormity of the data and the complexity of the analysis make the manuscript hard to follow. Sometimes it reads like a laundry list of results as opposed to a cohesive story.

      Reviewer #3 (Public review):

      Summary:

      This work from Hira et al leverages mesoscopic 2-photon imaging to study large neural populations in different higher visual areas, in particular areas A and AM of the parietal cortex. The focus of the study is to obtain a better understanding of the representation of different task-related parameters, such as choice formation and short-term history, as well as visual responses in large neural populations across different cortical regions to obtain a better understanding of the functional specialization of neural populations in each region as well as the interaction of neural populations across regions. The authors image a large number of neurons in animals that either perform visual discrimination or a history-dependent task to test how task demands affect neural responses and population dynamics. Furthermore, by including a behavioral perturbation of animal posture they aim to dissociate the neural representation of history signals from body posture. Lastly, they relate their functional findings to anatomical data from the Allen connectivity atlas and show a strong relation between functional correlations and anatomical connectivity patterns.

      Strengths:

      Overall, the study is very well done and tackles a problem that should be of high interest to the field by aiming to obtain a better understanding of the function and spatial structure of different regions in the parietal cortex. The experimental approach and analyses are sound and of high quality and the main conclusions are well supported by the results. Aside from the detailed analyses, a particular strength is the additional experimental perturbation of posture to isolate history-related activity which supports the conclusion that both posture and history signals are represented in different neurons within the same region.

      Weaknesses:

      The main point that I found hard to understand was the fairly strong language on functional clusters of neurons while also stating that neurons encoded combinations of different types of information and leveraging the encoding model to dissociate these contributions. Do the authors find mixed selectivity or rather functional segregation of neural tuning in their data? More details on this and some other points are below.

      We thank the three reviewers for their accurate and expert evaluations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It wasn't clear to me why the authors focused on areas A and AM, but not RL. After all, at the beginning of the results, the authors ask: "PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among RL, A, and AM?".

      Thank you for the comment. The manuscript first characterizes AM as a region involved in visually guided decision-making and A as a region related to history and/or working memory. Subsequently, when discussing correlation structure, we stated the following:

      “In particular, based on the critical functional differences between A and AM that we found, A and AM may belong to distinct cortical networks that consist of different sets of densely interacting cortical areas.”

      Thus, the logical flow of our analysis is to first reveal the functional contrast between A and AM through comparative functional analyses across RL, A, and AM, and then to focus on this contrast. We speculate that RL may exhibit more distinctive functional properties in tasks that rely on whisker-based processing or related modalities. We have therefore revised the text as described below to avoid the impression that the manuscript places disproportionate emphasis on RL.

      Line 137: “PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among A, AM, and RL?”

      (2) Figures 2 E, F, and Figure 3A, could the authors indicate the trial structure better on these plots?

      Thank you for the comment. We have added explanations of the bar meanings to the figure legends.

      Figure 2:

      “(E) Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. F. Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      Figure 3:

      “(A) The representative history neurons. Numbers correspond to that of panel B and C. Light blue lines indicate rewards delivered from the left lick port, and purple lines indicate rewards delivered from the right lick port. Vertical white lines mark the onset of the sampling period.”

      (3) There are several typos that need correcting. Also, small and big capital letters to demark the panel names in the legends have been mixed.

      Thank you for the comment. We have corrected the panel labels as described below.

      Figure 2 legend:

      “Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9)”

      Figure 3 legend:

      “..than the next choice. I. The decoding accuracy of the next choice …”

      Figure 3 legend:

      “Error bars, mean ± s.e.m. in I, 95% confidence interval in G, M, and O.”

      Supplementary Figure 6:

      “…neurons with rt ≥ 0.3 (blue) were shown. B. Trial-to-trial activity fluctuation … (rt ≥ 0.3, panel B) was color coded…”

      We thoroughly checked the manuscript for typographical errors and corrected the issues.

      (4) Many in the field still use the Paxinos nomenclature for PPC subfields, could the authors write something short about how these two nomenclatures correspond?

      We have described the relationship between our area definitions and those of Paxinos in the main text as follows.

      Line 702: “In addition to our definition, previous studies have also defined posterior parietal cortex (PPC) to include the higher visual areas A, AM, and RL (Glickfeld and Olsen, 2017; Wang et al., 2011). These areas partially overlap with the parietal association regions defined in the Paxinos atlas, including MPtA, LPtA, PtPD, and PtPR. For a detailed discussion of the correspondence and variability among these regional definitions, see Lyamzin and Benucci (2019).”

      (5) Analyzing choice history may be affected by the long fluorescence Ca transients and will depend on excellent event deconvolution. Could the authors show some more zoomed-in examples of how well their deconvolution works?

      We provide enlarged, trial-by-trial activity traces of the four example neurons shown in Figure 3A in Supplementary Figure 3G. In all neurons, multiple small calcium transients occur repeatedly throughout the delay period, which lasts longer than 10 s. If the sustained activity during the delay were simply due to a long decay time constant, one would expect a large calcium transient in the preceding trial that slowly decays over the delay period. However, such a pattern is not observed in the actual data. Also, since the decay time constant of GCaMP6s is on the order of ~1 s, signals persisting for ~10 s cannot be explained by slow decay alone.
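      The decay argument above can be made concrete with a back-of-the-envelope sketch. It assumes a single-exponential decay with a time constant of about 1 s, which is a simplification of the indicator's kinetics used only for illustration; the function name and the chosen delays are hypothetical.

```python
import math


def remaining_fraction(delay_s: float, tau_s: float = 1.0) -> float:
    """Fraction of a transient's peak left after delay_s seconds.

    Assumes single-exponential decay with time constant tau_s
    (an illustrative assumption, not a fitted indicator model).
    """
    return math.exp(-delay_s / tau_s)


# Compare a short interval with the ~10 s delay period discussed above.
for delay in (1.0, 3.0, 10.0):
    print(f"{delay:4.1f} s -> fraction remaining {remaining_fraction(delay):.2e}")
```

      Under this assumption, less than 0.01% of the peak amplitude would remain after 10 s, so slow decay of a single earlier transient would not by itself produce signals lasting across the delay.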

      (6) The authors write: "the history neurons exhibited properties of working memory." However, note that this is not a working memory task since the mice don't need to keep evidence in memory, the direction to lick can be made at the very beginning of a trial.

      Behaviorally, demonstrating that an animal maintains working memory requires showing that its behavior changes based on retained information when new information is introduced, as in delayed match-to-sample tasks. In the present task, however, the correct action for the next trial is determined at the moment the action in the previous trial is completed, such that animals can simply switch to motor preparation at that point. Thus, from a strictly behavioral perspective, working memory is not required.

      On the other hand, during the inter-trial interval (ITI), information from the previous trial dominates over information from the upcoming trial (Fig. 3H), which is more consistent with retention of past information than with motor preparation. Moreover, trials in which neural activity maintained information about the previous trial’s action were associated with a higher probability of correct performance in the subsequent trial. In other words, retaining past information contributes to guiding correct behavior in the next trial.

      Based on these neural analyses, we interpret that mice retain information about their previous trial’s action history in working memory and use it to determine behavior in the subsequent trial. Accordingly, we consider ITI activity in PPC to reflect working memory rather than motor preparation. Nevertheless, we acknowledge that your concern is valid, and we have therefore revised the text as follows:

      Line 234: “These results suggest that the history neurons exhibited properties of working memory.”

      (7) In the section about the Choice History Task, the authors write: "Since the visual stimuli were randomly presented during the sampling period, the mice had to ignore the visual stimuli." Why continue to present the visual stimuli?

      Thank you for the suggestion. By designing the vision task and the history task to have identical structures, we can apply the same encoding and decoding models to both tasks, which facilitates direct comparison between them. This design makes it easier to examine how neuronal activity patterns change depending on task demands.

      Reviewer #2 (Recommendations for the authors):

      (1) I don't understand the logic of Figure S7 and the neuropil analysis in general. Neuropil activity is purported to represent input, so it seems unsurprising that nearby neurons would exhibit similar dynamics.

      Thank you for your comment. Your argument is correct, and it is not at all surprising that neuropil signals correlate with the activity of surrounding neurons. Here, we quantitatively examined the relationship between neuropil activity and the average activity of nearby neurons. In addition, in a separate analysis, we clarified the relationship between connectome information and neuropil activity. Taken together, these analyses reveal the relationship between connectome information and the local average of neuronal activity. We describe this point as follows:

      “Indeed, the trial-to-trial variation of a neuropil activity could be approximated by the average of 1,000–10,000 neurons within several hundred micrometers from the center (Figure S7).”

      Although we analyzed this phenomenon in the cases of areas A and AM, this finding should not be considered specific to A and AM but instead has broader, general significance. Accordingly, we added a new Results subsection and revised the manuscript as follows.

      Line 448: “Constraints and limits of anatomical connectivity on neuronal population activity

      Although we have so far focused on the differences between A and AM, our data provide broader insights into the relationship between anatomical connectivity and neuronal population activity. First, based on Figure S7 and the considerations above, anatomical input correlations strongly constrain the correlations between local averages of activity across thousands of neurons. We then asked whether this anatomical constraint extends beyond mean activity, and how anatomical input correlations relate to relationships between neuronal population activities (population vectors).

      The correlation between CC<sub>t</sub> and r<sub>anatomy</sub> was moderate (r = 0.60, Figure 6L). This moderate correlation did not change when the coupling neurons were eliminated (r = 0.61). Interestingly, the largest canonical component was the most unpredictable from the anatomical data (Figure 6M). Thus, while inter-area correlations based on the mean activity of neuronal populations are largely determined by anatomical input correlations, correlations between population vectors contain additional structure that cannot be captured by anatomical input correlations alone.

      One possible source of this additional structure is globally shared activity, which may reflect behavior, brain state, or levels of neuromodulators. To evaluate the contribution of global activity to the canonical correlation between areas, we first compared the canonical coefficient vectors (CCV). We found that the first CCV had a similar orientation, regardless of the paired areas (Figure 6N). This indicates that the largest components of correlated activity in the CCA analysis are globally shared fluctuations. We also directly evaluated the correlated activity components across all 8 areas with generalized canonical correlation analysis. The first CCV also had a similar orientation to the first generalized canonical coefficient vector (GCCV) (Figure 6O). These results indicate that the largest canonical component reflects a global correlation across all cortical areas imaged. Such global correlations may be driven by factors beyond cortico-cortical or thalamo-cortical inputs, such as the animal’s behavioral state as we recently characterized (H. Imamura et al., 2025; F. Imamura et al., 2025). We also confirmed the robustness of these results by repeating analyses using only the 40% highly active neurons after denoising with non-negative deconvolution (36828 out of 91397 neurons; Figure S9).”

      (2) Furthermore, the neuropil signal likely contains signals from out-of-focus neurons that are presumably functioning similarly to the in-focus cells. Wouldn't the interesting question be to what extent the local neuropil signal in, for example, area A resembled that of neuronal activity in S1t?

      Thank you very much for your comment. We agree with your point. Based on the evaluation in Figure S7, the neuropil signal likely contains the average activity of several thousand local neurons, including out-of-focus contributions. The neuropil signal in area A may also partially reflect neuronal activity from the neighboring S1t area. In particular, neurons that show little correlation with the local population average (i.e., the neuropil signal) within the same area are sometimes referred to as “soloists” (M. Okun et al., 2015). If such soloist neurons were found to exhibit strong correlations with the neuropil signal of an adjacent area, this would be a highly interesting result. However, such an analysis would go beyond the scope of the present manuscript and would require a new line of discussion; therefore, we plan to address this issue in future work.

      (3) I generally found the final Results section (Relationship between mesoscale functional correlation and anatomical connections) to be hard to follow. The motivation for this analysis should be better explained.

      We fully incorporated your suggestion and rewrote the final section of the Results accordingly. Please refer to our responses to the two comments above.

      (4) The question of brain state/neuromodulation as a driver of the globally shared activity may be addressable by considering its correlation with pupillometry data.

      We fully agree with your suggestion. In our experiments, visual stimuli change continuously, and thus pupil diameter changes are most likely driven primarily by changes in visual input. Although state-dependent fluctuations of brain activity may also be present, they are likely masked by the larger effects induced by visual stimulation. Therefore, analyzing pupil-linked signals as a factor of globally shared activity would be more appropriately addressed in experiments without visual stimulation. We plan to investigate this issue in future studies. Here, we have added the following description regarding pupil dynamics and their associated relationships.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      Minor issues

      (1) The authors deploy sophisticated mathematical techniques with essentially no explanation outside the Methods section. A brief introduction of jPCA and CCA in the main text would help the reader understand the value of these analyses.

      Thank you for the comment. We added the following explanation.

      Line 238: “In this task, left and right selection are alternated, so the activity of the history neuron is a sequence that repeats in two consecutive trials. We used jPCA<sup>49</sup> to visualize and quantify this activity pattern (Figure 3K). jPCA identifies low-dimensional projections of population activity that maximize rotational dynamics across time.”

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the dimension of population activity in each area into 10 by using PCA (principal component analysis) (Figure S6B,C). Then, “fluctuation activity” was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and obtained an average of 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all the pairs (Figure 5H), whereas when the CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second largest CC<sub>t</sub>, respectively (Figure 5I). The dominance in CC<sub>t</sub> in A and AM disappeared when the neurons with r<sub>t_single</sub> >0.3 were removed. Notably, the CC<sub>t</sub> of AM and the other areas was uniform regardless of the paired areas across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A simply coupled with S1t, and such correlation structure at the population level critically depends on this subset of neurons.”
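      As an illustration of the CCA step described above, the following is a minimal sketch on synthetic data, not the analysis code used for the manuscript. It assumes two areas already reduced to 10 dimensions each, uses a standard QR-plus-SVD formulation of CCA, and averages the 10 canonical correlations; the function name, the simulated "shared" fluctuations, and all array sizes are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (trials x p) and Y (trials x q).

    Standard QR/SVD formulation; assumes both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic stand-in for trial-to-trial fluctuations in two areas
# (10 dimensions each, as if already reduced by PCA).
rng = np.random.default_rng(0)
n_trials, n_dims = 500, 10
shared = rng.standard_normal((n_trials, 2))  # hypothetical shared fluctuations
X = rng.standard_normal((n_trials, n_dims))
Y = rng.standard_normal((n_trials, n_dims))
X[:, :2] += 3.0 * shared
Y[:, :2] += 3.0 * shared

cc = canonical_correlations(X, Y)
print("canonical correlations:", np.round(cc, 2))
print("mean over 10 components:", round(float(cc.mean()), 2))
```

      In this toy example the two shared dimensions dominate the leading canonical correlations, while the remaining components reflect only finite-sample noise. The mean over all components plays the role of the averaged canonical correlation in the text.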

      (2) The manuscript contains numerous typos ("hoice"), spelling errors ("parameters", "costom"), abbreviations that are not defined (ex: RL/rostrolateral), and minor grammatical issues that should be addressed by a round of copy editing.

      We thank the reviewer for pointing this out. We have thoroughly corrected these typographical and grammatical errors, and have described the revisions in detail in our response to Reviewer 1, comment (3). In addition, we have clarified the abbreviations in the manuscript as follows.

      Line 94: “rostrolateral area (RL)”

      Figure 1 legend: “Abbreviations: RL, rostrolateral HVA; PM, posteromedial HVA; RSC, retrosplenial cortex.”

      (3) Figure 3K unlabeled axes.

      Thank you for the comment. We have added the axis labels.

      (4) Figure 3K caption, first "(right)" should be "(left)".

      Thank you very much for your careful attention to detail. We have made the requested correction.

      (5) Figure 6 is hard to read. Panel A is too small, and the interpretation of G is difficult.

      - For panel A, we added an enlarged view with images from a larger number of trials in Figure S7A.

      - G represents the connectivity matrix. The sources correspond to the injection sites, and the targets correspond to voxels in the cerebral cortex. Because the latter may not be immediately clear, we explicitly indicated in the figure that the targets are cortical voxels.

      (6) Figure S4C has a double compass.

      Thank you for the comment. We have revised the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      While I have some questions and additional suggestions to further improve the clarity of the manuscript, I already found it to be highly interesting and well done in its current form.

      Major points:

      (1) The t-SNE comes up rather abruptly and is not well-explained in the main text or the figure caption. It would be good to provide some more information on the rationale of this analysis and how to interpret it. In particular, I don't see clear clusters in Figure 2H although the description of the authors seems to indicate that they observe clear functional classes such as choice, stimulus, and history neurons. Similarly, in Figure 3B, I don't see a clear separation between history and choice neurons in the t-SNE map. The example cells in Figure 3A appear to be delayed or long-tailed choice neurons rather than a dedicated group of 'history neurons'. It would be helpful for the interpretation of the t-SNE plots to show different PSTHs for different regions of the t-SNE map to better illustrate what different regions within the t-SNE projection represent and what distinguishes these cells.

      Thank you for the comment. The absence of clearly defined clusters in the t-SNE map suggests that neuronal activity forms a continuum rather than discrete classes. Importantly, the purpose of the t-SNE map here is not to identify sharp clusters, but to demonstrate that the functional categorization provided by our encoding model broadly and comprehensively spans the major structures present in the unsupervised t-SNE map. We have revised the relevant text in the manuscript accordingly as follows.

      Line 158: “To examine whether the neuron groups labeled by this model broadly capture the diversity of neuronal activity, we performed unsupervised clustering of neuronal activity using t-SNE. The functional labels revealed by this encoding model were consistent with the t-SNE clusters, indicating the validity of the encoding model (Figure 2H; Figure S4B; materials and methods).”

      The issue regarding History neurons was also raised in Reviewer #1’s comment (5). We provide an enlarged view of Figure 3A in Figure S3A. Each History neuron exhibits multiple calcium transients repeatedly and asynchronously following the previous reward acquisition. Therefore, rather than being “choice neurons with a long tail,” these neurons are better interpreted as neurons whose activity is sustained during this delay period.

      (2) Although the authors mention that neurons represent a mixture of features, they then use the encoding model to isolate clusters, such as vision or choice neurons. In general, the language throughout the manuscript suggests that there are various clusters of functionally segregated neurons (vision, choice, history, or coupling neurons). However, it is not clear to me to what extent this is supported by the data. Couldn't a choice neuron also be a vision neuron if both variables make significant contributions to the model? Similarly, are 'history' and 'choice' separate labels from the encoding model, or could a cell be given multiple labels? If a cell could be given multiple labels how did the authors create the colored plots on the right-hand side of Figures 2H and 3B? The example history cells in Figure 3J also appear to be highly selective for the contralateral choice, so again this seems to argue against a clear separation of choice and history neurons.

      Each label is assigned based on whether the corresponding coefficient is significant in the encoding model, and therefore neurons that are both vision- and choice-selective do exist. The presence of mixed selectivity neurons in PPC is well established (e.g., Goard et al., 2016, eLife). In this manuscript, however, we focus not on functional overlap at the single neuron level, but on the spatial distribution of functional classes, and thus do not explicitly address mixed selectivity. Although the colors in Figure 2H and Figure 3B overlap, the underlying data for each are presented separately in Figure S4B and S4D, respectively. As shown there, each color generally occupies distinct regions in the t-SNE map.

      (3) The decoding analysis in Figure 3F also suggests that a potential reason why there are more choice history signals in areas S1 and A is that neural activity is simply larger rather than due to the activity of a dedicated group of history neurons. Are the authors interpreting this differently? Could the duration of stored choice information also be affected by the dynamics of the calcium indicator?

      Thank you for the comment. Simply having larger neural activity in S1t or A would not result in calcium transients with a ~1-s time constant persisting throughout a delay period lasting up to 10 seconds. As also noted in comment (1), History neurons exhibit sustained and repeated calcium transients, and therefore their activity cannot be explained merely by elevated neural activity levels. One could argue that all cortical areas carry history-related information but that the signal-to-noise ratio is higher in S1t or A, which might make such signals more detectable there. If this were the case, however, differences across areas in all forms of selectivity should similarly depend on signal-to-noise ratio. This is not what we observe in our data.

      (4) I'm confused as to why the decoding accuracy is so high for areas A and S1t at time -3 relative to the choice in Figure 3F. Shouldn't this be the same as predicting the next choice in Figure 3H? Why is the decoding accuracy lower in this case?

      Thank you for the comment. The analysis shown in Figure 3F includes only trials in which the choice was correct. This is the reason why the decoding performance in Figure 3H is lower. We have added this clarification to the main text.

      Figure 3F: “Decoding accuracy of choice, outcome, and visual stimuli by the activity of 20 neurons from each area using only correct trials, before and after the choice onset, reward delivery, and the end of the visual stimuli, respectively. Line colors corresponded to the areas shown in panel G.”

      (5) In general, the text is not very detailed about the statistics. While test scores and p-values are mentioned, it would be good to also state what is actually compared and what the n is (e.g. how many neurons, neuron pairs, areas, sessions, or animals) for each case. How do the authors account for the nested experiment design where many neurons are coming from a low number of animals?

      Thank you for the comment. In our decoding analyses, we generally treat the number of animals as the independent variable. In contrast, for the encoding model analyses, we treat the number of neurons as the independent variable. As you correctly pointed out, because we recorded activity from a large number of neurons, statistical tests that treat individual neurons as independent samples can readily yield significant p-values even with a small number of animals. We have therefore confirmed that our conclusions are not driven by a large effect from a single animal. When making qualitative claims, we rely not only on statistical significance (p-values) but also require clear differences in effect size. We have added the following clarification to the Statistics section accordingly.

      Line 1049: “For the decoding analyses, the number of animals was treated as the independent variable, whereas for the encoding model analyses, the number of neurons was treated as the independent variable. To ensure that the results were not driven by a single animal, we repeated the statistical tests while systematically excluding data from one animal at a time and confirmed that statistical significance was preserved in all cases. Furthermore, qualitative interpretations were made only when differences in effect size were clearly observed.”
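
The leave-one-animal-out check described above can be sketched as follows. This is only an illustration under stated assumptions: the response does not say which statistical test was used, so an exact two-sided sign test stands in for it, and the function names (`sign_test_p`, `leave_one_animal_out`), the animal labels, and the accuracy differences are all hypothetical.

```python
from math import comb


def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences (zeros are dropped).

    Stand-in for whatever test the authors actually used (an assumption).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in nonzero)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def leave_one_animal_out(diffs_by_animal, test=sign_test_p, alpha=0.05):
    """Repeat the test with each animal excluded in turn.

    Returns the p-value obtained for every exclusion and a flag that is
    True only if significance is preserved in all of them.
    """
    p_values = {}
    for excluded in diffs_by_animal:
        kept = [d for animal, d in diffs_by_animal.items() if animal != excluded]
        p_values[excluded] = test(kept)
    return p_values, all(p < alpha for p in p_values.values())


# Hypothetical per-animal differences in decoding accuracy between two areas.
diffs = {"m1": 0.08, "m2": 0.05, "m3": 0.11, "m4": 0.07,
         "m5": 0.06, "m6": 0.09, "m7": 0.04, "m8": 0.10}
p_values, robust = leave_one_animal_out(diffs)
```

With very few animals an exact test cannot reach significance once one animal is removed, which is one reason to report effect sizes alongside p-values, as the response notes.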

      (6) How was the grouping in Figure 2O done? Specifically, how were the thresholds for the dashed lines selected to separate PM and V1 from AM and RL as association areas? It seems to me like this grouping was done rather arbitrarily as the difference in choice decoding accuracy is not particularly large between these areas.

      This line does not have a specific quantitative basis, but we consider it useful as an illustrative aid. We have added this clarification to the figure legend.

      Figure 2O: “Decoding accuracies of time in video presentation and choice direction indicate that AM would be the best position for associating these two signals. The background color and dashed lines are provided as visual aids for illustrative purposes.”

      (7) The fact that neurons with high rt_single tend to share the same function might also indicate the approach is insufficient to remove all effects of tuning to trial types from the neural data. Since the authors subtract the average of each trial type, the average trial-type related information is removed but type-specific variations that are not equally presented in the average might remain. For choice neurons for example, attentive vs in-attentive choices could be represented differently and thus remain in the data since the average would be a mixture of both. The same goes for other factors that would drive a particular modulation in the choice - or stimulus - related part of the trial which could still tie these neurons together. One way to circumvent this concern could be to first compute the mean activity for all time points in each trial and then compute the trial-to-trial variability across all trials of the same type. Alternatively, I would be curious how the results play out when using data when the animal is not actively performing the task to compute rt_single.

      Thank you for the comment. The concern raised by the reviewer applies to all noise-correlation analyses and highlights an important limitation of this approach, namely that factors other than the observed variables are treated as noise. By subtracting the trial-averaged activity, information related to sensory input and the direction of the first lick at choice can be removed. However, other factors cannot be eliminated if they are not observed. For example, if right hindlimb movements tend to occur only in trials with visual stimulation combined with left choice, such effects cannot be removed because they are not measured. The same issue remains even when restricting the analysis to a single trial type. Based on these considerations, we have added the following text to the manuscript.

      Line 932: “Correlation of trial-to-trial variance of activity between a pair of single neurons was defined as r<sub>t_single</sub>. To calculate r<sub>t_single</sub>, we averaged the activity of individual neurons over the sampling period, and the average across each trial type was subtracted from this value. The trial types consisted of four sets of pairs of stimuli and responses, that is, the video stimulation and left choice, the video stimulation and right choice, the black screen and left choice, and the black screen and right choice. By this operation, we extracted the fluctuating components of single-neuron activity that are independent of the trial types. The finding that neurons with high r<sub>t_single</sub> tend to share the functional properties we propose is not a trivial consequence of the analysis. At the same time, it remains possible that high r<sub>t_single</sub> reflects the degree to which neurons share unobserved features, and that such features are correlated with our functional classification. Thus, while this analysis suggests that correlated fluctuations across cortical areas may contribute to the determination of functional types, establishing an exclusive conclusion will require more fine-grained behavioral measurements, tighter control of internal states, and causal identification through targeted interventions.”
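
A minimal sketch of the r<sub>t_single</sub> computation as described in the quoted Methods text: per-trial activity (already averaged over the sampling period) has its trial-type mean subtracted, and the residuals of two neurons are then correlated. The use of the Pearson correlation, the trial-type labels, and the toy activity values are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def residuals(activity, trial_types):
    """Subtract the mean of each trial type from the per-trial activity."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, t in zip(activity, trial_types):
        sums[t] += x
        counts[t] += 1
    return [x - sums[t] / counts[t] for x, t in zip(activity, trial_types)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def r_t_single(act_i, act_j, trial_types):
    """Correlation of trial-to-trial fluctuations between two neurons."""
    return pearson(residuals(act_i, trial_types), residuals(act_j, trial_types))


# Four trial types: video/black screen crossed with left/right choice (toy data).
types = ["VL", "VL", "VR", "VR", "BL", "BL", "BR", "BR"]
shared = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # shared fluctuation
tuning_i = {"VL": 5.0, "VR": 5.0, "BL": 1.0, "BR": 1.0}  # vision-like tuning
tuning_j = {"VL": 4.0, "VR": 0.0, "BL": 4.0, "BR": 0.0}  # choice-like tuning
neuron_i = [tuning_i[t] + s for t, s in zip(types, shared)]
neuron_j = [tuning_j[t] + 2.0 * s for t, s in zip(types, shared)]
r = r_t_single(neuron_i, neuron_j, types)
```

In this toy case the two neurons have different tuning but identical residual structure, so the residual correlation is 1. As the response points out, any unobserved variable that covaries across trials would also end up in these residuals.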

      Minor points:

      (1) Why did the authors use the activity of 50 neurons for the decoder analysis in Figure 2K? Didn't they have many more neurons available? How were these selected?

      We found that the conclusions were identical when using datasets consisting of either 50 neurons or 20 neurons across all analyses. Because the total number of recorded PM neurons did not reach 100 in at least one mouse, we standardized the analyses to 50 neurons in order to match the number of neurons across all cortical areas and animals.

      (2) The authors mention that some PPC neurons showed complex dynamics rather than encoding a specific feature such as visual or choice information but do not mention actual numbers on this point. It would be good to quantify to what extent neurons in different regions represent such mixed selectivity and whether there are clear differences in selectivity. This would also be interesting to discuss in context to earlier work on mixed selectivity in the parietal cortex, such as Raposo et al 2015.

      Thank you for the comment. Your point is entirely valid. However, as explained in our response to your major comment, our analyses focus not on how individual neurons are classified, but rather on the spatial distribution of these functional categories.

      (3) I have a hard time understanding what the length of the bars in the right panel of Figure 2k indicates. Does this plot show more than the decoder accuracy before and after the choice? Is the bar length related to the standard deviation? The same question for the visualization in panel 2n. It looks nice but I'm confused about what it shows exactly.

      These bars represent confidence intervals. Although this is stated at the end of the Figure 2 legend, we agree that it may not be sufficiently clear, and we have therefore added this information to the Statistics section.

      Line 1046: “In Figure 2K and N, and Figure 3G, L, M, and O, the bars indicate the 95% confidence intervals. All other bars denote s.e.m., unless otherwise noted.”

      (4) Is Figure 3D showing the same association index as in Figure 2j, thus showing the same result as in the vision task or is this meant to show something new? It was not clear to me from the wording, so it would be good to clarify.

      You are correct that the magenta trace in Fig. 3D is the same as in Fig. 2J. This panel was included to explicitly illustrate that, in areas A and AM, the separation between History and Association approximately overlaps. We have added the following clarification to the figure legend accordingly.

      Figure 3D: “The percentage of history neurons and the association index (as defined in Fig. 2J) were overlaid for comparison.”

      (5) When computing the Pseudo R2 for regressor contribution, how was the null model computed? From shuffling all regressors in the model? I think this is fine but it's not fully clear what the intended effect of this procedure is. For the description of Figure 4C it would be good to add a sentence explaining how to interpret the pseudo R^2.

      The null model predicts a fixed value that is independent of the explanatory variables, i.e., it predicts only the intercept. This provides a useful correction term when performing cross-validation, particularly in cases where baseline values differ across folds. In Figure 4C, the analysis shows the contribution of adding body part positions and pupil diameter to the model for predicting neural activity. We have added the following text to the Methods section.

      Line 881: “To estimate the contribution of parameters for the left forelimb, the right forelimb, the tail, and the pupil, we repeated the same analysis with a reduced model where each set of predictors was eliminated from the full model (Figure 4B). Then, the pseudo-R<sup>2</sup> was obtained for each set of predictors by (MSE<sub>reduced</sub> − MSE<sub>full</sub>) / MSE<sub>null</sub>, where MSE is the mean squared error, MSE<sub>reduced</sub> is the MSE of the reduced model, MSE<sub>full</sub> is the MSE of the full model, and MSE<sub>null</sub> is the MSE of the null model. The null model predicts a fixed value that is independent of the explanatory variables; specifically, it simply outputs the mean of the training data. For example, we constructed a regression model without the parameters regarding the left forelimb (green shade of Figure 4B), obtained MSE<sub>reduced</sub> for the left forelimb, and the pseudo-R<sup>2</sup> was calculated as above by comparing the MSE of the full model and the null model. This value reflects the extent to which the position of the left forelimb contributes to the prediction of neuronal activity.”
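
To illustrate how the pseudo-R<sup>2</sup> can be interpreted, here is a small sketch that reads the expression above as (MSE<sub>reduced</sub> − MSE<sub>full</sub>) / MSE<sub>null</sub>, with the null model predicting the training-set mean. The function names and the toy predictions are hypothetical; the fitting of the full and reduced models is not shown.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pseudo_r2(y_test, pred_full, pred_reduced, train_mean):
    """Contribution of one predictor set to held-out prediction.

    pred_full    : predictions of the full model on the test fold
    pred_reduced : predictions of the model lacking the predictor set
    train_mean   : mean of the training data (the null model's output)
    """
    mse_full = mse(y_test, pred_full)
    mse_reduced = mse(y_test, pred_reduced)
    mse_null = mse(y_test, [train_mean] * len(y_test))
    return (mse_reduced - mse_full) / mse_null


# Toy example: the full model is perfect, and removing the predictor set
# adds a constant error of 0.5 to every prediction.
y_test = [1.0, 2.0, 3.0, 4.0]
pred_full = [1.0, 2.0, 3.0, 4.0]
pred_reduced = [1.5, 2.5, 3.5, 4.5]
value = pseudo_r2(y_test, pred_full, pred_reduced, train_mean=2.5)
```

A value near 0 means that removing the predictor set barely changes the held-out error. Larger values mean the set accounts for a larger share of the error that the null model leaves unexplained.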

      (6) It seems surprising that the pupil-size-related neurons were mapped around visual areas although the pupil should carry clear luminance information. Is this because the luminance-related information in the pupil can also be explained by the stimulus variable in the model?

      Pupil size changed markedly before and after visual stimulus presentation (Figure S5C), dilating during the black stimulus and constricting during the video stimulus. This likely reflects changes relative to the luminance of the gray screen presented in the absence of visual stimuli. In our encoding model, visual stimuli are included as independent regressors for each corresponding time window. Therefore, pupil fluctuations that are temporally locked to visual stimulation are explained by these visual regressors. Neuronal activity that is better explained by pupil size changes not accounted for by the visual regressors is classified as pupil-related. At least three mechanisms may underlie the influence of pupil size on neuronal activity. First, fluctuations in pupil diameter have been linked to behavioral state or noradrenergic level [REF], which can act as variables independent of visual stimulation. Second, pupil fluctuations may be amplified in a stimulus-dependent manner, reflecting nonlinear interactions between visual input and brain state. Third, changes in pupil diameter alter the amount of light reaching the retina, which can modulate activity in visual cortical areas. The latter two mechanisms are therefore expected to predominantly affect visual areas and may explain why pupil-related neurons are more frequently observed there. The first mechanism is likely related to global brain state, and its association with behavior may account for the presence of pupil-related neurons in S1. However, these interpretations require confirmation through more refined causal manipulations. Accordingly, we limited the addition to the manuscript to the following statement.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      (7) What is meant by 'external control parameters such as a video frame' when explaining the encoding model?

      Thank you for the comment. We added the following explanation.

      Line 151: “In the encoding model, the activity of each neuron was fitted by a weighted sum of external control parameters, such as video frames, and behavioral parameters, such as choice and reward direction. Because the visual stimulus changes continuously over time, sliding time windows were placed during the visual stimulus period.”

      (8) What does the trace in Figure 2G show? Is this a single-cell example? What are the axes here?

      We added an explanation to the figure legend.

      Figure 2G: “Schematic of our encoding model. The bottom right panel shows an example of single-neuron activity with an overlay of the fitting obtained by the encoding model.”

      (9) There seems to be a word missing in the sentence that describes the results for Figure 3O in the main text.

      Thank you for the comment. We added the following description related to Fig. 3O.

      Line 247: “resulting in the decoding accuracy of time after a specific choice being lower than in A (Figure 3O).”

      (10) The abbreviation RP is used when describing Figure S5A. It should be mentioned that this refers to the response period.

      Thank you for the comment. We added the following description related to Figure S5A.

      Line 283: “We found that the angle of the tail was significantly different from the baseline values several seconds after the response period (RP) (Figure S5A)”

      (11) I can't see the color difference between the traces in Figure 2E. There are probably red and green but this is hard to see for readers with red-green color blindness. Does the black indicate the time of visual stimulation? Is the line in Figure 2F the time when the spouts move in?

      Thank you for the comment. In Fig. 2E, we improved visibility by changing the line opacity. In addition, the vertical line in Fig. 2E indicates the onset of the visual stimulus, and the vertical line in Fig. 2F indicates the onset of the response period. We have added the following explanations to the figure legend.

      Figure 2: E. “Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. F. Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      (12) It might be useful to provide a short explanation in the results or methods of why the harmonic mean was used for the computation of the association index. I think it makes sense but since it is not commonly used this could be helpful for the reader to understand the approach.

      Thank you for the comment. We added the following explanation to the main text.

      Line 869: “The association index was determined by the harmonic mean of the rates of vision neurons and choice neurons. The harmonic mean approaches the arithmetic mean when the two values are similar, but becomes closer to the smaller value when the two values differ substantially. Therefore, the association index takes a large value when both vision neurons and choice neurons are abundant.”
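
The behavior of the harmonic mean described above can be checked with a short sketch. The function name `association_index` and the example rates are illustrative assumptions, not values from the manuscript.

```python
def association_index(vision_rate, choice_rate):
    """Harmonic mean of the fractions of vision and choice neurons."""
    total = vision_rate + choice_rate
    if total == 0:
        return 0.0
    return 2 * vision_rate * choice_rate / total


# Both classes equally abundant: harmonic mean equals the arithmetic mean.
balanced = association_index(0.4, 0.4)

# Same arithmetic mean (0.4) but unbalanced: the index is pulled toward
# the smaller of the two rates.
unbalanced = association_index(0.1, 0.7)
```

Here `balanced` is 0.4 and `unbalanced` is 0.175, so an area dominated by only one of the two classes receives a low index even when its arithmetic mean is the same.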

      (13) I don't fully understand how coupling diversity is computed. If there are six preference vectors, what is meant by taking the average of angles between all pairs of the two vectors? Which two are meant here?

      Thank you for the comment. We revised the explanation as follows.

      Line 950: “To quantify the diversity of coupling patterns across clusters, we computed the angle between every pair of preference vectors. We then averaged these pairwise angles and defined this quantity as the “coupling diversity.””
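
The revised definition can be sketched as follows: with six preference vectors there are 15 unordered pairs, and the coupling diversity is the mean of their angles. Expressing angles in degrees and the toy vectors are assumptions; the response does not state the unit or the dimensionality of the preference vectors.

```python
from itertools import combinations
from math import acos, degrees, sqrt


def angle_deg(u, v):
    """Angle between two vectors in degrees (unit is an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return degrees(acos(cosine))


def coupling_diversity(preference_vectors):
    """Mean angle over every unordered pair of preference vectors."""
    angles = [angle_deg(u, v) for u, v in combinations(preference_vectors, 2)]
    return sum(angles) / len(angles)


# Toy case: three mutually orthogonal preference vectors.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
diverse = coupling_diversity(orthogonal)

# Toy case: all clusters prefer the same coupling pattern (up to scale).
aligned = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
uniform = coupling_diversity(aligned)
```

Orthogonal vectors give a diversity of about 90 degrees, whereas clusters with identical coupling preferences give a value near zero.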

      (14) The results text states that the high correlation between r_anatomy and r_neuropil (Figure 6I) is evidence for the functional correlations being driven by cortico-cortical connectivity. However, Figure 6J shows that correlations for either cortico-cortical or thalamo-cortical connectivity are below 0.94 and generally higher for thalamo-cortical connectivity. This doesn't negate the general point of the authors but it would be good to clarify this section so it is easier to understand if r_anatomy includes both cortico-cortical and thalamo-cortical data and how the results in Figure 6I and J go together with the description in the results section.

      You are correct. We have revised the text to clarify that the analysis reflects the combined effects of both cortico-cortical and thalamo-cortical inputs.

      Line 436: “This correspondence suggests that the mesoscale interarea correlation is determined by the cortico-cortical and thalamo-cortical common input at mesoscale. Figure S8: A. Using Allen connectivity atlas, the axonal density of cortico-cortical and thalamo-cortical projection was analyzed.”

      (15) I'm not very familiar with canonical correlation analysis and found this part hard to follow. Some additional explainer sentences would be helpful here. For example, what does it mean to take the average of the top 10 canonical correlations as rt_population? What exactly are the canonical correlation vectors? It was also not clear to me what exactly the results in Figure 5J signify.

      Thank you for the comment. We have clarified the description in the main text related to CCA and the associated analyses as follows.

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the dimension of population activity in each area into 10 by using PCA (principal component analysis) (Figure S6B,C). Then, “fluctuation activity” was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and obtained an average of 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all the pairs (Figure 5H), whereas when the CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second largest CC<sub>t</sub>, respectively (Figure 5I). The dominance in CC<sub>t</sub> in A and AM disappeared when the neurons with r<sub>t_single</sub> > 0.3 were removed. Notably, the CC<sub>t</sub> of AM and the other areas was uniform regardless of the paired areas across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A simply coupled with S1t, and such a correlation structure at the population level critically depends on this subset of neurons.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype for genetic knock out is a major weakness.

      We agree with the reviewer that a S. aureus invasion phenotype in ASM K.O. cells would unequivocally demonstrate the importance of ASM for the process. In the revised manuscript, we report an invasion phenotype in ASM K.O. cells. The absence of an invasion phenotype in ASM K.O. cells in our original experiments was likely caused by SM accumulation in ASM-depleted cells originating from FBS (see Figure 2I, in the revised manuscript).

      We thus cultured cells for up to three days in 2% FBS and then reduced the concentration to 1% FBS one day prior to experimentation. Under these conditions reduced S. aureus invasion in ASM K.O.s was observed when compared to wildtype cells.

      This was not detected when we cultured the cells in medium containing the common concentration of 10% FBS. Our new data supports the results we acquired with three different ASM inhibitors.

      The invasion defect in ASM K.O.s cultured in low FBS was more pronounced at 10 min p.i. when compared to the 30 minute time point (Figure 2K), further corroborating that the ASM-dependent invasion pathway is relevant early in infection. This is consistent with the invasion dynamics we observed upon interference with lysosomal Ca<sup>2+</sup> signaling [TPC1 K.O. (Figure 1C), BAPTA-AM (Figure 3D)], lysosomal exocytosis [Syt7 K.O. (Figure 2F), Ionomycin (Figure 3D)] and ASM activity by inhibitor treatment (Figure 3D).

      Originally, we had hypothesized that changes in the sphingolipidome induced by absence of ASM may have caused the lack of an S. aureus invasion phenotype. We thus compared the sphingolipidome of ASM K.O.s cultured in 1% and 10% FBS. Indeed, SM accumulation was less severe when we cultured the cells in 1% FBS (Figure 2M and Supp. Figure 3). Hence, we think that strong SM accumulation in ASM K.O. cells cultured in 10% FBS may facilitate ASM-independent invasion mechanisms and that, consequently, the absence of ASM-dependent invasion could not be detected by analyzing the number of invaded bacteria. This is supported by experiments in which we treated ASM K.O.s with the ASM inhibitor ARC39, which only slightly affected S. aureus invasion, whereas we detected a strong reduction of internalized bacteria by ARC39 treatment of WT cells (Figure 2J). We think that this experiment and the reduced invasion in ASM K.O.s rule out an ASM/SM-independent effect of the inhibitors.

      - While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      We agree with the reviewer that we do not show formation of ceramide-enriched platforms, and we thus changed the manuscript accordingly (see below).

      - The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We shared the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but we are limited by cell biology and the simultaneous occurrence of processes - here the uptake of bacteria by multiple pathways.

      However, we were able to address ASM-dependency of our rapid uptake mechanism by observing a genetic phenotype in SMPD1 knockout-cells.

      We make no assumptions here about the centrality of the pathway or its importance in vivo. As scientists, we were interested in the fact that such an ASM-dependent pathway exists. In other, as yet unidentified cell lines, such a pathway may be the main entry point for bacteria. Alternatively, it may represent an ASM-dependent mode of receptor uptake that we identified because the bacteria piggy-back into the cells.

      - I acknowledge the author's argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The author's argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We are convinced that our new genetic evidence of an S. aureus invasion phenotype in ASM K.O.s will eliminate the reviewer’s concerns about the role of ASM during the bacterial invasion.

      The new lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype.

      We agree with the reviewer, however, that the reason why changes in sphingolipidome increase ASM-independent S. aureus internalization by host cells remains elusive. One possible explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus (3, 4). Characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      Host cells possess mechanisms to prevent infections, while pathogens have developed strategies to circumvent these defense processes. In the present scenario, a physiological membrane composition of the host cell represents such a pathogen defense mechanism (as shown e.g. for caveolin-1, which restricts invasion of S. aureus in healthy cells). If a defense mechanism is disabled (as we speculate is the case upon strong SM accumulation in ASM K.O.s cultured in 10% FBS), infection is facilitated. In healthy WT cells, these mechanisms (e.g. caveolin-1) are functional and, hence, we would not expect a “compensation” of ASM-dependent invasion. We here analyze invasion events that cannot be prevented by host defense mechanisms as they occur in untreated WT cells and are absent upon interfering with the ASM-dependent invasion pathway (by inhibitors and genetic K.O.). Thus, we think the ASM-dependent pathway, which mediates 50-70% of bacteria internalized by healthy WT cells 10 min p.i., is central for the infection.

      - The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We measured phagosomal escape of S. aureus JE2 in ASM K.O. cells cultured in 1% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (Author response image 1).

      Escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium. We therefore think that prolonged absence of ASM has other side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      Author response image 1.

      As it is unclear how prolonged absence of ASM can affect cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus, predominantly via the “rapid” ASM-dependent pathway) possess lower phagosomal escape rates than bacteria that entered host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, while sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5, A and B). This is summarized in Figure 5F.

      - Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment ? If S. aureus does not escape phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      Inducible knock-downs in our laboratory are based on the vector pLVTHM in cells co-expressing the repressor TetR fused to a KRAB domain. It needs to be stated that for optimal knock-downs the induction has to be performed by doxycycline supplementation in the medium for 7 days thus leading to several days of growth of the cells, which will allow the cells to adapt their lipid metabolism thus reflecting a situation that we encounter for the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (5). However, the course of infection in macrophages differs from that in non-professional phagocytes (6). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM therefore may influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      - The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms. We thus changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) increased invasion in TPC1 and Syt7 K.O. (Figure 2, P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (7).”

      - The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensors such as GCaMP3-ML1 might provide better ways to directly visualize this during inhibitor treatment or S. aureus infection.

      We thank the reviewer for this suggestion. We included the following section in our discussion (line 593):

      “Since fluorescent calcium reporters allow this process to be monitored microscopically (8, 9), future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References

      (1) J. Rappaport, C. Garnacho, S. Muro, Clathrin-mediated endocytosis is impaired in type A-B Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm 11, 2887-2895 (2014).

      (2) J. Rappaport, R. L. Manthe, C. Garnacho, S. Muro, Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm 12, 1366-1376 (2015).

      (3) C. Hoffmann et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci 123, 4280-4291 (2010).

      (4) L.-P. Tricou et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports 14, 28643 (2024).

      (5) C. Li et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (6) A. Moldovan, M. J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (7) M. Rühling, F. Schmelz, A. Kempf, K. Paprotka, M. J. Fraunholz, Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (8) D. Shen et al., Lipid storage disorders block lysosomal trafficking by inhibiting a TRP channel and lysosomal calcium release. Nat Commun 3, 731 (2012).

      (9) L. C. Davis, A. J. Morgan, A. Galione, NAADP-regulated two-pore channels drive phagocytosis through endo-lysosomal Ca(2+) nanodomains, calcineurin and dynamin. EMBO J 39, e104058 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      General assessment of the work:

      In this manuscript, Mohr and Kelly show that the C1 component of the human VEP is correlated with binary choices in a contrast discrimination task, even when the stimulus is kept constant and confounding variables are considered in the analysis. They interpret this as evidence for the role V1 plays during perceptual decision formation. Choice-related signals in single sensory cells are enlightening because they speak to the spatial (and temporal) scale of the brain computations underlying perceptual decision-making. However, similar signals in aggregate measures of neural activity offer a less direct window and thus less insight into these computations. For example, although I am not a VEP specialist, it seems doubtful that the measurements are exclusively picking up (an unbiased selection of) V1 spikes. Moreover, although this is not widely known, there is in fact a long history to this line of work. In 1972, Campbell and Kulikowski ("The Visual Evoked Potential as a function of contrast of a grating pattern" - Journal of Physiology) already showed a similar effect in a contrast detection task (this finding inspired the original Choice Probability analyses in the monkey physiology studies conducted in the early 1990's). Finally, it is not clear to me that there is an interesting alternative hypothesis that is somehow ruled out by these results. Should we really consider that simple visual signals such as spatial contrast are *not* mediated by V1? This seems to fly in the face of well-established anatomy and function of visual circuits. Or should we be open to the idea that VEP measurements are almost completely divorced from task-relevant neural signals? Why would this be an interesting technique then? 
      In sum, while this work reports results in line with several single-cell and VEP studies and perhaps is technically superior in its domain, I find it hard to see how these findings would meaningfully impact our thinking about the neural and computational basis of spatial contrast discrimination.

      We agree that single cell measurements allow for a spatially more detailed analysis, but they are not feasible in humans. Assuming we value insights into the relationship between neural activity and decision making in the human as well as non-human brain, we are restricted to non-invasive measurements such as EEG, which inevitably showcase the neural underpinnings of decision making at a coarser level of analysis. This was the challenge we met with our paradigm design. For example, we chose contrast as the task-relevant stimulus feature in this study because monotonic contrast response functions exist for sensory neurons throughout the visual system, and the aggregated measures that we could attain with EEG would reflect that contrast-sensitivity and hence provide a window onto the encoding of the main decision-relevant quantity. We were specifically interested in initial afferent, contrast-dependent V1 activity reflected in the C1 component (80-90 ms). As we point out in the Introduction, the C1 is unusual among EEG signals in the extent to which it is dominated by a single visual area, V1 (Jeffreys & Axford, 1972; Clark et al., 1994; Di Russo et al., 2002; Ales et al., 2010; Mohr et al., 2024), and even if other downstream areas also make a minor contribution in the C1 time period, it still represents a very low-level sensory response early in the sensory analysis pipeline, appropriate for addressing our primary question of whether such a low-level signal is used in the formation of perceptual decisions. The alternative hypothesis, that early responses are passed over in decision readout, relates to a fundamental debate about whether early sensory responses are separated from cognition. 
      The possibility that late, but not early, representations are correlated with choices does not imply that the later sensory representations are divorced from the earlier ones, only that there is a noise component that is not shared between the two, such as that produced by the ensuing computations that generate the later representations. Instead, a lack of choice probability in early representations would imply that decision readout is selective in where it sources sensory evidence from, with some possible reasons being to maintain high quality standards for sensory evidence or to impose a layer of separation between cognition and sensation.

      As the reviewer points out, the animal literature is highly mixed on the topic of choice probability in V1. Even for orientation discrimination tasks where V1 is ostensibly highly suited given the existence of orientation columns in V1, and even when measurements are taken from V1 neurons with good neurometric performance and/or aggregated across a V1 population (Jasper et al 2019), some studies have reported little to no V1 choice probability. If our alternative hypothesis of no EEG-indexed V1 choice probability flies in the face of well-established anatomy and function of visual circuits, then so also do these empirical findings in the animal neurophysiology literature. 

      Although there are important aspects of choice probability that are accessible in single cell studies but not in EEG (e.g. noise correlations, details of circuit physiology), our EEG measurements tap into the same phenomenon, just at a different level of analysis, i.e. the neural population level. At this level, we have been able to address whether the full body of sensory responses at a particular stage of visual analysis is systematically related to perceptual decision outcomes. Very similar questions are in fact sometimes addressed in the animal neurophysiology literature; for example, Kang and Maunsell (2020) aggregated single-cell choice probability measurements within visual areas to investigate whether choice probability strength at the level of an entire visual area was sensitive to task demands. The global vantage point of EEG comes with the additional benefit of picking up signatures of other potentially mediating processes such as attention and being able to control for them in our analysis. Our human study thus provides a valuable complementary viewpoint alongside animal neurophysiology work in this area.

      Summary of substantive concerns:

      (1) The study of choice probability in V1 cells is more extensive than portrayed in the paper's introduction. In recent years, choice-related activity in V1 has also been studied by Nienborg & Cumming (2014), Goris et al (2017), Jasper et al (2019), Lange et al (2023), and Boundy-Singer et al (2025). These studies paint a complex picture (a mixture of positive, absent, and negative results), but should be mentioned in the paper's introduction.

      We thank the reviewer for highlighting these papers bearing on choice-related activity in V1, only two of which we had cited. The three additional studies do indeed lend further support to our description of the complex picture around V1-CP effects in the literature and we have now included them.

      (2) The very first study to conduct an analysis of stimulus-conditioned neural activity during a perceptual decision-making task was, in fact, a VEP study: Campbell and Kulikowski (1972). This study never gained the fame it perhaps deserves. But it would be appropriate to weave it into the introduction and motivation of this paper.

      We are aware of this paper, and indeed we ourselves have shown steady-state VEP (SSVEP) correlations with timing and selection of decision reports (O'Connell et al 2012; Grogan et al 2023), but SSVEPs do not provide an index of initial afferent V1 activity in the way that the C1 of the transient VEP does. SSVEPs are evoked by a rapid sequence of stimulus onsets, so that activity cannot be attributed to a particular stimulus onset nor its bottom-up latency resolved, and, being a response to an ongoing stimulus, it combines top-down and bottom-up influences from striate and extrastriate areas (Di Russo et al 2007). Indeed, in Campbell and Kulikowski (1972) the SSVEP was almost entirely eliminated when the stimulus was undetected. This is in keeping with robust modulations of the SSVEP by spatial attention (Muller and Hillyard 2000). Cognitive influences of this magnitude are never observed in the C1, and in fact are often not observed at all even when later VEP components show robust modulations (Luck et al 2000), which motivated a recent meta-analysis to address the issue (Qin et al 2022). This highlights the important distinction between the earliest transient VEP activity reflecting mainly the initial afferent response in V1, and steady-state sensory activity reflecting a mix of bottom-up and top-down influences across visual cortex. Because of the importance of this distinction, we have added a reference to the above SSVEP papers to the 3rd paragraph of the introduction along with a statement about the distinction.

      (3) What are interesting alternative hypotheses to be considered here? I don't understand the (somewhat implicit) suggestion here that contrast representations late in the system can somehow be divorced from early representations. If they were, they would not be correlated with stimulus contrast.

      This same conundrum applies to single-cell studies of choice probability. Do studies showing choice probability in V4 but not V1, for example, demonstrate that V4 is divorced from V1? In such studies, measurements are typically taken from large representative samples of neurons from both areas with good neurometric performance in both cases and the task often (though not always) involves a target stimulus feature that is encoded in V1 such as orientation. Why then should V4 but not V1 show choice probability when we know the vast majority of input to the visual cortex passes through V1? It must be that feature representation and choice formation are different things, with one not implying the other. This is true for an EEG study as much as it is for a single-cell study.

      The alternative hypothesis in our study is that the early sensory responses indexed by the C1 are not directly used in the formation of the perceptual decision at hand. As outlined in our comments above, this does not imply that those early responses are divorced from later responses. Of course, both are correlated with stimulus contrast and so would correlate with each other across changing contrast but this does not necessitate that their noise is correlated when contrast is held constant because new instantiations of noise can be generated by the computations performed at each stage of visual processing. Thus, the interesting alternative hypothesis is that information contained in the sensory representation generated during initial afferent V1 activity is not used directly to form decisions, and instead, decisions are read out from the outputs of computations performed further downstream. Such an outcome, if it had arisen in our data, would have been consistent with a separation between cognition and early visual processing. Instead, our results suggest a certain level of cognitive interfacing at the lowest and earliest cortical levels of visual processing. We have now added text to the Introduction to highlight the distinction between sensory representation and decision readout in order to make the alternative hypothesis clearer.
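
      The point that two stages can both track stimulus contrast without sharing noise can be illustrated with a minimal simulation. The sketch below is illustrative only and is not part of the analyses described in this response: the two-stage linear model, the gain of 2 for the late stage, the `shared_noise` parameter, and all function names are assumptions chosen for clarity.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_stages(contrasts, n_per_contrast=4000, shared_noise=0.0, seed=0):
    """Simulate an early and a late response at each contrast level.

    Hypothetical model (an assumption, not taken from the manuscript):
        early = contrast + noise_early
        late  = 2 * contrast + shared_noise * noise_early + noise_late
    Returns (correlation across all trials, list of per-contrast correlations).
    """
    rng = random.Random(seed)
    all_early, all_late, within = [], [], []
    for c in contrasts:
        early, late = [], []
        for _ in range(n_per_contrast):
            noise_early = rng.gauss(0.0, 1.0)
            noise_late = rng.gauss(0.0, 1.0)  # fresh noise generated downstream
            early.append(c + noise_early)
            late.append(2.0 * c + shared_noise * noise_early + noise_late)
        within.append(pearson(early, late))  # contrast held constant
        all_early += early
        all_late += late
    return pearson(all_early, all_late), within


across, within = simulate_stages(contrasts=[0.0, 2.0, 4.0, 6.0])
print(f"across contrasts: r = {across:.2f}")
print("within each contrast:", [f"{r:.2f}" for r in within])
```

      With `shared_noise=0.0`, the two stages correlate strongly across changing contrast but are close to uncorrelated when contrast is held constant. Raising `shared_noise` lets the late stage inherit early-stage noise, which is the situation in which an early signal could carry choice-related variability.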

      (4) I find the arguments about the timing of the VEP signals somewhat complex and not very compelling, to be honest. It might help if you added a simulation of a process model that illustrated the temporal flow of the neural computations involved in the task. When are sensory signals manifested in V1 activity informing the decision-making process, in your view? And how is your measure of neural activity related to this latent variable? Can you show in a simulation that the combination of this process and linking hypothesis gives rise to inverted U-shaped relationships, as is the case for your data?

      We thank the reviewer for suggesting a simulation, which we carried out using Matlab code. We have also included a new Figure 1-Figure Supplement 1 in the revised manuscript.

      In our view, sensory signals in V1 are informing the decision-making process in this task from at least as early as the initial afferent response. The main point about C1 latency in relation to the response-time contingency of the choice probability effect is that the more time that elapses without a decision made (and therefore the more additional sensory processing that contributes to the decision), the more diluted is the contribution of the C1 to the decision by contributions from later representations, and thus choice probability reduces. Likewise, when response times are too quick for C1 evidence to contribute, choice probability is also absent, hence the inverted-U-shaped curve. Moreover, if the C1-choice correlation is mediated by a top-down factor such as attention rather than readout, the inverted-U-shaped curve is not expected because in such a case the relative timing of the C1 and choice commitment would not be relevant.
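
      The dilution argument can be made concrete with a toy simulation. The sketch below is not the Matlab simulation reported in the figure supplement; it is a minimal illustration under stated assumptions: commitment time is drawn independently of the evidence, the stimulus is held constant so that only zero-mean noise drives choices, the C1 is the first of a series of equally weighted evidence samples, and decisions made before the C1 step are pure guesses. The step counts, bin boundaries, and function names are all hypothetical.

```python
import random


def choice_probability(c1_values, choices):
    """ROC area: probability that C1 on a 'choice A' trial exceeds C1 on a 'choice B' trial."""
    n_a = sum(choices)
    n_b = len(choices) - n_a
    if n_a == 0 or n_b == 0:
        return float("nan")
    ranked = sorted(zip(c1_values, choices))
    rank_sum_a = sum(rank for rank, (_, is_a) in enumerate(ranked, start=1) if is_a)
    u = rank_sum_a - n_a * (n_a + 1) / 2  # Mann-Whitney U statistic
    return u / (n_a * n_b)


def run_simulation(n_trials=20000, c1_step=3, max_step=30, seed=1):
    """Return choice probability of the C1 sample in fast, intermediate and slow RT bins."""
    rng = random.Random(seed)
    bins = {"fast": ([], []), "intermediate": ([], []), "slow": ([], [])}
    for _ in range(n_trials):
        decision_step = rng.randint(1, max_step)  # assumed independent of evidence
        c1 = rng.gauss(0.0, 1.0)                  # noise in the C1 sample
        n_samples = decision_step - c1_step + 1   # samples available, C1 first
        if n_samples <= 0:
            choice_a = rng.random() < 0.5         # too early for C1: guess
        else:
            later = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples - 1))
            choice_a = (c1 + later) > 0           # sign of the summed evidence
        if decision_step < c1_step:
            name = "fast"
        elif decision_step < c1_step + 5:
            name = "intermediate"
        elif decision_step >= 20:
            name = "slow"
        else:
            continue                              # leave a gap between bins
        bins[name][0].append(c1)
        bins[name][1].append(choice_a)
    return {name: choice_probability(v, c) for name, (v, c) in bins.items()}


for name, cp in run_simulation().items():
    print(f"{name:>12}: choice probability = {cp:.2f}")
```

      Under these assumptions, choice probability sits near chance (0.5) in the fast bin, peaks in the intermediate bin, and declines in the slow bin as later samples dilute the C1 contribution, giving the inverted-U pattern. If the C1-choice relationship were instead imposed by a factor that does not depend on decision time, the bins would not differ in this way.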

      Reviewer #2 (Public review):

      Summary:

      Mohr and Kelly report a high-density EEG study in healthy human volunteers in which they test whether correlations between neural activity in the primary visual cortex and choice behavior can be measured non-invasively. Participants performed a contrast discrimination task on large arrays of Gabor gratings presented in the upper left and lower right quadrants of the visual field. The results indicate that single-trial amplitudes of C1, the earliest cortical component of the visual evoked potential in humans, predict forced-choice behavior over and beyond other behavioral and electrophysiological choice-related signals. These results constitute an important advance for our understanding of the nature and flexibility of early visual processing.

      Strengths:

      (1) The findings suggest a previously unsuspected role for aggregate early visual cortex activity in shaping behavioral choices.

      (2) The authors extend well-established methods for assessing covariation between neural signals and behavioral output to non-invasive EEG recordings.

      (3) The effects of initial afferent information in the primary visual cortex on choice behavior are carefully assessed by accounting for a wide range of potential behavioral and electrophysiological confounds.

      (4) Caveats and limitations are transparently addressed and discussed.

      We would like to thank the reviewer for these positive remarks.

      Weaknesses:

      (1) It is not clear whether integration of contrast information across relatively large arrays is a good test case for decision-related information in C1. The authors raise this issue in the Discussion, and I agree that it is all the more striking that they do find C1 choice probability. Nevertheless, I think the choice of task and stimuli should be explained in more detail.

      We thank the reviewer for raising this point about the large stimulus arrays. As we said in our Discussion, it would seem that aggregation across a large stimulus region would be better suited to a downstream visual area with larger receptive fields, yet our setting of a strict deadline would put the emphasis back on earlier sensory representations. We now elaborate on this matter in the discussion, to say that although the small receptive fields and short, slow horizontal connections in V1 mean that the aggregation necessary for performing the task is unlikely to happen within V1 during the C1 timeframe, the aggregation would be readily achieved simply by convergence of the outputs of all relevant V1 neurons for a given stimulus array on the same decision process. In this sense, the design of our paradigm was such that the globally-measured C1 component on the scalp reflected the same aggregated evidence input as the summed V1 readout that we suppose would be entering the decision process.  

      We have also added further rationale in the Methods section on the practical benefit of the stimulus design in yielding robust C1 signals, as the reviewer anticipates in their subsequent point. This concern was paramount in the design of this study because we expected the C1 difference metric that was of interest to be very small. We also needed a robust C1 to be measured in both the upper and lower visual field in as many individuals as possible and, in our experience, this is achieved less often when using smaller stimuli, even with a pre-mapping procedure.

      The large stimulus arrays also helped to homogenize C1 topography across individuals and ensure that topographies from the upper and lower visual field had sufficient overlap that there were electrodes with strong loading from both topographies where the C1 difference as a function of which array was brighter would be maximal.

      We have updated the methods section to provide these rationales while we describe the stimulus design.

      (2) In a similar vein, while C1 has canonical topographical properties at the grand-average level, these may differ substantially depending on individual anatomy (which the authors did not assess). This means that task-relevant information will be represented to different degrees in individuals' single-trial data. My guess is that this confound was mitigated precisely by choosing relatively extended stimulus arrays. But given the authors' impressive track record on C1 mapping and modeling, I was surprised that the underlying rationale is only roughly outlined. For example, given the topographies shown and the electrode selection procedure employed, I assume that the differences between upper and lower targets are mainly driven by stimulus arms on the main diagonal. Did the authors run pilot experiments with more restricted stimulus arrays? I do not mean to imply that such additional information needs to be detailed in the main article, but it would be worth mentioning.

      We thank the reviewer for their thoughtful consideration of this issue about individual variability in C1 retinotopy. Indeed, as the reviewer anticipated we expected the large stimulus coverage to mitigate this issue and we think that our response to the point above and the changes we made to the manuscript in response address this point also. Although we did not show this in the manuscript, we did in fact find that C1 topography was much more similar across individuals than it has been in previous C1 experiments we have carried out with smaller stimuli.

      However, we acknowledge the reviewer’s point that the signal measured at a specific electrode likely has a variable loading strength from the various gratings in the stimulus array and that the gratings of maximal loading may indeed vary from subject to subject. Such inter-subject variability cannot confound the choice probability effects because the latter are measured within-subject. Nevertheless, it could be a source of noise. We believe the impact of this is unlikely to be substantial for the following reasons:

      i) We designed the spatial spread of contrasts in such a way as to encourage participants to aggregate across the full array. In essence, to match the property of the C1 as an aggregate measure of V1 activity, we designed a task that involved aggregating across stimulus elements. Therefore, the decision weighting applied to any particular grating should be representative of the weighting applied to all gratings and, as such, the specific gratings that contribute most to the C1 signal for a particular participant should be relatively inconsequential.

      ii) By avoiding the horizontal and vertical meridians we avoided the regions of space where the shifts in C1 topography are largest.

      (3) Also, the stimulus arrangement disregards known differences in conduction velocity between the upper and lower visual fields. While no such differences are evident from the maximal-electrode averages shown in Figure 1B, it is difficult to assess this issue without single-stimulus VEPs and/or a dedicated latency analysis. The authors touch upon this issue when discussing potential pre-C1 signals emanating from the magnocellular pathway.

      Indeed, there are important differences in V1 properties between the upper and lower visual fields, visual acuity being another example in addition to conduction velocity as the reviewer points out. However, these differences appeared to be quite minimal in this case (Figure 1B does in fact include a single-stimulus VEP – the “1-stim” entry in the legend). Perhaps this is also due to the large stimulus array which may include a range of conduction velocities within it and thereby blur overall differences between the upper and lower visual field. The variability of contrast within each array was also quite high (+/-20% from the midpoint), which would have further increased within-array conduction velocity variability and blurred differences between arrays.

      Our staircasing procedure may have also helped in this regard to some extent as it included a bias parameter between the arrays to account for any behavioural response biases. Although the small contrast changes it usually incurred are likely much too small to change conduction velocities, it corrected for any effect on behaviour they may have.

      (4) I suspect that most of these issues are at least partly related to a lack of clarity regarding levels of description: the authors often refer to 'information' contained in C1 or, apparently interchangeably, to 'visual representations' before, during, or following C1. However, if I understand correctly, the signal predicting (or predicted by) behavioral choice is much cruder than what an RSA-primed readership may expect, and also cruder than the other choice-predictive signals entered as control variables: namely, a univariate difference score on single-trial data integrated over a 10 ms window determined on the basis of grand-averaged data. I think it is worth clarifying and emphasizing the nature of this signal as the difference of aggregate contrast responses that *can* only be read out at higher levels of the visual system due to the limited extent of horizontal connectivity in V1. I do not think that this diminishes the importance of the findings - if anything, it makes them more remarkable.

      It is true that a univariate measure may stick out in a field increasingly favouring multivariate analyses with the spread of machine learning, and so we have added a short qualifier in the methods section where we describe the C1 measurement to explicitly state that it is a scalar variable. What we have done in using this univariate measure is leverage the rich prior knowledge about V1 anatomy and neurophysiology, rather than trust in data-driven classifiers; interestingly, we found that such a classifier trained on all electrodes discriminates choices less well than our informed univariate measure during the C1 time-frame.

      We also thank the reviewer for raising an interesting point about the nature of aggregation and readout in the context of our stimulus. We agree that it is not feasible that V1 activity would be aggregated locally in V1 across such large regions of space prior to being read out within the C1 time period. As we say above, the aggregation may instead be carried out through convergent transmission of the parallel, spatially-local V1 information to the decision process.

      (5) Arguably even more remarkable is the finding that C1 amplitudes themselves appear to be influenced by choice history. The authors address this issue in the Discussion; however, I'm afraid I could not follow their argument regarding preparatory (and differential?) weighting of read-outs across the visual hierarchy. I believe this point is worth developing further, as it bears on the issue of whether C1 modulations are present and ecologically relevant when looking (before and) beyond stimulus-locked averages.

      We thank the reviewer for their positive appraisal of this additional finding, which we also found remarkable. We agree that our description of our interpretation was too brief and lacked clarity. We have reworded it and expressed it in terms of the speed accuracy trade-off, with the new explanation given below. However, it is important to remember that this account is speculative and serves only to explain the response-time contingency of the bias. That the bias was present and constitutes a modulation of the C1 does not rest on this argument:

      […] “to explain the RT contingency for the C1 bias, we speculate that the speed-accuracy trade-off could fluctuate from trial to trial and that the corresponding decision bound fluctuations (Heitz and Schall 2012) could be implemented by pre-determining decision weights across visual areas. For example, to achieve faster decisions, the sensory evidence requirement could be reduced by placing greater emphasis on initial afferent V1 evidence. In such a case, the RT contingency of the above choice history bias could be explained if the C1 bias is exerted in proportion with the planned emphasis of C1 evidence for the upcoming decision.”

      Recommendations to the Authors:

      Reviewer #2 (Recommendations for the authors):

      (1) As someone whose first language is not English, I am somewhat hesitant to bring this up, but I found the use of 'readout' as both noun and verb somewhat confusing. I thought read-out was defined as 'that which is read out'.

      We agree that this dual use of the word readout may cause confusion. To avoid this, we have edited the manuscript to replace verbal forms of the word “readout” with “read out”.

      (2) I found it difficult to follow the reasoning for why intermediate RTs should be the ones most affected by C1-related information. Perhaps this could be described in more detail for the uninitiated reader.

      We appreciate that our reasoning for why intermediate RTs should be the ones most affected by C1-related information was difficult to follow. We have now added a simulation to showcase this rationale more clearly - see response to reviewer 1, and new figure supplement to figure 1. 

      (3) It would be interesting to compare the effect sizes observed here to those seen in single-cell studies and to discuss this comparison with regard to differences in the nature of EEG signals and single-cell firing rates.

      While we agree that such a comparison would be interesting if feasible, it would have to be for the same task settings, which have not been used in a single-cell study, and the very different nature and extent of noise between the two recording modalities (e.g. background noise in EEG from ongoing processes unrelated to the task) would make such a comparison difficult to interpret.

      (4) Figure 1: It may be worth mentioning in the legend that only parts of the peripheral stimulus grid are shown for better visibility, as the Methods speak of 9 x 9 grids. Also, in panel B, it should be mentioned that waveshapes are calculated using individually selected maximal-difference electrodes.

      We thank the reviewer for spotting these. We have updated the caption for this figure to reflect these two observations.

      (5) Figure 4: The different shades of green may be difficult to distinguish when printed.

      Although this may be true, we chose shades of green that differ in luminance so they should still be distinguishable. Different colours may in fact be less distinguishable if they had the same luminance and the print was black-and-white. We chose different shades of the same colour to reflect the fact that we were plotting the same signals at different difficulty levels. In our opinion, this takes precedence since eLife is an online journal so the majority of readers will likely read it digitally.

      (6) Methods/Task: While the ITI of 780 ms is substantial, I was wondering why the authors decided against jittering this interval? It would be helpful to briefly discuss whether contrast adaptation for slow periodic stimulation may have affected the findings.

      We opted against jittering the ITI to avoid an additional source of inter-trial variability. While a fixed ITI may allow for contrast adaptation effects, these would be approximately constant across trials and are therefore less of a concern for our design. We have added text to the Methods section to state this rationale.

      (7) Methods/Stimuli: The authors convincingly argue that focusing on single arms of the stimuli is an unlikely strategy, but did they ask for participants' strategies during debriefing?

      We are glad that the reviewer found our argument about whether or not participants may have focused on a single arm of the stimuli convincing. We did not ask participants about their strategies, but even with such a debriefing there would remain a possibility that a participant used that strategy without being aware of doing so. In any case, if participants were doing this, it would have dampened the strength of our choice probability result.

      (8) Methods/Procedure, Difficulty Titration: Why did the authors opt for manually adapting the difficulty level in a separate session rather than constantly and automatically titrating difficulty?

      We did this because calculating choice probability requires a comparison of trials with different choice outcomes but the same stimulus, so continuously staircasing the difficulty level during the experiment would have created a confound. Although this could have been corrected for in our regression, it would have entailed greater noise, which we could avoid by staircasing in advance.
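
      The same-stimulus comparison described above can be illustrated with a minimal sketch. It assumes the common ROC-based definition of choice probability (the probability that a signal value from trials ending in one choice exceeds a value from trials ending in the other choice, with ties counted as one half). The function name and the example amplitudes are hypothetical and are not taken from the manuscript.

```python
def choice_probability(signal_choice_a, signal_choice_b):
    """Area under the ROC curve separating two groups of trials.

    Both groups are assumed to come from trials with the SAME stimulus,
    differing only in the participant's choice.
    Returns P(a > b) + 0.5 * P(a == b) over all trial pairs.
    """
    if not signal_choice_a or not signal_choice_b:
        raise ValueError("both choice groups need at least one trial")
    wins = 0.0
    for a in signal_choice_a:
        for b in signal_choice_b:
            if a > b:
                wins += 1.0   # signal is larger on a choice-A trial
            elif a == b:
                wins += 0.5   # ties count as half
    return wins / (len(signal_choice_a) * len(signal_choice_b))


if __name__ == "__main__":
    # Hypothetical single-trial amplitudes for one fixed stimulus.
    chose_a = [3.0, 4.0, 5.0]
    chose_b = [1.0, 2.0, 3.0]
    print(choice_probability(chose_a, chose_b))
```

      A value of 0.5 means the signal carries no information about the choice. Because both groups must share one stimulus, a staircase that changes difficulty from trial to trial would mix stimulus and choice effects in this comparison.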

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The manuscript by Ma et al. provides robust and novel evidence that the noctuid moth Spodoptera frugiperda (Fall Armyworm) possesses a complex compass mechanism for seasonal migration that integrates visual horizon cues with Earth's magnetic field (likely its horizontal component). This is an important and timely study: apart from the Bogong moth, no other nocturnal Lepidoptera has yet been shown to rely on such a dual-compass system. The research therefore expands our understanding of magnetic orientation in insects with both theoretical (evolution and sensory biology) and applied (agricultural pest management, a new model of magnetoreception) significance.

      The study uses state-of-the-art methods and presents convincing behavioural evidence for a multimodal compass. It also establishes the Fall Armyworm as a tractable new insect model for exploring the sensory mechanisms of magnetoreception, given the experimental challenges of working with migratory birds. Overall, the experiments are well-designed, the analyses are appropriate, and the conclusions are generally well supported by the data.

      Strengths

      (1) Novelty and significance: First strong demonstration of a magnetic-visual compass in a globally relevant migratory moth species, extending previous findings from the Bogong moth and opening new research avenues in comparative magnetoreception.

      (2) Methodological robustness: Use of validated and sophisticated behavioural paradigms and magnetic manipulations consistent with best practices in the field. The use of 5-minute bins to study the dynamic nature of the magnetic compass which is anchored to a visual cue but updated with a latency of several minutes, is an important finding and a new methodological aspect in insect orientation studies.

      (3) Clarity of experimental logic: The cue-conflict and visual cue manipulations are conceptually sound and capable of addressing clear mechanistic questions.

      (4) Ecological and applied relevance: Results have implications for understanding migration in an invasive agricultural pest with an expanding global range.

      (5) Potential model system: Provides a new, experimentally accessible species for dissecting the sensory and neural bases of magnetic orientation.

      Weaknesses

      While the study is strong overall, several recommendations should be addressed to improve clarity, contextualisation, and reproducibility:

      We thank Reviewer #1 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths and are grateful for the constructive feedback on the remaining weaknesses, which will guide and strengthen our revisions.

      Structure and presentation of results

      Requires reordering the visual-cue experiments to move from simpler (no cues) to more complex (cue-conflict) conditions, improving narrative logic and accessibility for non-specialists.

      Thank you for this thoughtful suggestion. While we appreciate the rationale for presenting results from simpler to more complex conditions, we kept the original sequence because it aligns with the logic of our study. Our initial aim was to determine whether fall armyworms use a magnetic compass integrated with visual cues, as shown in the Bogong moth. After establishing this phenotype, we then examined whether visual cues are required for maintaining magnetic orientation. We have also clarified in the Introduction that magnetic orientation in the Bogong moth relies on integration with visual cues, which provides readers with clearer context and improves the overall narrative flow.

      Ecological interpretation

      (a) The authors should discuss how their highly simplified, static cue setup translates to natural migratory conditions where landmarks are dynamic, transient or absent.

      Thank you for raising this important point. We agree that natural migratory environments provide visual information that is often dynamic, transient, or intermittently absent, in contrast to the simplified and static cue used in our indoor experiments. Our intention in using a minimal, static cue was to isolate and test the fundamental presence of magnetic–visual integration in fall armyworms under fully controlled conditions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although these natural cues differ from our simplified laboratory stimulus, they may similarly provide asymmetric visual structure that can be integrated with magnetic information. We also note that determining which natural visual cues support the magnetic–visual compass will be an important direction for future work.

      (b) Further consideration is required regarding how the compass might function when landmarks shift position, are obscured, or are replaced by celestial cues. Also, more consolidated (one section) and concrete suggestions for future experiments are needed, with transient, multiple, or more naturalistic visual cues to address this.

      Thank you for this constructive suggestion. We appreciate the reviewer’s point that additional consideration of how the compass might function under shifting, obscured, or celestial visual cues would strengthen the manuscript. Given the limited evidence currently available for this species, we have incorporated a concise and appropriately cautious discussion addressing these possibilities.

      Methodological details and reproducibility

      (a) It would be better to move critical information (e.g., electromagnetic noise measurements) from the supplementary material into the main Methods.

      Thank you for this helpful suggestion. In the revised manuscript, we have added the key electromagnetic noise measurements information to the main Methods section.

      (b) Specifying luminance levels and spectral composition at the moth's eye is required for all visual treatments.

      Thank you for this helpful comment. We have clarified in the Methods as well as the legend of Fig. S3 that both luminance levels and spectral composition were measured at the position corresponding to the moth’s head.

      (c) Details are needed on the sex ratio/reproductive status of tested moths, and a map of the experimental site and migratory routes (spring vs. fall) should be included.

      Thanks. We have added the reproductive status of the tested moths in the Methods, specifying that all individuals used were unmated 2-day-old adults.

      (d) Expanding on activity-level analyses is required, replacing "fatigue" with "reduced flight activity," and clarifying if such analyses were performed.

      Thank you for this comment. In this context, the term “fatigue” referred to the possibility that moths might gradually lose motivation or attention to orient when flying for an extended period in a simplified, artificial environment with limited sensory cues. Such a decrease in orientation motivation over time could, in theory, lead to a loss of individual orientation and consequently to the observed loss of group orientation. To test this possibility, we analyzed the orientation performance of each individual moth across different phases using the Rayleigh test. The r-value was used as a measure of individual directedness (higher r-values indicate stronger orientation). Our results showed that mean r-values did not differ significantly among the experimental phases (multiple comparisons, Table S2). This indicates that the 25-min measurement itself was not responsible for the loss of orientation. We did not perform a quantitative activity-level analysis in this study. However, as mentioned in the Methods, flight activity was continuously monitored during the experiments by observing fluctuations in the pointer values on the experimental software, which corresponded to the moth’s rotational movements. If the pointer values remained unchanged for more than 10 seconds, the experimenter checked for wing vibrations by sound; if the moth had stopped flying, gentle tapping on the arena wall was used to stimulate renewed flight. Only individuals that maintained active flight throughout the experiment, with fewer than four instances of wingbeat cessation, were included in the analysis. In the revised manuscript, we also state that activity-level analysis was not performed due to technical difficulties.
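
      The r-value used above as a measure of directedness is the mean resultant vector length of a set of headings. A minimal sketch follows, assuming headings recorded in degrees for one moth within one phase. The function names and example headings are hypothetical, and the p-value uses a widely used closed-form approximation for the Rayleigh test rather than the software the authors used.

```python
import math


def mean_vector(headings_deg):
    """Return (r, mean_direction_deg) for a list of headings in degrees.

    r is the mean resultant length: 0 for uniformly spread headings,
    1 when every heading is identical.
    """
    n = len(headings_deg)
    if n == 0:
        raise ValueError("need at least one heading")
    c = sum(math.cos(math.radians(h)) for h in headings_deg) / n
    s = sum(math.sin(math.radians(h)) for h in headings_deg) / n
    r = math.hypot(c, s)
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    return r, mean_dir


def rayleigh_p(r, n):
    """Approximate Rayleigh-test p-value for mean resultant length r, n samples.

    Assumption: closed-form approximation with R = n * r; exact tables
    or another package may give slightly different values for small n.
    """
    big_r = n * r
    return math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))


if __name__ == "__main__":
    headings = [170.0, 185.0, 200.0, 175.0, 190.0]  # hypothetical headings
    r, direction = mean_vector(headings)
    print(r, direction, rayleigh_p(r, len(headings)))
```

      Comparing such r-values across phases, as the response describes, asks whether individual directedness declines over time independently of the group-level result.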

      Figures and data presentation

      (a) The font sizes on circular plots should be increased; compass labels (magnetic North), sample sizes, and p-values should be included.

      Thank you for this helpful suggestion. Regarding the compass labels and statistical reporting, our analysis provides significance levels as ranges rather than exact p-values; therefore, we clarified in the figure legends that the two dashed circles correspond to thresholds for statistical significance p = 0.05 and p = 0.01, respectively. Sample sizes are already indicated within each panel. To avoid visual clutter caused by displaying both magnetic North and South, we show only the magnetic South direction (mS) consistently across panels, which can improve readability.

      (b) More clarity is required on what "no visual cue" conditions entail, and schematics or photos should be provided.

      Thank you for this comment. In our study, the “no visual cue” condition refers to the absence of the black triangular landmark inside the flight simulator. To improve clarity, we have updated the legend of Fig. 4 to explicitly state this and have referred readers to the schematic in Fig. 1, which illustrates the structure of the flight simulator. These additions clarify what the “no visual cue” condition entails without requiring additional schematics.

      (c) The figure legends should be adjusted for readability and consistency (e.g., replace "magnetic South" with magnetic North, and for box plots better to use asterisks for significance, report confidence intervals).

      Thank you. Regarding the choice of compass labeling, we intentionally used magnetic South (mS) rather than magnetic North (mN) because the main population tested in our experiments represents the autumn migratory generation. During autumn, fall armyworms orient southward when visual and magnetic cues are aligned. Using magnetic South in the plots therefore provides a clearer representation of cue alignment in this season and avoids potential confusion when interpreting the combined visual–magnetic information.

      Conceptual framing and discussion

      (a) Generalisations across species should be toned down, given the small number of systems tested by overlapping author groups.

      Thank you for this valuable comment. In the revised manuscript, we have softened such statements in both the Abstract and the main text.

      (b) It requires highlighting that, unlike some vertebrates, moths require both magnetic and visual cues for orientation.

      Thank you for this helpful suggestion. We have added a sentence to the Discussion explicitly highlighting that, unlike some vertebrates capable of using magnetic information in the absence of visual cues, moths require the integration of both magnetic and visual cues for accurate orientation. This clarification emphasizes the distinct multimodal nature of compass use in migratory moths.

      (c) It should be emphasised that this study addresses direction finding rather than full navigation.

      Thank you for this important clarification. We have now made it explicit in the manuscript that our experiments address direction finding (i.e., orientation) rather than full navigation. This distinction is stated in both the Introduction and Discussion to clearly define the scope of the study.

      (d) Future Directions should be integrated and consolidated into one coherent subsection proposing realistic next steps (e.g., more complex visual environments, temporal adaptation to cue-field relationships).

      Thank you for this constructive suggestion. We agree that outlining realistic next steps is valuable. However, given the limited scope of the current data, we have only slightly expanded the existing forward-looking statements in the Discussion.

      (e) The limitations should be better discussed, due to the artificiality of the visual cue earlier in the Discussion.

      Thank you for this comment. We agree that the artificiality of the visual cue is an important limitation of the present study. Rather than extending speculative discussion, we have clarified this limitation in the revised Discussion and highlighted the key questions that future work must address.

      Technical and open-science points

      Appropriate circular statistics should be used instead of t-tests for angular data shown in the supplementary material.

      Thank you for this comment. We have addressed this point (Fig. S1) in the revised supplementary material.

      Details should be provided on light intensities, power supplies, and improvements to the apparatus.

      Thank you. Light intensities are reported as spectral irradiance measurements in Supplementary Materials, which provide full wavelength-resolved information for the illumination used, although a separate measurement of total illuminance (lux) was not performed. We have also added the requested information on the power supplies.

      The derivation of individual r-values should be clarified.

      Thanks. We have clarified this in the revised manuscript.

      Share R code openly (e.g., GitHub).

      Thanks. We are in the process of organizing the relevant R code, but have not been able to upload it to GitHub before the current revision deadline. The code is available from the corresponding author upon request.

      Some highly relevant, yet missing, recent citations should be added, and some less relevant ones removed.

      Thanks. We added one recent relevant reference to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work provided experimental evidence on how geomagnetic and visual cues are integrated, and showed that visual cues are indispensable for magnetic orientation in the nocturnal fall armyworm.

      Strengths:

      Although it has been demonstrated previously that the Australian Bogong moth could integrate global stellar cues with the geomagnetic field for long-distance navigation, the study presented in this manuscript is still fundamentally important to the field of magnetoreception and sensory biology. It clearly shows that the integration of geomagnetic and visual cues may represent a conserved navigational mechanism broadly employed across migratory insects. I find the research very important, and the results are presented very well.

      We thank Reviewer #2 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths.

      Weaknesses:

      The authors developed an indoor experimental system to study the influence of magnetic fields and visual cues on insect orientation, which is certainly a valuable approach for this field. However, the ecological relevance of the visual cue may be limited or unclear based on the current version. The visual cues were provided "by a black isosceles triangle (10 cm high, 10 cm base) made from black wallpaper and fixed to the horizon at the bottom of the arena". It is difficult to conceive how such a stimulus (intended to represent a landmark like a mountain) could provide directional information for LONG-DISTANCE navigation in nocturnal fall armyworms, particularly given that these insects would have no prior memory of this specific landmark. It might be a good idea to make a more detailed explanation of this question.

      We appreciate the constructive feedback on the weaknesses, which will guide and strengthen our revisions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although such natural cues differ from our simplified laboratory stimulus, they may represent intermittently sampled visual inputs that can be optimally integrated with magnetic information, whether the cues are static or changing, and brief periods without them may still allow the subsequent recovery of a stable long-distance orientation strategy.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major to Medium Suggestions

      (a) Reordering of Visual Cue Tests

      The manuscript currently presents cue-conflict experiments before the simpler "no visual cue" tests. For non-specialist readers, it would be more logical to start with the basic condition (no visual cues) and then move to progressively more complex ones. This provides a clearer and more logically sound narrative.

      For example, the results could first demonstrate that without visual cues, the moths fail to orient (both in darkness and uniform light), and then show that introducing a single salient cue (a triangle on the horizon) restores directed behaviour. This would help readers understand the logic of the progression and should be better integrated throughout the Results and Discussion.

      Thanks. We have responded to this comment in the Public Reviews.

      (b) Translating Key Findings to Realistic Scenarios (LL 333-344 or where suitable in Discussion, and mentioning that we utilised a reductionist principle first in Intro, but clearly articulated that it is very simplified)

      The main text (e.g. Discussion) should address how these findings translate to real-world conditions. The experimental design used a single, highly salient, and static cue, always aligned with the migratory direction. In nature, such a consistent landmark is unlikely: mountains or other features would shift position relative to the moth's trajectory as it flies.

      Key questions arise which need to be addressed:

      - How would the compass system adapt to changing landmark positions as the moth moves?

      - What happens when no landmarks are visible (e.g. over flat plains or cloudy nights)?

      - Would stellar or other cues take over in such cases? Your hypotheses, please.

      Addressing these points - and proposing specific future experiments (e.g. with transient or multiple visual cues) - would strengthen the ecological relevance of the findings and show a clear way forward.

      Thanks for your kind comments. We now explicitly state in the Introduction that our study employs a reductionist approach using a simplified visual environment to isolate magnetic-visual interactions. As the ecological questions raised by the reviewer cannot be addressed with the current dataset, we avoid extended speculation but have added brief clarification in the Discussion and addressed these points in the Public Reviews response. We also indicate that future work will need to examine the types of visual cues that can support magnetic orientation and how such cues couple with geomagnetic information.

      Technical and Methodological Points

      (a) Incomplete Methods Section

      Critical technical information (e.g. electromagnetic noise measurements) currently appears only in supplementary figure legends. All such details should be included in the main Methods section if the word count allows (or include a short section in the main text with reference to more details in the supplementary material).

      Thanks for your kind comments. We have addressed this as suggested in the Public Reviews.

      (b) Lighting Conditions

      Specify luminance levels (the amount of light emitted and passing through in quanta per unit of surface, eg m2) at the moth's eye and indicate whether spectral composition was consistent between treatments (with and without the visual cue).

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (c) Figures

      - Increase font sizes on circular histograms.

      - Add compass labels (ideally magnetic North, mN, not south, etc, as it is usual in pertinent literature), sample sizes, and p-values on each panel.

      - Replace "magnetic South" (mS) indicators with magnetic North (mN) to align with convention.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (d) Migratory Expectations

      Include expected compass bearings for spring and autumn migrations (with citations) to relevant figures (Figure 2, 4, S2).

      Thanks for your comments. We have added the information that “We recently found that fall armyworms from the year-round range in Southwest China (Yunnan) exhibit seasonally appropriate migratory headings when flown outdoors in virtual flight simulators, heading northward in the spring and southward in the fall, and this seasonal reversal is controlled by photoperiod (Chen et al., 2023).” in Introduction. Thus, we didn’t offer expected seasonal compass bearings in Results section.

      (e) Add a map showing the experimental site and known migratory routes, clearly labelling spring vs fall routes. It would help justify expected headings.

      Thank you for this suggestion. At present, there are no experimentally validated migratory routes (e.g., through mark-release-recapture or tracking approaches) for the specific fall armyworm population used in our study. Because these routes have not been biologically confirmed, we didn’t offer a presumed migratory map that may imply unwarranted certainty.

      (f) Composition of Test Groups

      Indicate sex ratios and reproductive status (mated/unmated) of tested moths, if known or comment if unknown, as both can affect migratory motivation and behaviour.

      Thank you for this suggestion. We have responded to this point in the Public Reviews.

      (g) Role and Nature of Visual Cues

      While the results clearly show that orientation disappears without visual cues, the triangle cue is highly artificial. Well-studied Bogong moths are known to rely on views of Australian mountain ranges during their nocturnal migrations, but there is no evidence that armyworms use a similar strategy. Even for bogongs, it is not just one salient mountain always in front of them on migration. Discuss whether Fall Armyworm would encounter comparable natural cues in the field along their migratory route, or whether the triangle might simply provide a frame of reference rather than a true landmark.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      (h) Future work could test:

      - More naturalistic sky cues (moonlight, star fields).

      - Varying the landmark's position relative to the magnetic field - slowly moving along - transient landmarks. Also, less salient landmarks and a more complex skyline, as it is usually more complex than just a single salient peak.

      Thank you for these comments. We have responded to this point in the Public Reviews. A brief discussion, as suggested, has been added to the revised manuscript.

      Minor Comments and Line-by-Line Suggestions

      L70 - Check citation (possibly Mouritsen 2018). Missing in the list of references.

      Thanks. This point has been addressed.

      L75 - Consider citing the new and highly relevant preprint:

      Pakhomov, A., Shapoval, A., Shapoval, N., & Kishkinev, D. (2025). Not All Butterflies Are Monarchs: Compass Systems in the Red Admiral (Vanessa atalanta). bioRxiv.

      Thanks. We have cited this reference.

      LL81-82 - Clarify vague phrasing; specify criteria for "good" vs "poor" orientation ability. Or reword/leave out.

      Thanks for your comments.

      L85 - "but one," not "bar one." 

      Thanks. Corrected.

      L124 - The 2 genetic citations are weakly linked to magnetoreception. We do not have a clear understanding of the insect magnetoreceptor and its underlying mechanism, so we simply cannot interpret genetic associations well enough to tie them to magnetoreception. For example, does the noctuid's magnetic sense require a magnetite-based receptor and genes involved in biomineralization? Consider removing or softening claims.

      Thanks. Addressed.

      LL123-126 - Define what for YOU constitutes "strong evidence" for magnetoreception (e.g. adaptive directional behaviour consistent with migratory orientation?). Is there such a thing as strong evidence at all?

      Thanks for your comments. We agree that terms such as “confirmed” or “strong evidence” can overstate the certainty of magnetoreception findings, given the ongoing debates in the field. In the revised manuscript, we have toned down these claims.

      L153 - Indicate whether coils in NMF condition were powered or inactive.

      Thanks for your comments. Addressed.

      L163 - Justify use of multiple 5-min phases (e.g. temporal resolution of behaviour). It is confusing at the start, where first mentioned, and becomes clearer only towards the end, but it should be clearer at the start.

      Thanks for your comments. The assay was divided into these 5-min segments to provide the temporal resolution needed to detect changes in flight orientation as the relative alignment of magnetic and visual cues was systematically altered. We now clarify this earlier in the Results.

      LL167-171 - This is a good place where you can provide a map (main or supplementary with referencing) showing the study site and migration routes.

      Thanks for your suggestion. We have responded to this point in the Public Reviews.

      L174 - Avoid repetition of "expected."

      Thanks. Addressed.

      LL176-177 - Report 95% confidence intervals or equivalent and clarify which test (e.g. Moore's paired test) each p-value refers to.

      Thanks for your suggestion.

      LL189-191 - explain what fatigue means. I would remove fatigue and substitute it with "lowered flight activity". Also, the same statement comes later, so avoid repetitiveness and remove it in one place. The analysis of directedness is good throughout, but what about the analysis of activity level? Could you explain whether you did it or not, and if not, why, or if angular changes can serve as an activity proxy? Replace "fatigue" with "reduced flight activity." Avoid repetition. Clarify if activity level analysis was performed or if it was not, e.g. due to technical difficulties.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      L196 - Note whether 95% CI overlaps with the expected direction. This is a crucial outcome.

      Thanks for your comments.

      LL203-205 - unclear, better to stick to "congruency", especially "initial congruency for the relationship between mN and visual cue" throughout.

      Thanks for your suggestions.

      L206 - Better to introduce a new subheading: "Laboratory-Reared Animals.".

      Thanks for your suggestion. A new subheading has been added in the revised manuscript.

      LL207-208 - Clarify which cues were available in Chen et al. (2023) and how they differ here.

      Thanks for your comments. In Chen et al. (2023), the moths oriented under an artificial starry sky together with optic flow cues. In contrast, our experiments intentionally removed both the starry-sky pattern and optic flow to avoid introducing additional visual information when testing magnetic-visual integration for orientation. We have added further clarification regarding the conditions used in Chen et al. (2023) in the revised manuscript.

      L228 - Use "lab-reared" consistently throughout the entire MS. Do not mix with lab-raised.

      Thanks. Addressed by consistently using “lab-raised”.

      Figure 2 - Confusing in parts, especially for people coming from birds and other vertebrates orientation background. At 12 o'clock, you usually expect either mN / gN (magnetic or geographic North) or the animal's own initial directional response used as control to compare the same animal's direction post-treatment. Here, your 6 o'clock is magnetic South in the first place - non-conventional. At 12 o'clock, better use mN or gN. Avoid using non-conventional references such as magnetic south. Remind readers of seasonally appropriate headings and refer to the map.

      Thanks. We have responded to this point in the Public Reviews.

      LL232-234 - Emphasize that cue-magnetic congruency is key. Highlight the most important point that the congruency between the seasonal migratory direction and visual cues is key, not that in spring/fall, visual cues must be towards or opposite to the migratory goal. But the visual cue could be in the migratory direction or opposite, or at an angle - this is for future direction.

      Thanks. We have responded to this point in the Public Reviews.

      Figure 2 and associated main text - highlight that you only tested the designs when in all seasons the salient and single visual cue was in the migratory direction (in spring it coincided with mN but in fall it was towards the magnetic south). Other directions of visual cues have not been tested, but for simplicity and consistency, you chose to do these ones as the first step, perhaps.

      Thank you for this insightful comment. Yes, our experiments tested only the conditions in which the salient and single visual cue was aligned with the migratory direction. Other angular relationships between visual cues and the magnetic field were not examined in this study. For simplicity and consistency, we focused on this alignment as a first step toward understanding magnetic-visual cue integration in migratory orientation. We now highlight this in the Fig. 2 legend.

      Figure captions/legends - hard to tell from the main text now, better to italicize figure caption text and visually space them from the main text.

      Thanks for your suggestions.

      LL 250-251 - mention to people more familiar with r - lowercase - what is the expected range for R uppercase. It is not bound 0-1 as r. Could it be negative? How large can it be?

      Thanks for the comment. After revisiting Moore (1980), we think that R* cannot take negative values. However, since R* = R/N^(3/2), it is not bounded between 0 and 1. We did not find any mention of an upper bound in the paper (https://doi.org/10.2307/2335330).
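
      For readers more used to r, the scale of R* can be illustrated with a minimal sketch. It assumes the usual form of Moore's modified Rayleigh test, in which each individual's mean vector is weighted by the rank of its length (1 = shortest) and R* = R/N^(3/2). The function names are hypothetical and ties in vector length are not handled. Under that assumption R* is a vector length, so it cannot be negative, and the triangle inequality caps it at (N + 1)/(2*sqrt(N)), which exceeds 1 for any N > 1 and grows with N.

```python
import math


def moore_r_star(angles_deg, lengths):
    """Rank-weighted resultant length scaled by N^(3/2) (assumed form of R*)."""
    n = len(angles_deg)
    # Rank the vectors by length: rank 1 for the shortest, N for the longest.
    order = sorted(range(n), key=lambda i: lengths[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Sum the rank-weighted unit vectors.
    x = sum(r * math.cos(math.radians(a)) for r, a in zip(ranks, angles_deg))
    y = sum(r * math.sin(math.radians(a)) for r, a in zip(ranks, angles_deg))
    return math.hypot(x, y) / n ** 1.5


def moore_r_star_max(n):
    """Upper limit reached when all N vectors point the same way."""
    return (n + 1) / (2 * math.sqrt(n))


# Four individuals with identical headings reach the upper limit for N = 4.
print(moore_r_star([90, 90, 90, 90], [0.1, 0.2, 0.3, 0.4]))
print(moore_r_star_max(4))
```

      The upper limit depends on sample size, so R* values are only comparable between groups of the same N.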

      Figure 3 - Consider adding a horizontal line indicating the 5% significance threshold.

      Thanks for your suggestions.

      L 261 - need to have some narrative after the subheading before you insert Figure 3.

      Thanks. Addressed.

      LL274-275 - highlight that the timeline of this congruency between mN and a landmark and the effect of this on directedness is not explored here, but worth doing in future. How long does a new congruency or a relationship between mN and a visual cue need to be exposed to the animal to regain its directional response? Clearly, it is just a question of time of exposure so that a new association is established. Suggest future work on time-dependent adaptation to new cue-field relationships.

      Thanks for your suggestion. We have now included this point as a future direction in the revised Discussion.

      Figure 4 & S4 - Replace letters with asterisks/brackets for significance. The use of the letter is confusing and unconventional.

      Thanks for your suggestion.

      Figure 4 caption - Clarify the main takeaway.

      Thanks for your suggestion.

      Figure 4 - bare minimum is confusing. I understand that you wanted to avoid "no visual cues" because, as long as the animal sees things, there are things to be used as visual cues, even if this is not the intention of the experimenter. However, it needs clarification and rewording. Better to be more specific, like "no black triangle and horizon were used, just the uniformly white cylinder", or something like that.

      Thanks for your comments. In our setup it accurately describes the intentional removal of both the black triangle and the horizon, leaving only the uniformly white cylinder as the visual environment. This wording was chosen to reflect the practical limitations of producing a perfectly symmetrical flight simulator under laboratory conditions, and we therefore prefer to retain the original phrasing.

      L328 - Remove Xu et al. (2021) citation (not relevant). This is an in vitro study with a protein which may not work exactly as it is claimed in the paper in vivo.

      Thanks. Citation removed.

      L349-350 - Clarify what "no visual cue" means (e.g., uniformly white cylinder, no horizon line). Include a photo or a schematic of the inner surface of the cylinder for this condition in the Supplementary Materials.

      Thanks. We have responded to this point in the Public Reviews.

      L380 & throughout - Replace "barely minimum visual cues" (BMVC) with "no visual cues", clarifying limitations in Methods, meaning that you can explain that absolutely no visual cues is practically impossible because, as long as there is light, animals can use some asymmetries as cues even if this is not the intention of the experimenter.

      Thank you for this comment. We have decided to retain the term “barely minimum visual cues (BMVC)” because it accurately describes our experimental condition, which is distinct from a true “no visual cues” environment. In the revised Figure legend, we now clarify that BMVC refers to conditions in which obvious visual cues (i.e., features such as the black triangle in Fig. 1) were removed, while acknowledging that complete elimination of all visual information is not possible under illuminated conditions.

      L396 - Be cautious when generalizing from two species tested by a research group that is not absolutely independent (some authors in bogong and armyworm works overlap). We saw examples in diurnal migratory butterflies (Monarchs), a more studied species than the armyworm, that the findings do not entirely translate to Red Admirals (Pakhomov et al. 2025 preprint mentioned). Suggestion to tone down any claims of broad generalisation throughout the manuscript.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL402-407 - Note that, unlike birds (e.g. European robins), moths appear to require both magnetic and visual cues for orientation, whereas birds, mole rats and some other animals can use magnetic cues alone.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L410 - Specify that this is correct only in the Northern Hemisphere.

      Thank you for this comment. Addressed.

      LL415-416 - Acknowledge artificiality of single-cue setup (see the major comments above); integrate earlier in the Discussion.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL420-425 - Consolidate Future Directions into a single subsection; include more concrete experimental ideas, for example, using more naturalistic, numerous transient landmarks (could be done in a virtual maze with LEDs on the wall of the cylinder with cues moving with time). Multiple visual cues. Manipulating with salience of cues - less simplistic, less salient.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L431 - Does this paper support this statement? I think it just tested the use of stellar cues in a zero magnetic field. It also dealt with direction finding, not navigation, which is a position-finding ability - a much more complex feat and might not be the ability of moths (requires further studies like with geographic and magnetic displacements, etc). Reword and check this. Show the distinction between direction finding and navigation.

      Thank you for this comment. We have reworded the relevant sentence to use “orientation” instead of “navigation”.

      L436-437 - Specify "global visual cues" (stellar, lunar, etc.) and merge all future directions into one coherent section.

      Thank you for this comment. Addressed.

      LL443-446 - A bit early to plan such studies because migratory direction could well be a complex multigenetic trait, so that you cannot approach it simply with the knock out of a single gene. The genetic basis of magnetic direction needs to be first demonstrated, which leads you to the Future Directions section.

      Thank you for this helpful comment. We fully agree that migratory direction is likely a complex multigenic trait, and our intention was not to imply that knocking out a single gene would be sufficient to explain magnetic or migratory orientation. Our statement aimed only to highlight that identifying candidate genes is an important first step toward understanding the genetic basis of magnetic orientation.

      Line 496 - Clarify whether optic flow was used (unlike previous studies).

      Thank you for pointing this out. Clarified.

      LL499-511 - Clarify the improvements done in Chen's system and their relevance.

      Thank you for pointing this out. We reworded this sentence to: “The Flash flight simulator system was developed based on the early design of the Mouritsen-Frost flight simulator and adapted for our experiments in Yuanjiang”.

      Line 531 - Report and compare light intensities between indoor and outdoor experiments.

      Thanks for this comment. Unfortunately, due to the sensitivity limits of our current equipment, we were unable to reliably measure outdoor light intensities at night. However, we did not perform any open-top outdoor flight-simulator experiments; instead, we used field-captured moths but conducted all behavioral tests indoors.

      L549 - Add make/model of power supplies.

      Thanks. Addressed.

      LL582-585 - Specify whether R code will be shared; recommend open access (e.g., GitHub, other open repositories). Reiterate the importance of open science and sharing all scripts. Also here, add citations to some studies where MMRT has been used recently.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      Line 592 - Explain how individual r-values were derived from optical encoder data.

      Thank you for this comment. Addressed.
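
      One common way to obtain an individual r-value from a sequence of heading samples is the mean resultant vector length. The sketch below assumes that the optical encoder yields headings in degrees at regular intervals. This is an illustrative assumption rather than the authors' exact pipeline, and `mean_vector` is a hypothetical name.

```python
import math


def mean_vector(headings_deg):
    """Return (mean direction in degrees, mean vector length r) for heading samples."""
    n = len(headings_deg)
    # Average the unit vectors that correspond to each heading sample.
    c = sum(math.cos(math.radians(h)) for h in headings_deg) / n
    s = sum(math.sin(math.radians(h)) for h in headings_deg) / n
    r = math.hypot(c, s)  # 0 = no directedness, 1 = perfectly directed
    mean_dir = math.degrees(math.atan2(s, c)) % 360
    return mean_dir, r


# A flight that stays close to 90 degrees gives r near 1.
print(mean_vector([80, 100]))
# Headings spread evenly around the circle give r near 0.
print(mean_vector([0, 90, 180, 270]))
```

      Because every sample contributes a unit vector, r always lies between 0 and 1.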

      L842-843 - t-tests are inappropriate for angular data; use circular tests (Watson-Williams, Mardia-Watson-Wheeler, etc.).

      Thank you for this comment. Addressed.

      L865 - Reword to avoid repetition of "fall." Example: "In field captured armyworms during fall migration".

      Thank you for this comment. Addressed.

      LL882-885 - Improve phrasing and language here. Confirming that - no colon after. "Both the acrylic plate and diffusion paper." Confirm relevance of spectra to moth visual sensitivity - add relevant citation to original studies showing that.

      Thank you for this comment. Addressed.

      L886 - Reword "uniform" - does not look uniform to me.

      Thank you for this comment. Addressed.

      Reviewer #2 (Recommendations for the authors):

      The first two sentences of the abstract ("The navigational mechanisms employed by nocturnal insect migrants remain to be elucidated in most species. Nocturnal insect migrants are often considered to use the Earth's geomagnetic field for navigation, yet the underlying mechanisms of magnetoreception in insects remain elusive") are somewhat redundant. The authors may consider rewriting them.

      Thank you for pointing this out. We have rewritten this opening to provide a more concise and non-repetitive introduction.

    1. Seals Allers—and her fifteen-year-old son, Michael—are working on their own data-driven contribution to the maternal and infant health conversation: a platform and app called Irth—from birth, but with the b for bias removed (figure 1.8). One of the major contributing factors to poor birth outcomes, as well as maternal and infant mortality, is biased care. Hospitals, clinics, and caregivers routinely disregard Black women’s expressions of pain and wishes for treatment.81 As we saw, Serena Williams’s own story almost ended in this way, despite the fact that she is an international tennis star. To combat this, Irth operates like an intersectional Yelp for birth experiences. Users post ratings and reviews of their prenatal, postpartum, and birth experiences at specific hospitals and in the hands of specific caregivers. Their reviews include important details like their race, religion, sexuality, and gender identity, as well as whether they felt that those identities were respected in the care that they received. The app also has a taxonomy of bias and asks users to tick boxes to indicate whether and how they may have experienced different types of bias. Irth allows parents who are seeking care to search for a review from someone like them—from a racial, ethnic, socioeconomic, and/or gender perspective—to see how they experienced a certain doctor or hospital.

      "taxonomy of bias" love this term and didn't think about it as biased originally.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. The model considers a population subdivided in groups, each group has a single asexually-reproducing breeder, other group members (subordinates) can perform two types of tasks called "work" or "defense", individuals have different ages, individuals can disperse between groups, each individual has a dominance rank that increases with age, and upon death of the breeder a new breeder is chosen among group members depending on their dominance. "Workers" pay a reproduction cost by having their dominance decreased, and "defenders" pay a survival cost. Every group member receives a survival benefit with increasing group size. There are 6 genetic traits, each controlled by a single locus, that control propensities to help and disperse, and how task choice and dispersal relate to dominance. To study the effect of group augmentation without kin selection, the authors cross-foster individuals to eliminate relatedness. The paper allows for the evolution of the 6 genetic traits under some different parameter values to study the conditions under which division of labor evolves, defined as the occurrence of different subordinates performing "work" and "defense" tasks. The authors envision the model as one of vertebrate division of labor.

      The main conclusion of the paper is that group augmentation is the primary factor causing the evolution of vertebrate division of labor, rather than kin selection. This conclusion is drawn because, for the parameter values considered, when the benefit of group augmentation is set to zero, no division of labor evolves and all subordinates perform "work" tasks but no "defense" tasks.

      Strengths:

      The model incorporates various biologically realistic details, including the possibility to evolve age polyethism where individuals switch from "work" to "defense" tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model and its analysis are limited, which in my view makes the results insufficient to reach the main conclusion that group augmentation and not kin selection is the primary cause of the evolution of vertebrate division of labor. There are several reasons.

      (1) First, although the main claim is that group augmentation drives the evolution of division of labor in vertebrates, the model is rather conceptual in that it doesn't use quantitative empirical data that applies to all/most vertebrates and vertebrates only. So, I think the approach has a conceptual reach rather than being able to achieve such a conclusion about a real taxon.

      We appreciate the reviewer’s point that our model does not incorporate quantitative empirical data across vertebrate taxa. This is indeed a limitation and reflects the current lack of fine-scale datasets on task division, the influence of life-history traits, and the fitness consequences of different cooperative activities in vertebrates. One of our aims, however, is precisely to stimulate such empirical work by highlighting the value of examining division of labor in species inhabiting harsh environments, considering age/size/dominance structure when evaluating variation in cooperative activities, and incorporating defense behaviors more consistently into analyses of helping, especially since defenders are often overlooked relative to the classic helpers-at-the-nest that provision offspring. The model therefore remains directly relevant to vertebrate systems because it departs from insect-inspired approaches that focus on fitness outcomes based solely on maximizing colony productivity. Instead, it incorporates direct fitness benefits to group members, an essential feature of vertebrate cooperative breeding and of other systems with fertile “workers,” as we clarified in the discussion.

      (2) Second, I think that the model strongly restricts the possibility that kin selection is relevant. The two tasks considered essentially differ only by whether they are costly for reproduction or survival. "Work" tasks are those costly for reproduction and "defense" tasks are those costly for survival. The two tasks provide the same benefits for reproduction (eqs. 4, 5) and survival (through group augmentation, eq. 3.1). So, whether one, the other, or both helper types evolve presumably only depends on which task is less costly, not really on which benefits it provides. As the two tasks give the same benefits, there is no possibility that the two tasks act synergistically, where performing one task increases a benefit (e.g., increasing someone's survival) that is going to be compounded by someone else performing the other task (e.g., increasing that someone's reproduction). So, there is very little scope for kin selection to cause the evolution of labor in this model. Note synergy between tasks is not something unusual in division of labor models, but is in fact a basic element in them, so excluding it from the start in the model and then making general claims about division of labor is unwarranted. In their reply, the authors point out that they only consider fertility benefits as this, according to them, is what happens in cooperative breeders with alloparental care; however, alloparental care entails that workers can increase others' survival *without group augmentation*, such as via workers feeding young or defenders reducing predator-caused mortality, as I mentioned in my previous review, but these potentially kin-selected benefits are not allowed here.

      We understand the reviewer’s concern that our model restricts the scope for kin-selected benefits by not including task-specific synergy effects—specifically, help that directly increases the survival of group members (e.g., load-lightening via feeding young, or predator defense that reduces mortality of breeders or offspring independently of group augmentation). We agree that such effects can occur in some cooperative breeders, and that they can, in principle, generate indirect fitness benefits. However, even when helpers increase the survival of breeders or reduce parental investment per offspring, these effects generally translate into higher breeder productivity—either via increased fecundity, increased survival to the next breeding attempt, or increased investment in subsequent broods. Thus, although we treat benefits in terms of enhanced breeder productivity, this formulation implicitly captures a range of help-related effects that ultimately improve the reproductive output of the breeders, including those mediated through increased survival. For this reason, we believe that the model remains relevant for vertebrate systems despite not representing each pathway separately.

      (3) Third, the parameter space is understandably little explored. This is necessarily an issue when trying to make general claims from an individual-based model where only a very narrow parameter region of a necessarily particular model can be feasibly explored. As in this model the two tasks ultimately only differ by their costs, the parameter values specifying their costs should be varied to determine their effects. In the main results, the model sets a very low survival cost for work (yh=0.1) and a very high survival cost for defense (xh=3), the latter of which can be compensated by the benefit of group augmentation (xn=3). Some limited variation of xh and xn is explored, always for very high values, effectively making defense unevolvable except if there is group augmentation. In this revision, additional runs have been included varying yh and keeping xh and xn constant (Fig. S6), so without addressing my comment as xn remains very high. Consequently, the main conclusion that "division of labor" needs group augmentation seems essentially enforced by the limited parameter exploration, in addition to the second reason above.

      As we have explained in previous revisions, the costs associated with work and defense are not directly comparable because they affect different fitness components: work costs reduce dominance, whereas defense costs reduce survival. Whether a particular cost is “high” or “low” can only be evaluated by examining the evolved reaction norms and identifying the ranges over which these norms change. For this reason, we focused on parameter ranges that actually generate shifts in reaction norms rather than presenting large regions of parameter space where nothing changes.

      We also reiterate that we did in fact explore broader parameter ranges than those shown in the main text. Additional analyses, including those specifically designed to identify conditions under which division of labor evolves under kin selection alone, are provided in the Supplementary Material. Specifically, Figure S1 addresses the point raised by the “need” of group augmentation benefits for defense to evolve, by increasing the baseline survival x<sub>0</sub>.

      We now include one additional figure in the Supplementary Material with a lower value for the benefit of group size (x<sub>n</sub> = 1 instead of x<sub>n</sub> = 3), and we extended the range of x<sub>h</sub> to include lower values (x<sub>h</sub> = 1). As shown in Figure S7 and Table S8, group augmentation benefits are still the primary reason for individuals to group (see dispersal values). For low benefits of group augmentation, defense evolves in harsh environments in the absence of kin selection, and in benign environments when both direct and indirect fitness benefits take place. We have also expanded the Results section to include these latest results. Note that we also checked even lower values for x<sub>h</sub> under the kin-selection-only implementation, with qualitatively similar results, but chose not to include them in the manuscript because the Supplementary Material is already very long. Here are the averages for two examples with x<sub>h</sub> = 0.1 and when we promote division of labor:

      Author response table 1.

      In short, the conclusion that division of labor requires group augmentation is not an artifact of limited parameter exploration. It arises because kin selection alone favors division of labor only under highly restrictive parameter combinations, whereas including direct fitness benefits substantially expands the conditions under which division of labor evolves. This pattern is consistent across the full set of parameter combinations we examined.

      (4) Fourth, my view is that what is called "division of labor" here is an overinterpretation. When the two helper types evolve, what exists in the model is some individuals that do reproduction-costly tasks (so-called "work") and survival-costly tasks (so-called "defense"). However, there are really no two tasks that are being completed, in the sense that completing both tasks (e.g., work and defense) is not necessary to achieve a goal (e.g., reproduction). In this model there is only one task (reproduction, equation 4,5) to which both helper types contribute equally and so one task doesn't need to be completed if completing the other task compensates for it; instead, it seems more fitting to say that there are two types of helpers, one that pays a fertility cost and another one a survival cost, for doing the same task. So, this model does not actually consider division of labor but the evolution of different helper types where both helper types are just as good at doing the single task but perhaps do it differently and so pay different types of costs. In this revision, the authors introduced a modified model where "work" and "defense" must be performed to a similar extent. Although I appreciate their effort, this model modification is rather unnatural and forces the evolution of different helper types if any help is to evolve.

      In previous models of division of labor in eusocial insects, the implicit benefit is also colony-level productivity (see Beshers & Fewell, 2001, for a review of division of labor in insects). Even in humans, division of labor functions as a means to increase efficiency toward achieving a shared goal. Our model adopts this same interpretation, as outlined in the Introduction, but extends it by considering that different tasks may impose different fitness costs, an aspect that has been largely overlooked in the existing literature. It is precisely because fitness outcomes are not fully shared among group members in vertebrates that distinguishing these cost structures matters. Unlike eusocial insects with sterile workers, vertebrate helpers can obtain direct fitness benefits, and the model explicitly accounts for these direct benefits—something absent from most insect-inspired approaches even when direct fitness benefits can also arise in some of those systems. Thus, our framework is not simply evolving “two types of helpers doing the same task,” but instead evolving specialization in different cooperative roles that carry different fitness consequences. It is therefore suitable for our model to treat contributions to breeder productivity as a common currency, while allowing individuals to specialize in different cost-distinct forms of help.

      Finally, regarding synergy: with the extension introduced in the previous revision, we now incorporate the requirement that multiple forms of help must be performed for the group to achieve maximal reproductive output. This directly addressed the reviewer’s concern about synergistic dependencies between tasks and aligns our framework with the kinds of complementarity highlighted in other models of division of labor.

      In summary, the structure of the model is consistent with both the theoretical literature on division of labor and the biological realities of vertebrate cooperative systems. We believe it is important for future models to explicitly consider the different fitness benefits and costs associated with distinct cooperative behaviors, and hope that our framework encourages more targeted empirical research on division of labor in vertebrates (e.g. inclusion of data on defense, life-history traits and environmental challenges) to better inform future modelling efforts.

      I should end by saying that these comments don't aim to discourage the authors, who have worked hard to put together a worthwhile model and have patiently attended to my reviews. My hope is that these comments can be helpful to build upon what has been done to address the question posed.

      We appreciate the reviewer’s thoughtful and constructive comments, as well as the time invested in evaluating our work. These insights have greatly helped us improve the clarity and overall quality of the manuscript. We hope that the revisions and additional clarifications we have provided adequately address all remaining concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors aimed to characterize neurocomputational signals underlying interpersonal guilt and responsibility. Across two studies, one behavioral and one fMRI, participants made risky economic decisions for themselves or for themselves and a partner; they also experienced a condition in which the partners made decisions for themselves and the participant. The authors also assessed momentary happiness intermittently between choices in the task. Briefly, results demonstrated that participants' self-reported happiness decreased after disadvantageous outcomes for themselves and when both they and their partner were affected; this effect was exacerbated when participants were responsible for their partner's low outcome, rather than the opposite, reflecting experienced guilt. Consistent with previous work, BOLD signals in the insula correlated with experienced guilt, and insula-right IFG connectivity was enhanced when participants made risky choices for themselves and safe choices for themselves and a partner.

      Strengths:

      This study implements an interesting approach to investigating guilt and responsibility; the paradigm in particular is well-suited to approach this question, offering participants the chance to make risky v. safe choices that affect both themselves and others. I appreciate the assessment of happiness as a metric for assessing guilt across the different task/outcome conditions, as well as the implementation of both computational models and fMRI.

      We thank Reviewer 1 for their positive assessment of our manuscript.

      Weaknesses:

      In spite of the overall strengths of the study, I think there are a few areas in which the paper fell a bit short and could be improved.

      We thank Reviewer 1 for their comments, which we have used to improve our manuscript. We hope that these changes address the issues raised by the Reviewer.

      (1) While the framing and goal of this study was to investigate guilt and felt responsibility, the task implemented - a risky choice task with social conditions - has been conducted in similar ways in past research that were not addressed here. The novelty of this study would appear to be the additional happiness assessments, but it would be helpful to consider the changes noted in risk-taking behavior in the context of additional studies that have investigated changes in risky economic choice in social contexts (e.g., Arioli et al., 2023 Cerebral Cortex; Fareri et al., 2022 Scientific Reports).

      We certainly agree that several previously published studies have relied on risky choice tasks with social conditions. In this revised version, we now mention these two studies in the substantially revised Introduction.

      (2) The authors note they assessed changes in risk preferences between social and solo conditions in two ways - by calculating a 'risk premium' and then by estimating rho from an expected utility model. I am curious why the authors took both approaches (this did not seem clearly justified, though I apologize if I missed it). Relatedly, in the expected utility approach, the authors report that since 'the number of these types of trials varied across participants', they 'only obtained reliable estimates for [gain and loss] trials in some participants' - in study 1, 22 participants had unreliable estimates and in study 2, 28 participants had unreliable estimates. Because of this, and because the task itself only had 20 gains, 20 losses, and 20 mixed gambles per condition, I wonder if the authors can comment on how interpretable these findings are in the Discussion. Other work investigating loss aversion has implemented larger numbers of trials to mitigate the potential for unreliable estimates (e.g., Sokol-Hessner et al., 2009).

      We agree that we have not clearly justified why we have taken two approaches to assess risk preferences. In short, while the expected utility approach is a more comprehensive method to model a participant’s choices, we had not sufficiently considered the need for the large number of trials required to fit such models when designing our experiment. Calculating the risk premium was the less comprehensive, simpler alternative that we could calculate for all participants. We have now mentioned this fact in the Results section. As the only difference in risk aversion across conditions was found in Study 1 using the expected utility method, which could only be successfully applied in a minority of participants, we believe that this difference should not be taken as a strong finding. We have now mentioned this fact in the revised Discussion.

      (3) One thing seemingly not addressed in the Discussion is the fact that the behavioral effect did not replicate significantly in study 2.

      We agree that we had not sufficiently discussed the fact that there were (slight but significant) differences in risk preferences between the Solo and Social conditions in Study 1 but not in Study 2. We now do so in the revised Discussion, and write the following:

      “Participants made slightly more risk-seeking choices when deciding for themselves than for both themselves and the partner in Study 1, but this difference disappeared in Study 2. The ρ parameter on which this finding in Study 1 is based could only be estimated in a minority of participants due to a relatively low number of trials, which suggests that this finding may not be very reliable. The simpler and more robust method (evaluation of a risk premium) showed no difference in risk aversion across conditions in either study. Overall, we believe that we do not have strong evidence of differences in risk preferences across conditions.”

      (4) Regarding the computational models, the authors suggest that the Responsibility and Responsibility Redux models provided the best fit, but they are claiming this based on separate metrics (e.g., in study 1, the redux model had the lowest AIC, but the responsibility only model had the highest R^2; additionally, the basic model had the lowest BIC). I am wondering if the authors considered conducting a direct model comparison to statistically compare model fits.

      We agree that we should run formal, direct model comparison tests. We have now run likelihood-ratio tests, which showed that the Responsibility model fitted best. We now report this in the Results section, just below Table 1:

      “A likelihood ratio test (Equation 9) revealed that the Responsibility model fitted better than all the other models, including the Responsibility Redux model (Study 1: all LR ≥ 47.36, p < 0.0001; Study 2: all LR ≥ 77.83, p < 0.0001).”
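
      As an illustration of this kind of comparison, here is a minimal sketch of a likelihood-ratio test for two nested models in its standard form, LR = 2(log L_full − log L_reduced), referred to a chi-square distribution. The manuscript's Equation 9 is not reproduced here, so the exact statistic and degrees of freedom used by the authors are assumptions, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    # Recurrence: Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (LR statistic, p value) for a nested model comparison."""
    lr = 2.0 * (loglik_full - loglik_reduced)
    return lr, chi2_sf(lr, extra_params)

# Made-up log-likelihoods chosen so that LR is close to the reported minimum.
lr_stat, p_value = likelihood_ratio_test(-1000.0, -976.32, extra_params=1)
```

      With one or two additional parameters, an LR statistic of 47.36 corresponds to a p value well below 0.0001 under this standard formulation.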

      (5) In the reporting of imaging results, the authors report in a univariate analysis that a small cluster in the left anterior insula showed a stronger response to low outcomes for the partner as a result of participant choice rather than from partner choice. It then seems as though the authors performed small volume correction on this cluster to see whether it survived. If that is accurate, then I would suggest that this result be removed because it is not recommended to perform SVC where the volume is defined based on a result from the same whole-brain analysis (i.e., it should be done a priori).

      As indicated in the manuscript, the small insula cluster centered at [-28 24 -4] and shown in Figure 4F survived corrections for multiple tests within the anatomically-defined anterior insula (based on the anatomical maximum probability map described in Faillenot et al., 2017), which is independent of the result of our analysis. Functionally defining the small volume based on the same data would indeed be circular and misleading “double-dipping”. We have most certainly NOT done this. The reason why we selected the anterior insula is because it is one of the regions most frequently associated with guilt (see the explanations in our Introduction, which refers for example to Bastin et al., 2016; Lamm & Singer, 2010; Piretti et al., 2023). Thus we feel that performing small-volume correction within the anatomically-defined anterior insula is a valid analysis. We fully acknowledge that, independently of any correction, the effect and the cluster are small. We now write:

      “We found a weak response in a small cluster within the left anterior insula (peak T = 3.95, d = 0.59, 22 voxels, peak intensity at [-28 24 -4]; Figure 4F). Given the documented association between anterior insula and guilt (see Introduction), we proceeded to test whether this result survived correction for family-wise errors due to multiple comparisons restricted to the left anterior insula gray matter [defined anatomically and thus independently from our findings, as the anterior short gyrus, middle short gyrus, and anterior inferior cortex in an anatomical maximum probability map (Faillenot et al., 2017)]. This correction resulted in a p value of 0.024. This result, although it is only a small effect in a small cluster, is consistent with the mixed model analysis reported earlier.”

      Reviewer #2 (Public review):

      Summary

      This manuscript focuses on the role of social responsibility and guilt in social decision-making by integrating neuroimaging and computational modeling methods. Across two studies, participants completed a lottery task in which they made decisions for themselves or for a social partner. By measuring momentary happiness throughout the task, the authors show that being responsible for a partner's bad lottery outcome leads to decreased happiness compared to trials in which the participant was not responsible for their partner's bad outcome. At the neural level, this guilt effect was reflected in increased neural activity in the anterior insula, and altered functional connectivity between the insula and the inferior frontal gyrus. Using computational modeling, the authors show that trial-by-trial fluctuations in happiness were successfully captured by a model including participant and partner rewards and prediction errors (a 'responsibility' model), and model-based neuroimaging analyses suggested that prediction errors for the partner were tracked by the superior temporal sulcus. Taken together, these findings suggest that responsibility and interpersonal guilt influence social decision-making.

      Strengths

      This manuscript investigates the concept of guilt in social decision-making through both statistical and computational modeling. It integrates behavioral and neural data, providing a more comprehensive understanding of the psychological mechanisms. For the behavioral results, data from two different studies is included, and although minor differences are found between the two studies, the main findings remain consistent. The authors share all their code and materials, leading to transparency and reproducibility of their methods.

      The manuscript is well-grounded in prior work. The task design is inspired by a large body of previous work on social decision-making and includes the necessary conditions to support their claims (i.e., Solo, Social, and Partner conditions). The computational models used in this study are inspired by previous work and build on well-established economic theories of decision-making. The research question and hypotheses clearly extend previous findings, and the more traditional univariate results align with prior work.

      The authors conducted extensive analyses, as supported by the inclusion of different linear models and computational models described in the supplemental materials. Psychological concepts like risk preferences are defined and tested in different ways, and different types of analyses (e.g., univariate and multivariate neuroimaging analyses) are used to try to answer the research questions. The inclusion and comparison of different computational models provide compelling support for the claim that partner prediction errors indeed influence task behavior, as illustrated by the multiple model comparison metrics and the good model recovery.

      We thank the reviewer very much for their comprehensive description of our study and the positive assessment of our study and approach.

      Weaknesses

      As the authors already note, they did not directly ask participants to report their feelings of guilt. The decrease in happiness reported after a bad choice for a partner might thus be something else than guilt, for example, empathy or feelings of failure (not necessarily related to guilt towards the other person). Although the patterns of neural activity evoked during the task match with previously found patterns of guilt, there is no direct measure of guilt included in the task. This warrants caution in the interpretation of these findings as guilt per se.

      We fully agree that not directly asking participants about feelings of guilt is a clear limitation of our study. While we already mention this in our Discussion, we have expanded our discussion of the consequences on the interpretation of our results along the lines described by the reviewer in the revised manuscript. We would like to thank the reviewer for proposing these lines of thought, and have now made the following changes to the text:

      In the first paragraph of the discussion, we now write: “Being responsible for choosing a lottery that yielded a low outcome for a partner made our participants feel worse than witnessing the same outcome resulting from their partner’s choice, which we interpret as interpersonal guilt; although we note that we have not asked participants specifically about which emotion they felt in these situations.”

      Later on, in the third paragraph focusing on the anterior insula, we now write: “This replicates a large body of evidence associating aIns with feelings of guilt evoked during social decisions (see Introduction). Because we have neither asked our participants specifically what they felt in these situations, nor specifically whether they experienced guilt, we cannot exclude the possibility that they have instead or in addition felt empathy for their partner, a feeling of failure or bad luck, or some other emotion.”

      As most comparisons contrast the social condition (making the decision for your partner) against either the partner condition (watching your partner make their decision) or the solo condition (making your own decision), an open question remains of how agency influences momentary happiness, independent of potential guilt. Other open questions relate to individual differences in interpersonal guilt, and how those might influence behavior.

      How agency influences momentary happiness or variations thereof during the course of an experiment such as ours is an interesting question in itself. We have now run linear mixed models assessing agency (i.e. we compared happiness in the Solo and Social conditions vs. the Partner condition), which revealed lower happiness in the Solo and Social conditions (i.e. when it was the participant’s turn to decide) in both studies. This is interesting in itself and may reflect the drive behind responsibility aversion reported by Edelson et al.’s 2018 study: being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to “weight of the responsibility”. We now report these findings in the Results section, including this proposed explanation; because we were not specifically interested in responsibility aversion, we do not discuss this further in the Discussion. The edited text is under the new subsection entitled ‘Momentary happiness: effects of agency, responsibility and guilt’, on page 12:

      “Next, we assessed whether happiness varied depending on the participant’s agency (Social + Solo vs. Partner), and found happiness to be lower when the participant chose, independent of the outcome (Study 1: t(3600) = -3.92, p = 0.00009, β = -0.14, 95% CI = [-0.20 -0.07]; Study 2: t(2870) = -6.07, p = 0.000000001, β = -0.24, 95% CI = [-0.31 -0.16]). This is interesting in itself and may reflect the drive behind responsibility aversion reported by Edelson et al.’s 2018 study: being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to “weight of the responsibility”. To specifically search for a sign of interpersonal guilt, [...]”

      Regarding individual differences: this is a very interesting topic that we have not addressed here due to the (relatively) small number of participants in our studies, but we might consider this for future follow-up studies, which we mention in the Discussion paragraph regarding open questions.

      This manuscript is an impressive combination of multiple approaches, but how these different approaches relate to each other and how they can aid in answering slightly different questions is not very clearly described. The authors could improve this by more clearly describing the different methods and their added value in the introduction, and/or by including a paragraph on implications, open questions, and future work in the discussion.

      We thank the reviewer for their appreciation of our complementary approach, and agree that we had not sufficiently explained the reasons why we used several methods. We have now added a paragraph explaining this at the end of the Introduction (page 5):

      “We analysed our behavioural data using several complementary methods: choices were modelled with mixed-effects regressions serving as manipulation checks; risk preferences expressed in choices were assessed using a comprehensive expected utility model as well as with a simpler, more robust “risk premium” approach; and happiness data were fitted, in addition to the computational models, with several linear mixed models to assess the impact of both the participant’s and their partner’s rewards, the impact of agency and their interactions. Inspired by findings reported in previous neuroimaging of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight about both the activity in regions associated with guilt as well as their place in a wider network, providing an in-depth comprehensive analysis of the mechanisms behind guilt evoked by social responsibility.”

      In addition, as suggested we added the following paragraph on open questions and future work in the Discussion:

      “Several open questions remain at the end of this study. As discussed above, asking participants directly about which emotions they have felt during the different stages of this task would allow us to link subjective experience with our analytical measures. Testing more participants would allow us to assess the impact of inter-individual variations in personality traits on the experience as well as the behavioural and neural correlates of guilt and responsibility. Using more trials in the experiment would allow separate modelling of risk preferences in gain and loss trials in each experimental condition using expected utility models, and could allow testing whether changes in momentary happiness affect subsequent choices. Varying partner identities (friends, strangers, artificial agent) could reveal the impact of social discounting on guilt and responsibility. In sum, we believe that this experimental approach lends itself very well to the study of several aspects of social emotions.”

      However, taken together, this study provides useful insights into the neural and behavioral mechanisms of responsibility and guilt in social decision-making and how they influence behavior. 

      We thank the reviewer again for their appreciation of our work and hope that our revisions improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The majority of my suggestions are in the public review, so I will not repeat them here. But in general, I like the paper, and in addition to my other comments, I think that there should be more discussion of the potential limitations of the study and conclusions that can be drawn. I also thought parts of the results were a little hard to follow, particularly in the 'momentary happiness' section. Perhaps an additional subsection here might help with flow.

      We agree that we could have discussed further the limitations of our study and the conclusions that can be drawn from it, which we have now done in the last paragraphs of the Discussion in this revised version.

      To improve the structure of the section on ‘momentary happiness’, we separated this section into two, entitled: ‘Momentary happiness: links to reward‘ and ‘Momentary happiness: effects of agency, responsibility and guilt’, which should facilitate the reading of this long section. We proceeded in a similar manner for the Choices section, which is now subdivided into ‘Choices: manipulation check’ and ‘Choices: risk preferences’. We believe that these changes have indeed improved the readability of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, I believe this manuscript was well-designed, consists of extensive analyses, and provides interesting new insights into the mechanisms underlying social decision-making. I mostly have some clarifying questions and minor comments, which are described below. 

      (1) Integration of prior findings in the first paragraphs of the Introduction. Although all the previous work described in the 2nd-5th paragraph introduction is interesting, it felt a bit like an enumeration of findings rather than an integrated introduction leading to the current research question. At the end of paragraph 5, it becomes clear how these findings relate to the current research question, but I believe it will improve the flow and readability of the introduction if this becomes clear earlier on.

      We agree that we could have integrated the cited previous work into the Introduction so that the text builds up to the research question. We have now extensively reworked several paragraphs in the Introduction (pages 3-5) and hope that these changes have made it easier to follow.

      (2) For the risk attitudes (Choices), you describe pooling the gains and losses and then comparing the social and solo conditions. I was wondering whether you also looked at potential differences between gains and losses (delta measure) for social versus the solo condition (so a comparison of the delta). Based on prior work, I can imagine that the difference in risk attitudes for gains and losses might differ when making decisions for yourself versus when you're doing it for a partner. In general, I was wondering how you explain these findings, as there is also a lot of work showing differences in risk-taking patterns for gains and losses.

      We agree that we could have compared delta measures between solo and social conditions. However, as we describe in the Results section and comment on in the Discussion, the relatively low number of trials made separate fitting of gain and loss trials across conditions difficult. While this question could thus be addressed in subsequent versions of our experiment with more trials, such a fine-grained analysis of the decisions was not the focus of our current study.

      (3) On page 11, you state: "in particular the partner's reward prediction errors resulting from the participants' decisions, i.e. those pRPE for which participants were responsible." From the results described in the paragraph above, this doesn't become clear (e.g., there's no distinction made between social_pRPE and partner_pRPE in the text), as it only discusses differences in weights between pRPE and sRPE. I would recommend including some more information in the main text on these main modeling findings, so one doesn't have to go to the Supplemental Materials to understand them.

      We did indeed fail to report these findings in the text! We thank the reviewer for pointing this out. We have now edited this passage as follows:

      “Crucially, we find here that the partner’s reward prediction errors (social_pRPE and partner_pRPE) contributed to explaining changes in participants’ momentary happiness: the Responsibility and ResponsibilityRedux models explained the data better than the models without these parameters (see Table 1). In particular, the partner’s reward prediction errors resulting from the participants’ decisions (social_pRPE), i.e. those pRPE for which participants were responsible, contributed to explaining our data (weights for social_pRPE were greater than 0: Responsibility model: Study 1: Z = 2.85, p = 0.004, Study 2: Z = 3.26, p = 0.001; Responsibility Redux model: Study 1: Z = 2.93, p = 0.003, Study 2: Z = 3.30, p = 0.001; weights for social_pRPE tended to be higher than weights for partner_pRPE: Responsibility model: Study 1: Z = 2.14, p = 0.033; Study 2: Z = 1.41, p = 0.16).”

      (4) The functional connectivity findings seem to come out of nowhere and are not introduced or described anywhere prior in the manuscript. It is therefore not completely clear why you conducted these analyses, or what they add above and beyond previous analyses. Already introducing this method earlier on would fix that.

      We agree that we could have introduced functional connectivity analyses earlier in the text, particularly given the many previous studies in our field using this technique. We have now done this at the end of a new last paragraph of the Introduction:

      “Inspired by findings reported in previous neuroimaging of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight about both the activity in regions associated with guilt as well as their place in a wider network, providing an in-depth comprehensive analysis of the mechanisms behind guilt evoked by social responsibility.”

      (5) For the functional connectivity findings: I was wondering why you only looked at the choice phase, and not at the feedback phase. I understand that previous work focused on the choice phase, but for the purpose of this study (focus on guilt), I can imagine it is also interesting to see what happens with feedback. In the discussion, you also state "How we feel when we witness our decisions' consequences on others is an important signal to consider when attempting to make good social decisions." (p. 19), which is more focused on the feedback rather than choice, and also supports the idea that looking at the feedback moment might be relevant.

      We agree that we could also have looked at the functional connectivity during the feedback phase. The main reason why we had originally not done so was time constraints. At the current time we would in addition point out that the manuscript is already very long and contains many analyses of behavioural and fMRI data. Adding this analysis would cost additional time and would further delay the publication of our manuscript, which we would prefer to avoid. However, one could of course look at these effects in subsequent analyses of the same data or in subsequent versions of this experiment. We have now mentioned this in the Discussion, in the paragraphs on open questions.

      Minor comments:

      (1) For some of the Figures, it would be helpful if the subtitles were more informative. For Figure 2 and Figure 3 for example, it would be nice if Study 1 and Study 2 were not only mentioned in the figure description but also in the actual figure. For Figures 3 and 4, it would be helpful to have significance stars for the bar plots as well.

      We agree that these changes make the figures more easily understandable and have implemented them all, except for adding stars on Figure 4, because all bar plots in panels C and E would have been labeled with two or more stars, which would have made the figure difficult to read. We have now mentioned the fact that all these coefficients were significant in the figure legend.

      (2) For some of the Supplementary Results, it would be very helpful if there was a legend or description. This is already the case for most of the SR, but not for all.

      We have now added a legend to all elements of the Supplementary Results.

      Some questions that came to mind while going through them:

      - Supplementary Table 1: which p-values correspond to the significance stars? This information is included for Supplementary Table 2, but not for ST1. 

      We have now added the missing information in ST1.

      - Supplementary Figure 1: do the colors correspond to different participants? 

      We have now specified that the colors do indeed correspond to different participants.

      - Supplementary Table 5 (final table): what do the - represent? As in, why is there no value for "run" for the MPFC? At first, I thought you only included the significant values, but then I noticed a few non-significant values as well, so it wasn't completely clear to me why some of the values were missing. This also applies to Supplementary Table 6.

      We had indeed forgotten to explain this. The ‘-’ in Supplementary Tables 4 and 6 indicates that the linear mixed model without the factor ‘run’ was the better-fitting one. We have now added the following explanation in the text accompanying Supplementary Table 4:

      “We tested these models both with and without the factor Run and associated interaction, and we report the best-fitting model in the table below: a dash (‘-’) in the row displaying parameters for the run and socialVsSolo:run regressors indicates that the model without factor run was better-fitting for this ROI.”

      (3) I came across a few minor typos or sentences that were not completely clear to me.

      - On page 3: "Patients with damage to ventromedial prefrontal cortex (vmPFC) seem insensitive to guilt when playing social economic games (Krajbich et al., 2009)." This sentence felt a bit out of nowhere and doesn't logically follow from the previous sentences. 

      We have now revised the descriptions of this previous study as well as several others and how they fit into the research question.

      - On page 3: "In another study, participant errors in a difficult perception task lead to a partner feeling pain and evoked activations in left aIns and dlPFC (Koban et al., 2013)." This sentence doesn't really flow, and from the wording, it is not completely clear whether it's the errors or the partner pain that led to the aIns and dlPFC activation.

      We have now revised the description of this study as well, as follows:

      “In another study, partners received painful stimuli when participants made errors during a difficult perception task. These errors evoked activations in the left aIns and dlPFC in the participants (Koban et al., 2013).”

      - Supplementary Figure 1: there is a missing period after the sentence "We then compared these new estimated parameters to the actual parameters from which the synthetic data were generated"

      We have now added the missing period after “generated”.

      - On page 5: "We ran two experiments, Study 1 outside fMRI and Study 2 during fMRI, with separate groups of participants." I would change "outside fMRI" to outside the MRI scanner or something like that, as it's not completely correct to say "outside fMRI".

      We have changed the sentence to “outside the MRI scanner”.

      - On page 6: for the first result, there are currently two p-values reported (p < 2.5e-20 and p < 2e-16). I believe this is an error?

      This was indeed an error! We have re-run this analysis, noticed that the degrees of freedom were also miscalculated, and have updated this result and the effect of condition (solo vs social). The results are almost identical to the previous ones and all conclusions hold. We have also checked the other analyses reported in this paragraph – all results replicate exactly.

      - On page 6: "Supplemental Table 1" should be "Supplementary Table 1" (for consistency).

      Done.

      On page 8: "participants in both conditions of both studies", I would change "of both studies" to "for both studies".

      Done.

      On page 8: for the "Momentary Happiness" paragraph, it would be helpful if you could briefly describe the Rutledge method here, for people who are unfamiliar with the approach.

      We now write the following at the beginning of this paragraph:

      “Following Rutledge and colleagues’ methodology, which considers that changes in momentary happiness in response to outcomes of a probabilistic reward task are explained by the combined influence of recent reward expectations and prediction errors arising from those expectations, we fitted computational models to each participant’s happiness data.”
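
      To make the structure of such models concrete, here is a minimal sketch of the general form published by Rutledge et al. (2014), in which happiness after a trial is a baseline plus weighted, exponentially discounted sums of recent task events. It is not the authors' code: the function names, the regressor labels (EV, sRPE, social_pRPE) and all numbers are assumptions for illustration, and the specific terms of the Responsibility models are those described in the manuscript, not here.

```python
def decayed_sum(values, gamma):
    """Exponentially discounted sum of past values; the latest value has weight 1."""
    total = 0.0
    for v in values:
        total = gamma * total + v
    return total

def predicted_happiness(t, baseline, weights, regressors, gamma):
    """Predicted happiness after trial t (0-indexed).

    weights:    dict mapping regressor name -> weight
    regressors: dict mapping regressor name -> per-trial values
    gamma:      forgetting factor between 0 and 1
    """
    return baseline + sum(
        w * decayed_sum(regressors[name][: t + 1], gamma)
        for name, w in weights.items()
    )

# Made-up two-trial example with hypothetical regressors.
regressors = {
    "EV": [10.0, 10.0],            # expected value of the chosen lottery
    "sRPE": [-10.0, 10.0],         # participant's own reward prediction error
    "social_pRPE": [0.0, -10.0],   # partner's RPE following the participant's choice
}
weights = {"EV": 0.2, "sRPE": 0.4, "social_pRPE": 0.1}
happiness_after_trial_2 = predicted_happiness(1, 0.5, weights, regressors, gamma=0.5)
```

      In a sketch like this, comparing models amounts to adding or removing entries in the weights dictionary and refitting; a reliably non-zero weight on a partner-related term would indicate that it helps explain happiness ratings.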

      On page 10: "Wilkoxon sign-rank tests", should be "Wilcoxon".

      Done.

      We thank the reviewer for their careful reading of our manuscript. We believe that these changes have indeed improved our manuscript.

    1. Summary

      The chromatin accessibility landscape is the basis of cell-specific gene expression. We generated a multiorgan, single-nucleus chromatin accessibility landscape from the model organism Rattus norvegicus. For this single-cell atlas, we constructed 25 libraries via snATAC-seq from nine organs in the rat, with a total of over 110,000 cells. Cell classification integrating gene activity scores with known marker genes identified 77 cell types, which were strongly correlated with those in published mouse single-cell transcriptome atlases. We further investigated the enrichment of cell type- and organ-specific transcription factors (TFs), the dynamics of T-cell developmental trajectories across organs, and the conservation and specificity of gene expression patterns across species. These findings provide a foundation for further investigations of the cell composition and gene regulatory networks throughout the rat body.

      Highlights

      - Generation of a single-cell atlas of chromatin accessibility in nine organs of the rat
      - Characterization of cell type- and organ-specific transcription factors (TFs)
      - Dynamics of chromatin accessibility in developing T cells revealed by cross-organ analysis
      - Conservation and specificity of gene expression patterns among humans, mice, and rats revealed by cross-species analysis

      Competing Interest Statement

      The authors have declared no competing interest.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag013), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1:

      Li et al. present a manuscript where they generated a snATAC-seq atlas of 9 major organs in adult rat and integrated the atlas with mouse and human scRNA-seq and scATAC-seq data, revealing that chromatin accessibility is largely conserved between cell types across species and that there is also tissue-specific regulation in some cell types even when they are common across several tissues. Overall, this looks like a great, carefully analysed and annotated resource that would be useful for the community. I appreciate the amount of work that went into curating and analysing this dataset and I thought that the manuscript was very well written and clear.

      I think the most interesting finding is in figure 3 where the authors found unique TFs regulating the same cell-types but in different organs. However, the analysis ends abruptly other than listing these TFs. Can the authors comment on what are the functional consequences/associations of these tissue-specific TFs, perhaps in the discussion?

      The raw data is deposited into a database and can be openly downloaded, but I find that the lack of processed data (e.g. processed and labelled expression matrices or objects) may prevent the adoption of this data by the community, as it is a lengthy process to reach the authors' conclusions. The authors might also want to consider incorporating an interactive platform for users to explore and navigate this dataset.

      While I appreciate that the authors have detailed in their manuscript how they performed the data analysis, I would still encourage the authors to upload their scripts/notebooks to an open code repository; otherwise, adoption by the community would again be hindered.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous non-quantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      2. Point-by-point description of the revisions

      Reviewer #1


      Significance

      Comment 1: At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      Response: We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been very much toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.


      Evidence, reproducibility and clarity

      Comment 1: The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      Response: We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Comment 1: Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      Response: We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. less than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either 'lost' (filtered out) or 'gained' (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at one moment of their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
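      The filtering effect described in this response can be illustrated with a minimal sketch. The function names and read counts below are hypothetical, and treating "10 reads" as the minimum number of reads supporting the alternate allele is an assumption; only the 10% frequency cutoff and the 10-read threshold come from the response itself.

```python
def passes_filter(alt_reads, total_reads, min_freq=0.10, min_reads=10):
    """Return True if a SNP has enough supporting reads and allele frequency.

    Assumption: `min_reads` applies to reads carrying the alternate allele.
    """
    if total_reads == 0 or alt_reads < min_reads:
        return False
    return alt_reads / total_reads >= min_freq


def classify_snp(ama, pro):
    """Classify one SNP from (alt_reads, total_reads) tuples for each stage."""
    in_ama = passes_filter(*ama)
    in_pro = passes_filter(*pro)
    if in_ama and in_pro:
        return "shared"
    if in_ama:
        return "ama-only"
    if in_pro:
        return "pro-only"
    return "filtered"


# Hypothetical SNP whose frequency drifts from 12% to 8% between stages:
# it is still present in promastigotes but no longer passes the filter.
print(classify_snp(ama=(12, 100), pro=(8, 100)))
```

      In this sketch a small shift in frequency around the cutoff is enough to make a SNP appear exclusive to one stage, which is the behaviour the response attributes to bottleneck events.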


      Comment 2: Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      Response: This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than +/- 1.5-fold) as significant. The reversal of the signal for chr 26 sample H1 eludes us (but again, these fluctuations are minor and not observed at mRNA level).
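      As a rough illustration of the thresholding described here, the sketch below normalizes a gene's median read depth by the median depth of its chromosome (one simple way to account for somy) and flags only changes beyond 1.5-fold in either direction. The function names, the normalization scheme and the numbers are assumptions made for illustration, not the authors' pipeline.

```python
from statistics import median


def normalized_depth(gene_depths, chromosome_depths):
    """Median gene depth divided by median chromosome depth (somy-aware)."""
    return median(gene_depths) / median(chromosome_depths)


def exceeds_fold_threshold(depth_a, depth_b, max_fold=1.5):
    """True if the depth ratio is above max_fold or below 1 / max_fold."""
    ratio = depth_a / depth_b
    return ratio > max_fold or ratio < 1 / max_fold


# Hypothetical per-base depths for one gene and its chromosome in two samples.
sample_1 = normalized_depth([30, 32, 34], [20, 20, 22, 18])
sample_2 = normalized_depth([24, 25, 26], [20, 21, 19, 20])
print(exceeds_fold_threshold(sample_1, sample_2))
```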

      Comment 3: The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      Response: We thank the reviewer for this important comment. We replaced 'regulon' throughout the manuscript by 'co-regulated, functional gene clusters' (or similar).

      Comment 4: It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      Response: We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Y) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'-O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to consider the identification of stage-regulated snoRNAs in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms 'ribosome biogenesis', 'rRNA processing' and 'RNA methylation' shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      Comment 5: The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      Response: We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      Comment 6: The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      Response: This control is now included in Fig S7 and we have added the corresponding description to the text.

      Comment 7: I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      Response: We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Response: Rectified, thanks for pointing this out.

      2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Response: Rectified, thanks for pointing this out.

      3) Fig.1D: "SNP frequency" is the preferred term in English.

      Response: Corrected.

      4) Fig.2A: not sure what "counts}1" mean.

      Response: This figure has been replaced.

      5) Ln 685: "Transcripts with FC 0.01 are represented by black dots" -> This sentence is inaccurate. The intended wording might be: "Transcripts with FC 0.01 are represented by black dots"

      Response: We thank the reviewer and corrected accordingly.

      6) Ln 698: Same as ln 685 mentioned above.

      Response: We thank the reviewer and corrected accordingly.

      7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10-7 in the other scale?

      Response: This was corrected in the figure and the legends were updated accordingly.

      8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Response: Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      Response: This figure was removed and the legend modified.

      10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Response: Thank you, this is now rectified. To ensure readability, we reduced the number of shown protein kinase examples.



      Reviewer #2

      Evidence, reproducibility and clarity

      Comment 1: In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      Response: We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      Comment 2: The authors provide a clear and comprehensive description of the data at each stage of the results, and this is woven together in the discussion, allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented, the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      Response: We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      Comment 1: As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      Response: We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the minimum necessary. We thank the reviewer for this important comment. The new data not only confirm the previous results (providing a high level of robustness to our data) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      Comment 2: There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      Response: We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      Comment 3: I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      Response: We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has been previously applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information in the Results section.

      Comment 4: If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      Response: See response to comment 1 above.


      Comment 5: As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      Response: This control has now been added to Fig S7.

      If you have constructive further reaching suggestions that could significantly improve the study but would open new lines of investigations, please label them as "OPTIONAL". Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated time investment for substantial experiments.

      Comment 6: It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Response: Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      Are the data and the methods presented in such a way that they can be reproduced?

      Comment 7: All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      Response: We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      Are the experiments adequately replicated and statistical analysis adequate?

      Comment 8: Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      Response: See response to comment 1 above.

      Comment 9: Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      Response: We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      Comment 10: It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Response: Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      Comment 11: Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      Response: We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC-MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.
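      The eligibility rule described in this response (impute within a condition only when at least one replicate was observed) can be sketched as follows. The fill value used here, the mean of the observed replicates, is a deliberately simple stand-in chosen for illustration; it is not the MLE-based algorithm the authors report using, and the function name and intensities are hypothetical.

```python
def impute_condition(values):
    """Impute missing replicates (None) within one condition.

    Imputation is attempted only if at least one replicate was observed.
    Placeholder strategy: fill with the mean of the observed values, so
    imputed values stay close to the observed intensities.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing observed in this condition: leave all values missing.
        return list(values)
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Hypothetical log2 intensities of one phosphosite across four replicates.
print(impute_condition([None, 20.0, None, 22.0]))
print(impute_condition([None, None, None, None]))
```

      Under this rule, a phosphosite seen in a single replicate of a condition would have its other three values imputed, which is the scenario the reviewer raises.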


      Comment 12: For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      Response: We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Specific experimental issues that are easily addressable. Are prior studies referenced appropriately?

      Comment 1: Yes

      Are the text and figures clear and accurate?

      Comment 2: The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      Response: We thank the reviewer for these positive comments. We now included the information on replicates in the overview figure (Figure S1).

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Comment 3: As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      Response: We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0-100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach - see DOI: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to those previously published in the "Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome" section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term "phosphostoichiometry" is imprecise and not correct in this context.

      In response, we (i) replaced the term "phosphostoichiometry" throughout the manuscript with a more accurate description, such as "normalized phosphorylation level", or "relative phosphorylation change normalized to protein abundance", and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.
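      The ratio-of-ratios quantity described above can be sketched on a log2 scale, where the normalized phosphorylation change is the phosphopeptide log2 fold change minus the parent protein log2 fold change, and zero means the phosphopeptide simply tracks its protein. The function names and abundances are hypothetical, and the sketch computes only the point estimate; it does not reproduce the statistical test used in the study.

```python
import math


def log2_fold_change(cond_a, cond_b):
    """log2 of the ratio of mean abundances between two conditions."""
    mean_a = sum(cond_a) / len(cond_a)
    mean_b = sum(cond_b) / len(cond_b)
    return math.log2(mean_a / mean_b)


def normalized_phospho_change(phos_a, phos_b, prot_a, prot_b):
    """Phosphopeptide log2 fold change corrected for the protein-level change."""
    return log2_fold_change(phos_a, phos_b) - log2_fold_change(prot_a, prot_b)


# Hypothetical abundances: the phosphopeptide rises 4-fold while its
# parent protein rises 2-fold.
change = normalized_phospho_change(
    phos_a=[400.0, 400.0], phos_b=[100.0, 100.0],
    prot_a=[200.0, 200.0], prot_b=[100.0, 100.0],
)
print(change)
```

      Unlike an occupancy, this value is unbounded and can be negative, which is why a term such as "normalized phosphorylation level" describes it more accurately than "phosphostoichiometry".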

      Comment 4: From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting p-values thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      Response: The references have been formatted.
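      For readers unfamiliar with the multiple-testing step quoted in this comment, the sketch below implements the standard Benjamini-Hochberg adjustment. It is a plain BH procedure written for illustration: it omits the estimation of the proportion of true null hypotheses that makes the procedure adaptive, and it is not the cp4p implementation cited in the Methods. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four phosphopeptides.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```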

      Comment 5: Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      Response: We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      Comment 6: It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      Response: We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein-protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, co-expression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables it will be very easy to recreate these networks with the new versions for any follow up study.

      Comment 7: The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      Response: We apologize and have translated the text into English.

      Comment 8: I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      Response: We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Comment 9: Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      Response: We have added the following statement to the legend: 'Confidence values were derived as described in Supplementary Methods'.

      Comment 10: Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      Response: We have added the following justification to the Results section (see also response above to comment 3 for reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter's potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      Comment 11: The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow-up is done to identify if those proteins are necessary and/or sufficient for driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      Response: We very much appreciate these very interesting suggestions, which will be considered for ongoing follow-up studies.

      Comment 12: In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      Response: The links have now been rectified and validated. For those data that are still under quarantine, the access information is as follows:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d0

      RNAseq data:

      https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f-4c06-b4bd-bc10f2dc0b14

      Proteomic data: http://www.ebi.ac.uk/pride

      __Username:__ reviewer_pxd035698@ebi.ac.uk

      __Password:__ gOIcRx0g

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      __Username:__ reviewer_pxd035697@ebi.ac.uk

      __Password:__ 7GWtBmvx

      Significance Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested. The following aspects are important:

      * General assessment: provide a summary of the strengths and limitations of the study. What are the strongest and most important aspects? What aspects of the study should be improved or could be developed?*

      Strengths: Comment 1: The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      Response: We thank the reviewer for this positive assessment of our work.

      Comment 2: The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster-derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy, and the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      Response: We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations: Comment 1: Potential lack of appropriate replication (see above).

      Response: See response to comment 1.

      Comment 2: Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      Response: See response to comment 2 above.

      Advance: compare the study to the closest related results in the literature or highlight results reported for the first time to your knowledge; does the study extend the knowledge in the field and in which way? Describe the nature of the advance and the resulting insights (for example: conceptual, technical, clinical, mechanistic, functional,...).

      Comment 3: The study applies well established techniques without any particular technical step-change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      Response: We thank the reviewer for these positive comments.

      *Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field? * This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      Response: We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      Response: We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      __Summary:__ The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      __Response:__ We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:


      Comment 1: Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      Response: We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships with more neutral expressions such as 'co-expression'.

      Comment 2: Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      __Response:__ We agree with the reviewer and have replaced 'regulon' with 'co-regulated gene clusters' (or similar).

      Comment 3: LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value [...]

      Response: We fully agree with the reviewer: an LFQ intensity of 0 may result from various causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: 'Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value [...]

      Comment 4: L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      Response: We thank the reviewer for this comment. To clarify this section, we modified the text as follows: 'Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p [...]

      Minor Comments:

      Methods L132: Typo: "A according" should be "according."

      __Response:__ The 'A' refers to RNase A. We added a comma for clarification (...RNase A, according to...)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      __Response:__ We thank the reviewer for this comment. In addition to the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we have now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: "The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the meancoverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics." The description then goes into further detail.
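
      The normalization quoted above can be illustrated with a minimal sketch. This is not the giptools implementation: the function names, the example coverage values, and the conversion of normalized bin scores to somy by multiplying with a baseline ploidy of 2 are assumptions made here for illustration only.

```python
from statistics import median


def somy_scores(bin_mean_coverage, baseline_ploidy=2):
    """Convert per-bin mean coverage into per-bin somy scores.

    bin_mean_coverage maps a chromosome name to a list of bin coverages.
    Each bin is divided by the median coverage of all bins, then scaled by
    an assumed baseline ploidy (2 = disomic; an assumption of this sketch).
    """
    all_bins = [v for bins in bin_mean_coverage.values() for v in bins]
    genome_median = median(all_bins)
    if genome_median <= 0:
        raise ValueError("median coverage must be positive")
    return {
        chrom: [baseline_ploidy * v / genome_median for v in bins]
        for chrom, bins in bin_mean_coverage.items()
    }


def chromosome_somy(bin_mean_coverage, baseline_ploidy=2):
    """Summarize each chromosome by the median of its bin somy scores."""
    scores = somy_scores(bin_mean_coverage, baseline_ploidy)
    return {chrom: median(vals) for chrom, vals in scores.items()}


# Hypothetical coverage values: two disomic chromosomes and one trisomic one.
coverage = {
    "chr1": [30.0, 30.0, 30.0, 30.0],
    "chr36": [30.0, 30.0, 30.0, 30.0],
    "chr26": [45.0, 45.0],
}
result = chromosome_somy(coverage)
```

      With these made-up values, the two chromosomes at baseline coverage receive a somy score of 2 and the chromosome with 1.5-fold coverage receives a score of 3.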

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      __Response:__ We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Response: corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Response: corrected to 'validated'

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      Response: As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant (Ritchie, 2015; Smyth, 2005).
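
      The decision rule described in this response (statistical significance combined with a minimal fold change of 2) can be sketched as follows. The sketch does not reproduce limma's moderated t-test; it only shows the thresholding step applied to precomputed values. The protein identifiers, the input values, and the significance cutoff `alpha=0.01` are hypothetical and chosen for illustration.

```python
from math import log2


def is_differentially_abundant(log2_fc, adj_p, min_fold=2.0, alpha=0.01):
    """Apply both criteria: adjusted p-value below alpha AND at least
    min_fold change in either direction (|log2 FC| >= log2(min_fold))."""
    return adj_p < alpha and abs(log2_fc) >= log2(min_fold)


# Hypothetical results table: protein ID -> (log2 fold change, adjusted p-value)
results = {
    "protein_A": (1.0, 0.001),   # exactly 2-fold up, significant
    "protein_B": (0.9, 0.001),   # significant but below the fold-change cutoff
    "protein_C": (-3.0, 0.001),  # 8-fold down, significant
    "protein_D": (3.0, 0.5),     # large change but not significant
}

hits = sorted(
    name
    for name, (fc, p) in results.items()
    if is_differentially_abundant(fc, p)
)
```

      Requiring both criteria means that a protein with a large but noisy change, or a small but highly reproducible change, is not reported as differentially abundant.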

      __Results__ L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      Response: We thank the reviewers for these suggestions and have reformulated into: 'In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.'

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      Response: These lines have been added to Figure 1C (not 3A).
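
      The kind of trend line requested here can be sketched as a centered moving average over values ordered by genomic position. This is an illustrative sketch only: the window size and the handling of the edges (shrinking the window at both ends) are assumptions, not the settings used for Figure 1C.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the two edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical normalized read-depth ratios ordered along a chromosome.
ratios = [1.0, 2.0, 3.0, 4.0, 5.0]
trend = moving_average(ratios, window=3)
```

      Plotting `trend` on top of the individual data points makes a slow positional pattern easier to judge than the scatter alone.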

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Response: corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Response: Thank you for the comment - we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Response: corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Response: corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      Response: This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      Response: We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      Response: This has been corrected to 'The discrepancies we observed in a sub-set of genes between....'.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Response: corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Response: corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      Response: We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      Response: We added this information to the text ('some of the most significantly enriched terms included ...').

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      Response: This statement was too speculative and has been removed. Instead, we added 'Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner'.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      Response: We deleted the term as suggested and reformulated to '....our results confirm the important role of protein degradation....'.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      Response: We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      __Discussion__ L555: As noted in L494, reconsider using the word "unexpected."

      Response: removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      Response: We thank the reviewer for this comment and have followed the advice, reformulating the sentence according to the suggestion.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      Response: We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      Response: We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Response: Corrected to 'some of the most significantly enriched GO terms'.

      Signed: Bart Cuypers, PhD

      **Referee cross-commenting**

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      Response: We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance (Required)):


      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      Response: We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al. aim to address the central question of how these parasites establish and maintain distinct life cycle stages largely in the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analyses to reveal the changes the parasites underwent between the two life stages.

      The main conclusions stated by the authors are:

      • The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.
      • RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and post-transcriptional regulation itself.

      • Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      Response: We thank the reviewer for the time and commitment dedicated to our manuscript.

      Comments:

      Further details are organised by order of appearance in the text:

      Comment 1: Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      Response: We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Comment 2: Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      __Response:__ We agree with the other reviewers and have repeated this analysis to obtain statistically more robust results.

      Comment 3: Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      Response: No. As published in Barja et al., Nature Ecol Evol 2017 (PMID: 29109466) and Bussotti et al., PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Comment 4: Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      __Response:__ Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      Comment 5: L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      __Response:__ We corrected this sentence to 'Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2'.

      Comment 6: L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      Response: As noted above, we did not see any significant read depth changes at the DNA level when comparing amastigotes and promastigotes. Thus, there is no need to normalize the RNA-seq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed at the DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some over-represented tandem gene arrays where all gene copies share the same location and GO term).

      Comment 8: L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      Response: We thank the reviewer for this comment and have corrected the statement accordingly.

      Comment 9: Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing not at random (MNAR). Rather than removing those proteins containing MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of MNAR and MAR can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      Response: We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. 
These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: 'Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p

      Comment 10: L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      Response: We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.

      Comment 10: The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      Response: We rephrased the conclusion to 'In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.' Please see the response to comment 9 regarding the unique proteins.

      Comment 11: L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      Response: We agree with the reviewer and have toned this statement down by adding the statement '....or simply be a consequence of culture adaptation'.

      Comment 12: The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      Response: We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      Comment 13: A couple of typos:

      • In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      • L225 "...peptide match was disable." should be "...peptide match was disabled."

      Response: Both have been corrected.

      Reviewer #4 (Significance (Required)):

      While there is not too much novelty around the emphasis of gene expression at post-translational level in kinetoplastid organisms, the scale of the work presented here, looking at 5 layers of potential regulations, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      Response: We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro systems promastigote differentiation potentially limiting the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall out of the scope of this manuscript.

      Response: We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work sets the stage for future investigations dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how UVC-induced DNA damage alters the interaction between the mitochondrial transcription factor TFAM and mtDNA. Using live-cell imaging, qPCR, atomic force microscopy (AFM), fluorescence anisotropy, and high-throughput DNA-chip assays, they show that UVC irradiation reduces TFAM sequence specificity and increases mtDNA compaction without protecting mtDNA from lesion formation. From these findings, the authors suggest that TFAM acts as a "sensor" of damage rather than a protective or repair-promoting factor.

      Strengths:

      (1) The focus on UVC damage offers a clean system to study mtDNA damage sensing independently of more commonly studied repair pathways, such as oxidative DNA damage. The impact of UVC damage is not well understood in the mitochondria, and this study fills that gap in knowledge.

      (2) In particular, the custom mitochondrial genome DNA chip provides high-resolution mapping of TFAM binding and reveals a global loss of sequence specificity following UVC exposure.

      (3) The combination of in vitro TFAM DNA biophysical approaches, combined with cellular responses (gene expression, mtDNA turnover), provides a coherent multi-scale view.

      (4) The authors demonstrate that TFAM-induced compaction does not protect mtDNA from UVC lesions, an important contribution given assumptions about TFAM providing protection.

      Weaknesses:

      (1) The authors show a decrease in mtDNA levels and increased lysosomal colocalization but do not define the pathway responsible for degradation. Distinguishing between replication dilution, mitophagy, or targeted degradation would strengthen the interpretation.

      We thank the reviewer for their careful reading of our manuscript and thoughtful suggestions. We agree that distinguishing between replication dilution, mitophagy, and/or targeted degradation would strengthen our understanding of how UV-induced DNA damage is handled in the mitochondria. We are currently undertaking experiments to tease these apart, but consider them beyond the scope of this manuscript and expect to publish them in a subsequent paper. We added text explicitly stating that these possibilities are not distinguished by our results on pages 8-9 of the Discussion, under the subsection ‘Mitochondria respond to UVC-induced mtDNA damage in the absence of apparent mitochondrial dysfunction’.

      (2) The sudden induction of mtDNA replication genes and transcription at 24 h suggests that intermediate timepoints (e.g., 12 hours) could clarify the kinetics of the response and avoid the impression that the sampling coincidentally captured the peak.

      We agree and have added additional timepoints of 12 hours and 18 hours post exposure. We have updated Figure 2 to include the new data and have added text on page 4 to include these results.

      (3) The authors report no loss of mitochondrial membrane potential, but this single measure is limited. Complementary assays such as Seahorse analysis, ATP quantification, or reactive oxygen species measurement could more fully assess functional integrity.

      We focused on membrane potential because loss of membrane potential is such a well-understood mechanism for triggering mitophagy, but agree that these additional measurements are useful. We have added experiments to assess ATP levels, but did not see changes; we have added these data to Figure 2. We have also added text highlighting that we previously assessed mtROS following the same levels of UV exposure and observed no changes (in the results section on page 5 and in the discussion section on page 9). Given that we observe no changes in membrane potential or ATP, we have opted not to move forward with Seahorse analysis for the purposes of this paper.

      (4) The manuscript briefly notes enrichment of TFAM at certain regions of the mitochondrial genome but provides little interpretation of why these regions are favored. Discussion of whether high-occupancy sites correspond to regulatory or structural elements would add valuable context.

      We agree a discussion of these findings provides context and insight into where the field is currently in understanding TFAM sequence specificity. We have updated text in the discussion (pages 9-10) to include our thoughts on the drivers of TFAM sequence specificity with regard to the discrepancy with the anisotropy data and the lack of overlap with regulatory/structural elements.

      (5) It remains unclear whether the altered DNA topology promotes TFAM compaction or vice versa. Addressing this directionality, perhaps by including UVC-only controls for plasmid conformation, would help disentangle these effects if UVC is causing compaction alone.

      We have added an additional control making this comparison and updated the text on page 7 in the results section. UVC by itself (without TFAM being present) does not alter the plasmid compaction; see new supplemental Figure S16.

      (6) The authors provide a discrepancy between the anisotropy and binding array results. The reason for this is not clear, and one wonders if an orthogonal approach for the binding experiments would elucidate this difference (minor point).

      The discrepancy between anisotropy and the binding array results is certainly unusual and contrary to previous studies that have used these arrays. In addition to the anisotropy experiments, we selected a ‘high occupancy’ and ‘low occupancy’ sequence from the binding array and performed oligomerization experiments using atomic force microscopy, which allowed us to detect small changes in cooperativity (see supplemental Figure S15). We previously only discussed this briefly in the results section on page 6, but we have now updated the discussion section (pages 9-10) to highlight this finding and put forth ideas for the field as to why we think this might be the case. While we do see that the binding array data aligns with oligomerization and cooperativity of TFAM, we still do not know what it is about these sequences that would drive such differences in TFAM binding, but we speculate that it could have something to do with flexibility of the DNA sequences.

      Assessment of conclusions:

      The manuscript successfully meets its primary goal of testing whether TFAM protects mtDNA from UVC damage and the impact this has on the mtDNA. While their data points to an intriguing model that TFAM acts as a sensor of damaged mtDNA, the validation of this model requires further investigation to make the model more convincing. This is likely warranted for a follow-up study. Also, the biological impact of this compaction, such as altering transcription levels, is not clear in this study.

      We have updated wording in the Abstract, Introduction, and elsewhere in the text (as detailed in other portions of our response) to make as explicit and clear as possible which results are supported by the in vitro versus in vivo data, and which parts are conclusions supported by the data versus hypothesized models to be tested in future work.

      Impact and utility of the methods:

      This work advances our understanding of how mitochondria manage UVC genome damage and proposes a structural mechanism for damage "sensing" independent of canonical repair. The methodology, including the custom TFAM DNA chip, will be broadly useful to the scientific community.

      Context:

      The study supports a model in which mitochondrial genome integrity is maintained not only by repair factors, but also by selective sequestration or removal of damaged genomes. The demonstration that TFAM compaction correlates with damage rather than protection reframes an interesting role in mtDNA quality control.

      Reviewer #2 (Public review):

      Summary:

      King et al. present several sets of experiments aimed to address the potential impact of UV irradiation on human mitochondrial DNA as well as the possible role of mitochondrial TFAM protein in handling UV-irradiated mitochondrial genomes. The carefully worded conclusion derived from the results of experiments performed with human HeLa cells, in vitro small plasmid DNA, with PCR-generated human mitochondrial DNA, and with UV-irradiated small oligonucleotides is presented in the title of the manuscript: "UV irradiation alters TFAM binding to mitochondrial DNA". The authors also interpret results of somewhat unconnected experimental approaches to speculate that "TFAM is a potential DNA damage sensing protein in that it promotes UVC-dependent conformational changes in the [mitochondrial] nucleoids, making them more compact." They further propose that such a proposed compaction triggers the removal of UV-damaged mitochondrial genomes as well as facilitates replication of undamaged mitochondrial genomes.

      Strengths:

      (1) The authors presented convincing evidence that a very high dose (1500 J/m2) of UVC applied to oligonucleotides covering the entire mitochondrial DNA genome alleviates sequence specificity of TFAM binding (Figure 3). This high dose was sufficient to cause UV lesions in a large fraction of individual oligonucleotides. The method was developed in the lab of one of the corresponding authors (reference 74) and is technically well-refined. This result can be published as is or in combination with other data.

      (2) The manuscript also presents AFM evidence (Figure 4) that TFAM, which was long known to facilitate compaction of the mitochondrial genome (Alam et al., 2003; PMID 12626705 and follow-up citations), causes in vitro compaction of a small pUC19 plasmid and that approximately 3 UVC lesions per plasmid molecule result in a slight, albeit detectable, increase in TFAM compaction of the plasmid. Both results can be discussed in line with a possible extrapolation to in vivo phenomena, but such a discussion should include a clear statement that no in vivo support was provided within the set of experiments presented in the manuscript.

      We thank this reviewer for their careful reading and interpretation of the manuscript. We agree that discussion of in vivo implications and extrapolations need clear statements indicating where there is not currently in vivo support. We have updated the text throughout the paper to include this.

      Weaknesses:

      Besides the experiments presented in Figures 3 and 4, other results do not either support or contradict the speculation that TFAM can play a protective role, eliminating mitochondrial genomes with bulky lesions by way of excessive compaction and removing damaged genomes from the in vivo pool.

      To specify these weaknesses:

      (1) Figure 1 - presents evidence that UVC causes a reduction in the number of mitochondrial spots in cells. The role of TFAM is not assessed.

      We are working to understand the role of TFAM in vivo following UV irradiation, but believe that work should be included in follow up studies rather than this publication.

      (2) Figure 2 - presents evidence that UVC causes lesions in mitochondrial genomes in vivo, detectable by qPCR. No direct assessment of TFAM roles in damage repair or mitochondrial DNA turnover is assessed despite the statements in the title of Figure 2 or in associated text. Approximately 2-fold change in gene expression of TFAM and of the three other genes does not provide any reasonable support to suggestion about increased mitochondrial DNA turnover over multiple explanations on related to mitochondrial DNA maintenance.

      We agree and have updated the title of Figure 2 to better reflect the findings outlined in the figure as well as the text.

      The new title is, “UVC causes mtDNA damage that decreases over time and is associated with upregulation of mtDNA replication genes, in the absence of apparent mitochondrial dysfunction.”

      We agree that there are numerous mechanistic hypotheses that could explain the decrease in mtDNA damage over time. In Figure 1, we show that there is an overall decrease in mtDNA spots, and an increase in mtDNA-lysosome colocalization, suggestive of mtDNA degradation, which could serve to remove damaged genomes. One possibility is that TFAM is playing a role in the damage removal (but not repair per se, as these lesions are not repaired). Another is a change in mtDNA turnover via an increase in the replication machinery in order to synthesize non-damaged mtDNA molecules that dilute out the damage. These and other possibilities are not mutually exclusive. We have added text (pages 8-9) to make explicit that additional work will be required to distinguish these possibilities. We note that we have also added an additional experiment showing that TFAM knockdown affects mtDNA damage at baseline, as well as after UVC exposure (Figure 5J).
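
      To illustrate why a population-averaged lesion frequency cannot by itself separate these possibilities, here is a minimal arithmetic sketch. It is not drawn from the manuscript: the copy numbers and lesions per genome are hypothetical, and the model assumes every damaged genome carries the same number of lesions. Only the 16.569 kb length of the human mitochondrial genome is a known value.

```python
HUMAN_MTDNA_KB = 16.569  # length of the human mitochondrial genome in kilobases

def lesions_per_10kb(damaged_copies, undamaged_copies, lesions_per_damaged_copy):
    """Average lesion frequency across the whole mtDNA pool, per 10 kb."""
    total_lesions = damaged_copies * lesions_per_damaged_copy
    total_kb = (damaged_copies + undamaged_copies) * HUMAN_MTDNA_KB
    return 10.0 * total_lesions / total_kb

# Hypothetical starting pool right after exposure: every copy is damaged.
start = lesions_per_10kb(1000, 0, 2)

# Scenario A (dilution): replication adds 1000 undamaged copies, nothing is removed.
dilution = lesions_per_10kb(1000, 1000, 2)

# Scenario B (removal plus resynthesis): 500 damaged copies are degraded
# and replaced by 500 newly synthesized undamaged copies.
removal = lesions_per_10kb(500, 500, 2)
```

      In this toy example both scenarios halve the average lesion frequency, although total copy number differs between them, which is why copy-number and colocalization measurements are needed alongside the lesion assay.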

      (3) Figure 5. Shows that TFAM does not protect either mitochondrial nucleoids formed in vitro or mitochondrial DNA in vivo from UVC lesions as well as has no effect on in vivo repair of UV lesions.

      We agree that Figure 5 shows that TFAM does not protect DNA from UVC-induced lesions, and that a roughly 2-fold increase in TFAM protein does not alter damage reduction over time. We have added new data showing that in vivo, knockdown of TFAM results in an increase in baseline (control conditions) mtDNA damage, and also alters the rate of decrease of mtDNA damage over time after UVC (Figure 5J).

      (4) Figure 6: Based on the above analysis, the model of the role of TFAM in sensing mtDNA damage and elimination of damaged genomes in vivo appears unsupported.

      We have updated the legend for Figure 6 in which we outline our hypothesized role of TFAM in sensing mtDNA damage to ensure that readers know this has yet to be fully tested in vivo. We have also updated the Figure legend title from “proposed model” to “hypothesized model,” and changed the wording in the conclusion section (page 11) to highlight more clearly that this is a working model.

      (5) Additional concern about Figure 3 and relevant discussion: It is not clear if more uniform TFAM binding to UV irradiated oligonucleotides with varying sequence as compared to non-irradiated oligonucleotides can be explained by just overall reduced binding eliminating sequence specific peaks.

      We do not believe this is the case given the similar K<sub>D</sub> values for the sequences tested. In our hands and in other publications (reviewed in PMID: 34440420), it has been well established that TFAM binds damaged DNA very well—essentially just as well as nondamaged DNA or better.

      Additionally, a reduction in overall binding on these DNA arrays tends to make sequence specific peaks more apparent. We ran our experiments at both 30 nM and 300 nM TFAM specifically to be able to assess this question. The 300 nM data can be found in supplemental Figure S7. In this figure, we notice that the peaks appear more uniform at the high concentration (comparing Figure 3A to Figure S7A). That is presumably because there is so much more binding happening across the array that the peaks associated with the strongest binders become less pronounced. For the sake of brevity, we have not added this reasoning to the text, but are willing to do so if the Reviewers and Editor feel that it is important to include.
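
      The concentration argument above can be sketched with a simple single-site binding isotherm, in which fractional occupancy is [TFAM] / ([TFAM] + K_D). This is an illustrative assumption only: TFAM binding is cooperative and the array signal need not follow this model, and the two K_D values below are hypothetical rather than taken from Table 1.

```python
def fractional_occupancy(conc_nM, kd_nM):
    """Single-site binding isotherm: fraction of sites bound at a given protein concentration."""
    return conc_nM / (conc_nM + kd_nM)

def peak_contrast(conc_nM, kd_strong_nM, kd_weak_nM):
    """Ratio of occupancy at a strong-binding site to occupancy at a weak-binding site."""
    strong = fractional_occupancy(conc_nM, kd_strong_nM)
    weak = fractional_occupancy(conc_nM, kd_weak_nM)
    return strong / weak

# Hypothetical dissociation constants for a strong and a weak site (nM).
KD_STRONG_NM = 10.0
KD_WEAK_NM = 100.0

contrast_low = peak_contrast(30.0, KD_STRONG_NM, KD_WEAK_NM)    # 30 nM TFAM
contrast_high = peak_contrast(300.0, KD_STRONG_NM, KD_WEAK_NM)  # 300 nM TFAM
```

      Under these assumptions the contrast between strong and weak sites is larger at 30 nM than at 300 nM, because both sites approach saturation at the higher concentration. This matches the qualitative point that lower overall binding makes sequence-specific peaks more apparent, not less.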

      Reviewer #3 (Public review):

      Summary:

      The study is grounded in the observations that mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress. The manuscript focuses on the effects of UVC-induced DNA damage on TFAM-DNA binding in vitro and in cells. The authors demonstrate increased TFAM-DNA compaction following UVC irradiation in vitro based on high-throughput protein-DNA binding and atomic force microscopy (AFM) experiments. They did not observe a similar trend in fluorescence polarization assays. In cells, the authors found that UVC exposure upregulated TFAM, POLG, and POLRMT mRNA levels without affecting the mitochondrial membrane potential. Overexpressing TFAM in cells or varying TFAM concentration in reconstituted nucleoids did not alter the accumulation or disappearance of mtDNA damage. Based on their data, the authors proposed a plausible model that, following UVC-induced DNA damage, TFAM facilitates nucleoid compaction, which may serve to signal damage in the mitochondrial genome.

      Strengths:

      The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. The proposed model may inspire future follow-up studies to further study the role of TFAM in sensing UVC-induced damage.

      Weaknesses:

      The manuscript could be further improved by refining specific interpretations and ensuring terminology aligns precisely with the data presented.

      (1) In line 322, the claim of increased "nucleoid compaction" in cells should be removed, as there is a lack of direct cellular evidence. Given that non-DNA-bound TFAM is subject to protease digestion, it is uncertain to what extent the overexpressed TFAM actually integrates into and compacts mitochondrial nucleoids in the absence of supporting immunofluorescence data.

      We would like to thank this reviewer for their comments and suggestions. We feel these specific language changes have strengthened the interpretability of the text. The TFAM overexpression cells used in this experiment were given to us by Isaac et al., who demonstrated that when TFAM was overexpressed in this specific cell line, the nucleoids were indeed more compact, measured by Fiber-seq (Isaac et al., 2024; PMID: 38347148). We have removed the claim “increased compaction” from the section title, Figure 5 legend title, and from line 322 (now on page 8), and have also added an additional sentence to ensure the reader knows these cells have been shown to have presumed increased compaction by other groups.

      (2) In lines 405 and 406, the authors should avoid equating TFAM overexpression with compaction in the cellular context unless the compaction is directly visualized or measured.

      We have updated the text to ensure that it is clear that this was tested by other groups. We also changed the wording to “inaccessible (presumably compacted) nucleoids.” While we did not demonstrate altered compaction in our study, we think that based on the results from Isaac et al., it is likely that there was increased compaction. In addition, some readers might not have the context to make the connection between compaction and accessibility, so eliminating all reference to compaction could obscure the point.

      (3) In lines 304 and 305 (and several other places throughout the manuscript), the authors use the term "removal rates". A "removal rate" requires a direct comparison of accumulated lesion levels over a time course under different conditions. Given the complexity of UV-induced DNA damage, which involves both damage formation and potential removal via multiple pathways, a more accurate term that reflects the net result of these opposing processes is "accumulated DNA damage levels." This terminology better reflects the final state measured and avoids implying a single, active 'removal' pathway without sufficient kinetic data.

      We agree and have updated the language throughout the text as well as the results heading for this section.

      (4) In line 357, the authors refer to the decrease in the total DNA damage level as "The removal of damaged mtDNA". The decrease may be simply due to the turnover and resynthesis of non-damaged mtDNA molecules. The term "removal" may mislead the casual reader into interpreting the effect as an active repair/removal process.

      We agree and have restructured this sentence for clarity. We do believe there is some removal happening, given the increase in mtDNA colocalization in lysosomes alongside decrease of mtDNA spots in our live cell imaging. We have written it to reflect the inclusion of removal and resynthesis of nondamaged mtDNA molecules (see pages 8-9).

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers appreciate the quality of the presented data but concur that they do not support the primary claims in the title and abstract. The reviewers also realize that in vivo evidence for the model would require extensive new experimentation that goes beyond a reasonable revision. The recommendation is to change the title and significantly revise text, figure titles and legends for transparency, and conclusions within results and discussion sections.

      We thank the editor and all the reviewers for their feedback. We have added additional experiments, updated text throughout the entire paper to ensure our claims are supported, and revised our title. We feel that the changes we have made have indeed made the paper stronger, more transparent, and that the evidence put forth in this paper provides support for all claims made.

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify mitochondrial response kinetics by adding an intermediate (e.g., 12 hrs) recovery timepoint for transcriptional analysis to resolve when TFAM and replication genes are induced.

      We have added additional timepoints of 12 and 18 hours following exposure in Figure 2. These results strengthen our finding that the nuclear transcriptional program supporting mtDNA replication appears to be activated prior to the nuclear transcriptional program supporting mitochondrial transcription, in that POLG and TFAM come up before POLRMT and ND1.

      (2) Strengthen functional readouts by assessing additional parameters of mitochondrial function to substantiate the claim that UVC does not impair mitochondrial performance.

      We have referenced our previously-published data on mtROS and added a measurement of ATP following UVC exposure in Figure 2.

      (3) Consider exploring whether mtDNA degradation occurs via mitophagy, nucleoid-phagy, or another pathway, potentially by using inhibitors or markers of these processes.

      While we agree that this is an important follow up question and are currently working on experiments to address this, those experiments are outside the scope of this manuscript.

      (4) Provide additional details for the high occupancy TFAM sites. Provide brief annotation or discussion of genomic regions showing strong TFAM binding under non-irradiated conditions that are lost during UVC treatment. This would be helpful to the field as a whole.

      We have updated our discussion section to include this.

      (5) Include or discuss a control using UVC irradiated pUC19 without TFAM to confirm that observed compaction categories are TFAM dependent rather than an UVC induced DNA distortion.

      We have added a supplemental figure (Figure S16) comparing the area analysis of control pUC19 and UV-irradiated pUC19, and we have added associated text to the results section of the paper.

      (6) It would be interesting to explore the link between compaction to transcriptional output. In the TFAM overexpression model, the authors could measure expression of mtDNA encoded transcripts (e.g., ND1, COX1) to connect increased compaction with altered mitochondrial transcription.

      While we agree that understanding how compaction status alters mitochondrial transcription is worthwhile, we believe this is beyond the scope of this paper. Furthermore, this connection has previously been shown by Bruser et al., 2021 (PMID: 34818548), who showed that more compact nucleoids are not undergoing active transcription. It will be interesting to see in future work whether mtDNA damage drives changes in both compaction and transcriptional activity.

      (7) Clarify quantitative presentation in figure 2F to explicitly note whether the observed increase in fluorescence intensity was statistically insignificant and confirm that the assay sensitivity is sufficient to detect small potential changes. As presented it is not clear if there is a change.

      We have changed the presentation of Figure 2F. There is a slight increase in membrane potential at the 24-hour time point, and we have made that clear in the text as well. We included FCCP as a (standard) positive control, for which we can detect the associated decrease in membrane potential. While it is always possible that a very small decrease occurred that we were unable to detect, we note that none of the six UVC-exposed groups that we tested even trended towards a decrease in MMP, making it less likely that there was an effect that we simply lacked the power or sensitivity to detect.

      (8) It would be interesting if the authors can comment on whether TFAM induced compaction after UVC might shield mtDNA from other, repairable lesions (e.g., oxidative or alkylation damage), offering a broader context for this mechanism beyond just UVC.

      In theory, we believe this is possible. It will also be interesting to see if the increased compaction following UVC also protects or shields the mtDNA from other enzymatic processes, such as repair proteins that may be searching for repairable lesions such as oxidative or alkylation damage. In this case, it seems as though the increased compaction would prevent the repair from happening at genomes harboring damage.

      In this study we show with our in vitro nucleoids that the increased compaction does not protect against UVC, but this is likely because UVC does not need physical access to the DNA in order to damage it, as the wavelengths of UVC (centered in this case at 254nm) are readily absorbed by proteins and thus can go right through the proteins. Currently, we know that increased compaction by TFAM makes the DNA inaccessible to the enzymes required to methylate DNA used in Fiber-seq (PMID: 38347148), but we do not know if the compaction is tight enough to prevent ROS or alkylating agents from damaging the DNA. We have updated text in the discussion on page 10 to highlight some of these ideas.

      Reviewer #2 (Recommendations for the authors):

      Please, go over all display items and text and clarify details that can help readers to understand important specifics of the experiments. Examples are provided below:

      (1) Abstract and Introduction - indicate species and cell line

      We have updated the text to include this information.

      (2) Table 1 "TFAM KD measurements"- title and footnotes are entirely cryptic. Please, clarify the experimental design, question(s) addressed and conclusions drawn from data.

      We have updated the title of Table 1 to “Binding of TFAM to array sequences, measured using fluorescence anisotropy,” and clarified the footnotes to make sure it is clear which sequences were selected for AFM oligomerization experiments.

      (3) Figure 3 and Material and Methods - specify UVC dose.

      We have added this information to both the figure legend and the methods section.

      (4) Figure 4 - specify UVC dose.

      We have added this information to the figure legend.

      (5) Figure 5. Panel B indicate which band is TFAM and which is HA-tag; Indicate clearly which panel is showing in vivo or in vitro results.

      We have updated the figure to label the untagged TFAM and HA-tagged TFAM and changed the panel titles to specify if they are in vivo results.

    1. In what situations would impromptu speaking be used? Since we’ve already started thinking of the similarities between public speaking and conversations, we can clearly see that most of our day-to-day interactions involve impromptu speaking. When your roommate asks you what your plans for the weekend are, you don’t pull a few note cards out of your back pocket to prompt your response. This type of conversational impromptu speaking isn’t anxiety inducing because we’re talking about our lives, experiences, or something we’re familiar with. This is also usually the case when we are asked to speak publicly with little to no advance warning. For example, if you are at a meeting for work and you are representing the public relations department, a colleague may ask you to say a few words about a recent news story involving a public relations misstep of a competing company. In this case, you are being asked to speak on the spot because of your expertise. A competent communicator should anticipate instances like this when they might be called on to speak, so they won’t be so surprised. Of course, being caught completely off guard or being asked to comment on something unfamiliar to you creates more anxiety. In such cases, do not pretend to know something you don’t, as that may come back to hurt you later. You can usually mention that you do not have the necessary background information at that time but will follow up later with your comments.

      This reading explains that each delivery method—impromptu, manuscript, and memorized—has specific strengths and weaknesses depending on the speaking situation. I found it interesting that impromptu speaking, although anxiety-inducing, can actually strengthen public speaking skills because it forces speakers to think quickly and organize ideas on the spot. However, it also carries the risk of rambling or overstating knowledge. Manuscript delivery, on the other hand, offers precision and consistency, especially for complex information, but often reduces audience engagement because the speaker may sound like they are reading rather than speaking naturally.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work shows that resistance profiles to a variety of drugs are variable between different mycobacterial species and are not correlated with growth rate or intrabacterial compound concentration (at least for linezolid, bedaquiline, and Rifampicin). Note that intrabacterial compound concentration does not distinguish between cytosolic and periplasmic/cell wall-associated drugs. The susceptibility profiles for a wide range of mycobacteria tested under the same conditions against 15 commonly used antimycobacterial drugs provide the first recorded cross-species comparison which will be a valuable resource for the scientific community. To understand the reasons for the high Rifampicin resistance seen in many mycobacteria, the authors confirm the presence of the arr gene known to encode a Rif ribosyltransferase involved in Rif resistance in M. smegmatis in the resistant mycobacteria after confirming the absence of on-target mutations in the RpoB RRDR. Metabolomic analyses confirm the presence of ribosylated Rif in some of the naturally resistant mycobacteria which may not be entirely surprising but an important confirmation. Presumably M. branderi is highly resistant despite lacking the arr homolog due to the rpoB S45N mutation. M. flavescens has an MIC similar to that of M. smegmatis, despite having both Arr-1 and Arr-X. Various Arr-1 and Arr-X proteins are expressed and characterized for catalytic activity which shows that Arr-X is a faster enzyme, especially with respect to more hydrophobic rifamycins. M. flavescens has Rifapentine and Rifabutin MIC values similar to those of M. smegmatis. Thus, the Arr-1 versus Arr-X comparison does not provide a complete explanation for the underlying reasons driving natural Rif resistance in mycobacteria. Downregulation of Arr-X expression in M. conceptionense confers increased sensitivity to Rifabutin confirming its role as a rifamycin-inactivating enzyme.

      Overall, the comparison of cross-species susceptibility profiles is novel; the demonstration that MIC is not correlated with intracellular drug concentration is important but not sufficiently interrogated, the demonstration that Arr-X is also a Rif ADP-ribosyltransferase is a good confirmation and shows that it is more efficient than Arr-1 on hydrophobic rifamycins is interesting but maybe not entirely surprising. The manuscript seems to have two parts that are related, but the rifamycin modification aspect of the work is not strongly linked to the first part since it interrogates the modification of one drug but not the common cause of natural resistance for other drugs.

      Reviewer #2 (Public review):

      Summary:

      The authors use a variety of methods to investigate the mechanisms of innate drug resistance in mycobacteria. They end up focusing on two primary determinants - drug accumulation, which correlates rather poorly with resistance for many species, and, for the rifamycins, ADP-ribosyltransferases. The latter enzymes do appear to account for a good deal of resistance, though it is difficult to extrapolate quantitatively what their relative contributions are.

      Overall, they make excellent use of biochemical methods to support their conclusions. Though they set out to draw very broad lessons, much of the focus ends up being on rifamycins. This is still a very interesting set of conclusions.

      Strengths:

      (1) A very interesting approach and set of questions.

      (2) Outstanding technical approaches to measuring intracellular drug concentrations and chemical modification of rifamycins.

      (3) Excellent characterization of variant rifamycin ADP-ribosyltransferases

      Weaknesses:

      (1) Figure 3c/d: These panels show the same experiment done twice, yet they display substantially different results in certain cases. For instance, M. smegmatis appears to show an order of magnitude lower RIF accumulation in panel d compared to M. flavescens, despite them displaying equal accumulation in panel c. The authors should provide justification for this variation, particularly as quantitative intra-species comparisons are central to the conclusions of this figure.

      The data in panels 3c and 3d are from different sets of experiments. The reviewer is correct with regard to M. smegmatis: the data indeed differ by ~1 order of magnitude. However, the data for the other species are very similar. The reviewer may also have noticed that the error bars are larger in 3d than in 3c, indicating greater variation between the independent experiments used in 3d. We do not have a good explanation for this, other than that the experiments shown in 3d were associated with greater biological variability.

      (2) There are several technical concerns with Figure 3 that affect how to interpret the work. According to the methods, the authors did not appear to normalize to an internal standard, only to an external antibiotic standard (which may account for some of the technical variation alluded to above).

      We agree that using a labeled drug as an internal standard (IS) would be ideal. However, the experiment initially followed an untargeted metabolomics approach, which later shifted to relative drug quantification. At that stage, normalizing with IS was impractical because proper implementation would require multiple IS across the chromatographic range. Therefore, we opted for total ion current (TIC) normalization, which accounts for variability in overall metabolite abundance—even though the experimental setup was already adjusted for each bacterial species’ growth rate. Additionally, we prepared external standard curves for each drug to enable quantification, and the amount of drug added to each plate was considered when reporting these values.
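      As an illustration only, the idea behind TIC normalization can be sketched as follows. This is a minimal sketch, not the authors' actual pipeline: the function name `tic_normalize`, the feature names, and the intensity values are all hypothetical. Each feature's intensity is divided by the summed intensity of all features in the same sample, so that two samples differing only in the total amount of material loaded give the same normalized drug signal.

```python
def tic_normalize(intensities):
    """Divide each feature intensity by the sample's total ion current (TIC).

    `intensities` maps a feature name to its raw intensity in one sample.
    All names and values used here are hypothetical, for illustration only.
    """
    tic = sum(intensities.values())
    if tic <= 0:
        raise ValueError("total ion current must be positive")
    return {feature: value / tic for feature, value in intensities.items()}


# Two hypothetical samples: sample_b contains twice as much material overall,
# but the same relative composition as sample_a.
sample_a = {"drug": 200.0, "metabolite_1": 300.0, "metabolite_2": 500.0}
sample_b = {"drug": 400.0, "metabolite_1": 600.0, "metabolite_2": 1000.0}

norm_a = tic_normalize(sample_a)
norm_b = tic_normalize(sample_b)

# After normalization the drug signal is the same fraction of the TIC
# in both samples, despite the two-fold difference in raw intensity.
print(norm_a["drug"], norm_b["drug"])
```

      Note that this kind of normalization corrects for differences in overall sample abundance; it does not replace an internal standard for correcting analyte-specific effects, which is the limitation acknowledged above.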

      Second, the authors used different concentrations of drug for each species to try to match the species' MICs. I appreciate the authors' thinking on this, but I think for an uptake experiment it would be more appropriate to treat with the same concentration of drug since uptake is likely saturable at higher drug concentrations. In the current setup, for the species with higher MIC, they have to be able to uptake substantially more antibiotics than the species with low MIC in order to end up with the same normalized uptake value in Figure 3d. It would be helpful to repeat this experiment with a single drug concentration in the media for all species and test whether that gives the same results seen here.

      We respectfully disagree with the reviewer. Experiments such as the one proposed by the reviewer work well when MIC values are a few fold apart, for strains of the same species, but have not been tested when MIC values are 100-1000-fold apart, with different species. Furthermore, what would be the interpretation of compound uptake at 1000-fold the MIC for one species and MIC level for another? By using antibiotic concentrations at the respective MIC for each species we are at least under conditions where we know the biological effect of the antibiotic across species is the same, based on its potency.

      (3) Figure 4f: This panel seems to argue against the idea that the efficacy of RIF ribosylation is what's driving drug susceptibility. M. flavescens is similarly resistant to RIF as M. smegmatis, yet M. flavescens has dramatically lower ribosylation of RIF. This is perhaps not surprising, as the authors appropriately highlight the number of different rif-modifying enzymes that have been identified that likely also contribute to drug resistance. However, I do think this means that the authors can't make the claim that the resistance they observe is caused by rifamycin modification, so those claims in the text and figure legend should be altered unless the authors can provide further evidence to support them. This experiment also has results that are inconsistent with what appears to be an identical experiment performed in Supplemental Figure 5b. The authors should provide context for why these results differ.

      In regard to enzyme efficiency, the apparent rate at which Arr-1 converts RIF into ADP-ribosyl-RIF is relatively similar between species. However, Arr-X is much more efficient than Arr-1 in both M. flavescens and M. conceptionense. This is indicated by the apparent rates measured and displayed in Figure 5c.

      Proteomics data shows that there is upregulation of Arr-1 and Arr-X upon rifampicin treatment in M. flavescens and M. conceptionense. However, the same experiment was not performed in Arr-1 KD. Therefore, we can’t verify through this approach if the activity observed in vivo directly correlates with a higher expression of Arr-X alone. Of note, likely both enzymes contribute to resistance to rifamycins, as per our results with the Arr-X KD and sensitization of M. conceptionense to RIF.

      Author response image 1.

      It is also worth mentioning that there are other enzymes in the pathway of RIF ribosylation and their efficiency is unknown (Author response image 2). Therefore, ADP-ribosyl-RIF is not an “end-metabolite” and may not be the sole determinant of RIF resistance via ADP-ribosylation. Downstream enzymes can also account for the difference observed between M. flavescens and M. smegmatis.

      Author response image 2.

      It is correct that the Rifampicin MIC for M. flavescens is the same as M. smegmatis.

      (4) Fig 4f/5c: M. flavescens has both Arr-1 and Arr-X, yet it appears to not have ribosylated RIF. This result seems to undermine the authors' reliance on the enzyme assay shown in Fig 5c - in that assay, M. flavescens Arr-X is very capable of modifying rifampicin, yet that doesn't appear to translate to the in vivo setting. This is of importance because the authors use this enzyme assay to argue that Arr-X is a fundamentally more powerful RIF resistance mechanism than Arr-1 and that it has specificity for rifabutin. However, the result in Figure 4f would argue that the enzyme assay results cannot be directly translated to in vivo contexts. For the authors to claim that Arr-X is most potent at modifying rifabutin, they could test their CRISPRi knockdowns of Arr-X and Arr-1 under treatment with each of the rifamycins they use in the enzyme assay. The authors mentioned that they didn't do this because all the strains are resistant to those compounds; however, if Arr-X is important for drug resistance, it would be reasonable to expect to see sensitization of the bacteria to those compounds upon knockdown.

      The reviewer is reading Fig. 4f incorrectly, probably because it is plotted in a linear scale instead of logarithmic scale. Ribosylated Rif is present in M. flavescens, just at lower levels than M. conceptionense and M. smegmatis. In species where there is no Arr-1 or Arr-3, ribosylated RIF is not detected at all (e.g. M. tuberculosis), i.e., concentration is zero. Therefore, any detection of ribosylated RIF can be considered significant. In addition, as mentioned before, ADP-ribosylation of RIF is not the final product of the reaction and further studies need to be undertaken to understand subsequent reactions.

      (5) Figure 5d: The authors use this CRISPRi experiment to claim that Arr-X from M. conceptionense is more potent at inactivating rifabutin than Arr-1. This claim depends on there being equal degrees of knockdown of Arr-1 and Arr-X, so the authors should validate the degree of knockdown they get. This is particularly important because, to my knowledge, nobody has used this system in M. conceptionense before.

      We agree with the reviewer that a qPCR should have been performed to define the extent of interference in the strain generated. Unfortunately, a qPCR was not performed on the strains tested to confirm the extent of downregulation. Although it is best practice to validate the KD strain, there is no indication that the effect observed is due to nonspecific downregulation. The genetic environment in which Arr-X is positioned is different from that of Arr-1, and the targeting oligonucleotides are specific and would not promiscuously bind to Arr-1. That said, this is indeed a fault in our setup.

      (6) The authors' arguments about Arr-X and Arr-1 would be strengthened by showing by LC/MS that Arr-X knockdown in M. conceptionense results in more loss of ribosyl-rifabutin than knockdown of Arr-1.

      We agree with the reviewer that performing the LC-MS analysis of the Arr-X knockdown would have strengthened the argument of our paper. Unfortunately, this experiment was not performed.

      Reviewer #3 (Public review):

      This manuscript presents a macroevolutionary approach to the identification of novel high-level antibiotic resistance determinants that takes advantage of the natural genetic diversity within a genus (mycobacteria, in this case) by comparing antibiotic resistance profiles across related bacterial species and then using computational, molecular, and cellular approaches to identify and characterize the distinguishing mechanisms of resistance. The approach is contrasted with "microevolutionary" approaches based on comparing resistant and susceptible strains of the same species and approaches based on ecological sampling that may not include clinically relevant pathogens or related species. The potential for new discoveries with the macroevolution-inspired approach is evident in the diversity of drug susceptibility profiles revealed amongst the selected mycobacterial species and the identification and characterization of a new group of rifamycin-modifying ADP-ribosyltransferase (Arr) orthologs of previously described mycobacterial Arr enzymes. Additional findings that intra-bacterial antibiotic accumulation does not always predict potency within this genus, that M. marinum is a better proxy for M. tuberculosis drug susceptibility than the commonly used saprophyte M. smegmatis, and that susceptibility to semi-synthetic antibiotic classes is generally less variable than susceptibility to antibiotics more directly derived from natural products strengthen the claim that the macroevolutionary lens is valuable for elucidating general principles of susceptibility within a genus.

      There are some limitations to the work. The argument for the novelty of the approach could be better articulated. While the opportunities for new discoveries presented by the identification of discrepant susceptibility results between related species are evident, it is less clear how the macroevolutionary approach is further leveraged for the discovery of truly novel resistance determinants. The example of the discovery of Arr-X enzymes presented here relied upon foundational knowledge of previously characterized Arr orthologs. There is little clarity on what the pipeline for identifying more novel resistance determinants would look like. In other words, what does the macroevolutionary perspective contribute to discovery from the point of finding interspecies differences in susceptibility? Does the framework still remain distinct from other discovery frameworks and approaches? If so, how?

      Thanks for pointing this out, as this is a critical feature of our study and method. Our approach relies on inter-species comparative genomics and phenotypes, and therefore, it is distinct from inter-strain comparison. This difference is dramatic, and it becomes clearer when we compare the core genome of M. tuberculosis (one species), 92%, with the core genome of the genus, circa 1%. While we focus on rifamycin in this manuscript, future manuscripts will investigate many of the other dozens of “inconsistencies” observed between the genetic makeup of different mycobacterial species and their actual performance in the presence of different antibiotics.

      While the experimentation and analyses performed appear well-designed and rigorous, there are a few instances in which broad claims are based on inferences from sample sets or data sets that are too limited to provide robust support. For example, the claim that rifampicin modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to rifampicin in mycobacteria may be a bit premature or an over-generalization, as other enzymatic modification mechanisms and other mechanisms such as helR-mediated dissociation of rifampicin-stalled RNA polymerases, efflux, etc were not examined nor were CRISPRi knockdown experiments conducted beyond an experiment to tease out the role of Arr-X and Arr-1 in one strain. The general claim that intra-bacterial antibiotic accumulation does not predict potency in mycobacteria may be another over-generalization based on the limited number of drugs and species studied, but perhaps the intended assertion was that antibiotic accumulation ALONE does not predict potency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The metabolomics is done using mycobacteria grown on filters. Initially, mycobacterial cells are grown on the filters for 5 doublings before being transferred to drug-containing (or free) agar for one doubling. Is this based on calculated doubling time in liquid culture or a true determination of the fact that the biomass increases to what would amount to 5 doublings?

      The doubling time used is the one determined in liquid media. Although it is possible that the growth kinetics in solid media is slightly different from liquid (±10%), this experimental design is well established for M. tuberculosis (since Proc Natl Acad Sci U S A. 2010 May 25;107(21):9819-24.) and M. smegmatis (unpublished). Therefore, we used the growth rate as a proxy for having the same biomass of cells for each species tested. A maximum difference of 10% was observed between M. tuberculosis growth in liquid and in solid media, however, cells grow exponentially for much longer in filters. This makes filter-based experiments more reliable, as few growth phase-derived differences are present.
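      To give a sense of scale for the ±10% figure, the arithmetic can be sketched as below. This is a back-of-the-envelope illustration under the assumption of purely exponential growth; the function `fold_increase` and its arguments are hypothetical and are not taken from the manuscript. The incubation time is set to five nominal (liquid-culture) doublings, and the actual doubling time on the filter is expressed relative to the liquid value.

```python
def fold_increase(nominal_doublings, relative_doubling_time):
    """Biomass fold increase under exponential growth.

    nominal_doublings: incubation time expressed in liquid-culture doubling times.
    relative_doubling_time: actual doubling time on solid medium divided by the
        liquid-culture doubling time (1.0 means identical kinetics).
    Both parameters are hypothetical names used only for this illustration.
    """
    if relative_doubling_time <= 0:
        raise ValueError("relative doubling time must be positive")
    return 2.0 ** (nominal_doublings / relative_doubling_time)


same_kinetics = fold_increase(5, 1.0)   # identical growth on solid and liquid media
slower_on_agar = fold_increase(5, 1.1)  # doubling time 10% longer on solid medium
faster_on_agar = fold_increase(5, 0.9)  # doubling time 10% shorter on solid medium

print(same_kinetics, slower_on_agar, faster_on_agar)
```

      Under these assumptions, a 10% deviation in doubling time shifts the expected biomass by well under two-fold after five nominal doublings, which is the kind of residual difference that TIC normalization is intended to absorb.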

      (2) The demonstration that intrabacterial drug concentrations vary between mycobacterial species in a manner not related to MIC for at least LZD and RIF, is an important finding. However, intrabacterial does not mean cytoplasmic since a considerable fraction could be present in the periplasmic/cell wall layers. Ideally, this would need to be determined but would of course be a massive undertaking since the method needs validation & optimization for each mycobacterial species. Nevertheless, this has to be mentioned. In addition, three drugs are limiting. Measuring additional drug concentrations in these 5 mycobacteria would at least establish some confirmation about the extent of this lack of correlation. Thus, could the authors measure concentrations of additional drugs with intracellular targets?

      Testing additional drugs would be beneficial and would be an expansion of our paper; it is definitely part of our plans for further studies focusing on other antibiotics described here. It would also provide new insights into other possible mechanisms of resistance in mycobacterial species. However, in this study we aimed to first determine the antibiotic response profile in different mycobacterial species, and once we identified interesting resistance phenotypes that could not be readily explained by known mechanisms of resistance, we narrowed it down to certain drugs and species that would potentially provide insights into new mechanisms of antibiotic resistance. Finally, exploring drug concentration across multiple bacterial compartments is a daunting task and it has not been done extensively with any species, not to mention with multiple species, many of which are still lacking any study of their actual cell envelope.

      (3) CRISPRi was used to reduce transcription in M. conceptionense. What was the level of gene downregulation?

      As mentioned previously, a limitation of our setup is that the level of KD was not measured in this instance.

      Minor comments:

      (1) The introduction mentions the fast and slow-growing mycobacteria which are classified based on the time that it takes to observe colonies on solid agar. However, in liquid medium, there is less correlation between the reported growth on agar and doubling time in liquid (Figure 1b, Figure 2d). This could be mentioned in the results section. In Figure 2d, the filled circles represent fast-growers but this does not hold well for liquid culture and it might make more sense to not distinguish between fast- and slow-growers in these graphs. A small complication would also be the fact that the doubling time represents growth in a liquid medium with Tyloxapol as a detergent whereas the MIC and metabolomics are done on solid agar with no detergent. The metabolomics is done after a doubling but for those where agar growth and liquid growth have large discrepancies in growth rate, there could be some differences.

      Apologies for this misunderstanding. Fast- and slow-growth phenotypes are determined in Lowenstein-Jensen (LJ) agar, not in 7H10 agar (used in our study and most studies of mycobacteria). Furthermore, this is a qualitative definition, not a quantitative one. Therefore, our measurements do not need to correlate with fast- and slow-growth phenotypes, unless we had used that one specific medium. Furthermore, in liquid medium, we determined growth rate directly, which is never done with LJ medium.

      In addition to adding the same amount of cells to each filter, we also perform TIC normalization, which should account for how rich the samples were – and therefore how much material we had. Therefore, we do not observe discrepancies due to differences in growth rate and the presence/absence of detergent in the media.

      It is also worth mentioning that this experimental set up has been well established in many M. tuberculosis labs that study metabolism. Importantly, the use of detergent drastically affects mass spectrometry, and therefore cannot be used.

      (2) Figure 1g in the text should be Figure 1f.

      Apologies, it has been fixed.

      (3) Figure S1 would be ideal to have in (supplementary) table format.

      This data is now being provided in a table format.

      (4) Table S1 - ethambutol misspelt.

      Spelling has been corrected.

      (5) MIC for species such as M. abscessus could depend on medium (7H9-based medium can give different MIC values than CAMH).

      Indeed, different media can significantly change MIC values, and this is true for many bacterial species, if not all. For this study we used only species that could be grown in 7H9 broth containing 10% ADC, 0.05% glycerol, and 0.05% tyloxapol, and on 7H10 plates containing 10% OADC and 0.05% glycerol. MIC<sub>99</sub> was determined on the latter, as we found it more efficient and robust to perform our tests on solid media. The goal of our experiment was not to determine the “true” MIC for the antibiotics tested, as this value does not exist. It was to find a lack of correlation between relative values and the presence of genes that can account for them.

      (6) The statement "the experiment was performed at a concentration of antibiotic equal to its MIC" initially seems confusing. It was not equal to the MIC but performed at 6-fold the respective MIC of the species in question. Maybe re-phrasing this would help.

      Apologies for this oversight. It has been corrected.

      (7) Note that some mutations outside the RRDR (eg. V170F and I491F) can also cause Rif resistance.

      Author response image 3.

      A Rainbow diagram of RpoB X-Ray structure coloured according to sequence conservation. Dark purple indicates high conservation, whereas dark orange indicates low conservation. RIF (shown in magenta) is bound to RpoB. Zoomed view displays that the RIF-binding pocket is considerably conserved. B RpoB protein sequence has an 81bp region called Rifampicin Resistance Determining Region (RRDR) that is known to be important for RIF binding and is where most mutations occur in drug-resistant TB. Sequence alignment displays that the RRDR region is conserved with the exception of M. branderi, which has an Asn instead of a Ser residue in position 456 (numbering is related to the M. tuberculosis sequence), highlighted in bold.

      Attached is a structural alignment of RpoB from the species highlighted in this paper. Although there is variability within the sequences, which is also displayed in Author response image 3 with the conservation analysis, the residues that have been implicated in resistance (including V170 and I491) are conserved. The alignment is provided as a .fasta file that can be opened in Jalview.

      (8) Discuss how the RpoB S450N mutation in M. branderi confers the observed level of resistance.

      That’s a great point, thank you. Now it reads as:

      “The rifampicin (RIF) binding pocket is generally conserved, but Mycobacterium branderi has an S450N mutation in the RRDR region. While this specific mutation hasn't been found in clinical isolates, it's located at the binding site and may confer resistance (273). Although both serine (S) and asparagine (N) have similar side chains, related mutations like S450Q have been linked to resistance (156). Thus, M. branderi may be RIF-resistant due to this mutation. In contrast, M. conceptionense, M. flavescens, and M. smegmatis show no target sequence differences that explain their resistance”

      (9) The statement that the three tested NTM are sensitive to rifabutin ("resistant to all rifamycins except for rifabutin") needs to be interpreted considering what sensitivity means. The MIC is still high (1.6-3.1 ug/mL) when compared to that of Mtb. The 2-fold differences in MIC between M. smegmatis and M. conceptionense do not really prove or disprove the role of Arr-X in rifabutin resistance.

      We have revised the sentence to use more careful language. We agree, but it is worth mentioning that, for bacteria in general, susceptibility ranges are defined by the CLSI. Each bacterial species has a range that is considered sensitive or resistant, but these are not available for the species used in this study. In general, bacteria with MIC values above 8 µg/mL are considered resistant to rifampin (J Antibiot 2014 67:625).

      (10) Figure 1d: It's hard to quantify the sensitivity of the plates. Can this be done by MIC? Was only rifabutin tested or also rifampicin?

      The initial experiments described in the paper were all performed using Rifampicin only. Then, the MICs for the remaining rifamycins were determined for M. smegmatis, M. flavescens and M. conceptionense, and can be found in “Supplementary table 4”. Figure 5d illustrates the effect of the KD on M. conceptionense sensitivity to rifabutin.

      (11) Is there data to show the ADP-ribosylation of rifabutin in M. conceptionense and the CRISPRi strains?

      Unfortunately, we did not perform LC-MS analysis on M. conceptionense CRISPRi strains exposed to rifabutin to measure potential ADP-ribosylation.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be useful if the authors would complete Figure 1A by determining growth rates for the remaining 18 strains that they currently omitted.

      These growth rates were obtained using roller bottles and in at least 3 independent experiments; unfortunately, the throughput is far from ideal. The goal of the experiment was to highlight differences in growth rate beyond the fast- and slow-growth categories, which we did. Adding the remaining values would not change this conclusion. Growth rate variation in 7H9 is significant, and the point is made in our figure.

      (2) The authors should justify their choice of species used in Figures 3-4. It would be useful to know, for instance, if the authors chose these species in an unbiased fashion, or if they were chosen because the authors had already determined that they possess rifamycin-modifying enzymes of interest. In that case, they wouldn't necessarily be a representative sample to use for the correlation analysis of antibiotic uptake and potency in Figure 3.

      They were chosen because of their resistance profile for BDQ, LZD and RIF. This has been addressed in the text, which now reads “Given the antibiotic response profiles observed, we selected BDQ, LZD and RIF to explore the molecular causes of these dramatic changes in antibiotic potency observed across the Mycobacterium genus.”

      (3) Figure 4b: The data in this panel appear inconsistent - for instance, M. houstonense appears to grow at 10X Mtb MIC, but fails to grow at 1X Mtb MIC. Repeating this experiment would better establish the validity of the authors' claims about the relative susceptibility of these strains to RIF.

      The figures were rotated when exported from Illustrator. The corrected figure has been uploaded, and the original plate photos have also been uploaded for clarity.

      (4) Figure 4e: Does Arr-X get upregulated in these proteomic datasets? The authors' argument that proteomic upregulation correlates with important drug resistance genes would imply that it might be, so that would be useful information to provide.

      Arr-X is slightly upregulated, but the change is not statistically significant; this could be due to the native expression of Arr-1. The data are displayed in a previous answer.

      (5) I wasn't able to find the supplementary tables that the authors allude to - not sure if that was a file mixup, but those tables would be useful for interpreting the manuscript.

      We are sorry that you couldn’t access the tables. It must be a file corruption issue, as the other reviewers were able to access them. We will make sure that all tables are available and accessible.

      (6) For LC/MS, the authors use peak height instead of peak area, which they argue correlates better with the amount of drug in cells because of the poor peak shape they observed for linezolid. This is not standard practice, so the authors should provide evidence to support this claim by running an LC/MS standard curve, then showing the correlation between peak height and amount of compound added as well as the correlation between peak area and compound.

      Thank you for pointing that out; the accuracy has been calculated and is now displayed. Both peak area and peak height can be used, but area is indeed standard practice.
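
      The comparison the reviewer asks for can be sketched as follows: run a standard curve, then compute how well peak height and peak area each correlate with the amount of compound added. This is a minimal illustration only. The function name `pearson_r` and all numbers are hypothetical and are not taken from the manuscript or its data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical standard curve (made-up values, for illustration only):
# amount of compound added (ng) with the measured peak height and peak area
# in arbitrary units.
amount_ng = [1.0, 2.0, 5.0, 10.0, 20.0]
peak_height = [110.0, 205.0, 520.0, 990.0, 2050.0]
peak_area = [900.0, 2300.0, 4600.0, 11500.0, 19000.0]

r_height = pearson_r(amount_ng, peak_height)
r_area = pearson_r(amount_ng, peak_area)

print(f"r (height vs amount) = {r_height:.4f}")
print(f"r (area vs amount)   = {r_area:.4f}")
```

      Whichever readout gives the higher correlation across the calibration range would be the better-supported choice for quantification; the made-up values above say nothing about which one that is for linezolid.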

      (7) The authors should provide methods information about the LC column and the gradient settings used for LC-MS, as well as the settings of the MS.

      The full method has been added to the paper.

      Reviewer #3 (Recommendations for the authors):

      I have only minor comments aside from the information in the Public Review:

      (1) Results, section on Intra-bacterial antibiotic accumulation, line 8: "experiment was performed at a concentration of antibiotic PROPORTIONAL to its MIC" would be more accurate?

      Agreed and adjusted according to Reviewer’s suggestion.

      (2) Results, section on A minor role for pre-existing target modification, last sentence: the mere presence of RIF-ribosylating enzymes does not, in and of itself indicate that "RIF modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to RIF in mycobacteria", as other mechanisms and other forms of modifying enzymes are known to confer rifamycin resistance, with redundancy (e.g., other rifampicin-modifying enzymes, or helR-mediated dissociation of rifampicin-stalled RNA polymerases from DNA). It would be more appropriate to suggest the results presented to this point indicate RIF modification is common among mycobacteria. The evidence from the CRISPRi knockdown of Arrs shown in Fig 5d is the kind of evidence that suggests ribosylation as a dominant mechanism, at least against rifabutin in this particular species.

      Absolutely, there are other possible modifying enzymes that could be encoded by these mycobacterial species. It is possible that M. flavescens and M. smegmatis encode a putative helR (attached alignment), but further experiments would need to be carried out to confirm its ability to displace RIF from the RNAP. Interestingly, the presence of both Arr and HelR has been studied in M. abscessus, and those mechanisms of resistance are independent of each other (Molecular Cell 2022 82(17):3166-3177.e5).

      (3) Discussion, 2nd sentence needs grammatical editing.

      Rephrased and it reads “Using our mycobacterial library, we identified for the first time high- and ultra-high-level intrinsic resistance (3) to many of the antibiotics tested. Of note, the resistant phenotype is naturally occurring and not a result of mutations due to exposure to the antibiotic in the clinic – which is the more traditional approach for probing mechanisms of antibiotic resistance. Our observations revealed that resistance profiles are highly variable across the genus and do not follow phylogeny, implicating HGT as the key mechanism for acquisition of resistance determinants and evolution of antibiotic resistance in mycobacteria (42).”

      (4) Discussion, page 7, first line: the inclusion of LZD and BDQ in this statement seems at odds with Figure 2c and the statements in the first paragraph of page 5 highlighting these as examples of drugs to which most mycobacteria are susceptible.

      Indeed, many of the species are susceptible; however, the MIC<sub>99</sub> levels observed have never been reported before, and we therefore found them worth highlighting. From a treatment perspective, knowing which species are sensitive to which drugs is of course the most useful outcome of our study.

      (5) The next sentence..."We found that resistance to these antibiotics in mycobacteria cannot be explained by uptake/efflux mechanisms..." is a bit of an over-generalization and conflicts with the evidence presented earlier that efflux could be playing a role in BDQ resistance and the published evidence establishing a clinically significant role for efflux-mediated BDQ resistance in M. tuberculosis, M. avium complex and M. abscessus complex.

      We rephrased it to make it more specific to our findings. It reads “We found that resistance to these antibiotics in mycobacteria does not correlate with uptake/efflux mechanisms in the species tested, and it does not correlate with growth rate. Identification of mycobacterial species highly resistant to BDQ and LZD is worrisome as most of these species, if not all, have never been exposed to these drugs.”

      (6) Methods, section on In vitro activity assay of Arr enzymes, line 1: reference(s) should be provided for previously reported methods.

      Reference now added.

      (7) Figure 2d: the low end of the susceptibility range is not well defined.

      In this figure, susceptibility is not defined as the lowest area of the graph, but the lower concentrations are indeed harder to define. We hope that supplementary figure 1 and the additional table containing the MIC values help to address this comment.

      (8) Figures 3c,d: the presentation of the relative antibiotic concentrations could be harmonized between the graphs in 3c and those in 3d to enable a more ready comparison.

      We disagree. The goal of these different panels is exactly to illustrate two distinct points. C gives the relative concentration of antibiotic, while D correlates relative concentration with MIC99. The use of log scale in D further clarifies that there is no correlation between intracellular antibiotic concentration and potency (MIC). This information is not present in C.

      (9) Figure 4f and Supplementary Figure 5b: it is difficult to understand the limited amount of ribosyl-RIF in M. flavescens in Fig 4f relative to Supplementary Figure 5b (esp. when considering M. smeg as a common comparator); and, further, to understand the seeming lack of correlation between RIF susceptibility, ribosylation and Arr number and catalytic efficiency for these two strains without considering additional resistance mechanisms.

      In reality, the difference between figure 4f and Supplementary figure 5b is mainly due to M. smegmatis, which has an apparently lower production of ribosyl-RIF in the experiment described in the supplementary figure. The values for M. flavescens are relatively similar. In addition, ADP-ribosyl-RIF is not the final metabolite of the pathway.

      Regarding the complete picture, it is true that we were unable to fully unravel and correlate the MIC value, expression of Arr-1, expression of Arr-3, efficiency of each enzyme, production of ADP-ribosyl-RIF and the presence of other possible mechanisms of resistance. This is indeed a limitation of our study, and of most published studies, which usually focus on one resistance determinant.

    1. Author response:

      The following is the authors’ response to the original reviews

      Many thanks for your helpful and constructive comments for our work examining the effect of inhibiting both the insulin receptor (IR) and IGF1 receptor (IGF1R) in the podocyte. We are pleased to submit an updated manuscript addressing your concerns.

      (1) A major concern was a lack of mechanistic insight into how deletion (or knock-down) of both receptors caused the spliceosomal phenotype (Reviewer 1 and Reviewer 3).

      We now think this is due to the lack of a network of insulin/IGF phospho-signalling events to a variety of spliceosomal proteins and kinases. The reasons for this are as follows:

      A. Since submitting our paper, Turewicz et al. have published a comprehensive phospho-proteomic paper examining the effects of 100nM insulin on human primary myotubes (DOI: 10.1038/s41467-025-56335-6). They discovered that multiple post-translational phosphorylation events occur in a variety of spliceosomal proteins at differing time points (1 minute to 60 minutes). Furthermore, they show that mRNA splicing is rapidly modified in response to insulin stimulation in their cells. This follows elegant work from Batista et al., who studied diabetic and non-diabetic iPSC-derived human myoblasts and also detected a spliceosome phosphorylation signature (DOI: 10.1016/j.cmet.2020.08.007).

      B. We have examined phospho-proteome changes that occur in wild-type podocytes (expressing both the IR and IGF1R) compared to double (IR and IGF1R) knockout cells using phospho-proteomics. We have done this 3 days after inducing receptor knockdown, before major cell loss, and have stimulated the cells with either 10nM insulin or 100mg IGF1.

      Interestingly, we detected several post-translational modifications (PTM) in our data set that are also present in Turewicz’s studies. Of note, 100nM insulin (as used by Turewicz) will signal through both the insulin and IGF1 receptor (and hybrid Insulin/IGF1 receptors) which is relevant to our studies.

      Our work shows a cascade of phospho-signalling events affecting multiple components of the spliceosomal complex and evidence of kinase modulation (phosphorylation) (New Figure 7 and supplementary Figure 5). There is also a new results section in the paper (lines 391-425 in the track changes version). We acknowledge that we only studied a single time point after stimulation (10 minutes) and could have missed other PTM in the spliceosomal complex and other kinases. This is mentioned in our new limitations of study section (lines 595-606) and will be a focus of future work. We did not find major PTM differences when stimulating with either insulin or IGF1 in our studies and suspect that the doses of insulin (10nM) and IGF1 (100mg) used are still able to signal through cognate receptors.

      Furthermore, we have examined the relative contributions of the insulin and IGF1 receptor in detail in the model (addressed in point 13 below).

      (2) The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness. (Reviewer 1)

      We apologise for not making this clear, but we did assess the level of receptor knockdown in both the animal and cell models. The in vivo model showed variable and non-complete levels of insulin receptor and IGF1 receptor podocyte knock down (shown in supplementary Figure 1C). This is why we made the in vitro floxed podocyte cell lines in which we could robustly knockdown both the IR and IGF1R. We show this using Western blotting (shown in Figure 2A). We agree that calling the models knockout is misleading and have changed all to knock down (KD) now.

      (3) The mouse experiments would be improved if the serum creatinine’s were measured to provide some idea how severe the kidney injury is. (Reviewer 1)

      There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variability in receptor knockdown levels with the cre-lox system). This is part of the rationale for developing the double receptor knockout cell models, in which we robustly knocked out both receptors by >80%. We have added measured creatinine levels in a subset of mice in supplementary data (New Supplementary Figure 1E) and mention this in the text (lines 285-286). As some mice died, we expect they may have developed acute kidney injury, but we did not serially measure creatinine in every mouse over time. We could have assessed the GFR in a more sensitive way to look at differences. However, we consider that the highly significant levels of albuminuria and histological damage observed in our models show a significant kidney phenotype.

      (4) An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice. (Reviewer 1).

      We did consider doing this but on reflection think it is very unlikely to rescue the phenotype as an array of different spliceosomal proteins quantitatively changed and were differentially phosphorylated / dephosphorylated throughout the complex (as we hope our revised work illustrates now). We think a single protein rescue is highly unlikely to work. We hope this is an appropriate explanation for this action. We have mentioned this in the text now in our discussion (lines 601-602).

      (5) As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on. (Reviewer 1).

      Thank you for this suggestion. We did not extensively examine the metabolism of the mice; however, we did measure blood glucose and body weight, which are included in the paper (Figure 1A and Figure 1B).

      (6) The authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died. (Reviewer 1).

      We appreciate this point. It was the rationale for studying the cells after 3 days of differentiation for total and phospho-proteomics, before significant cell loss, to avoid the issue of studying only the 50% of cells that survive (which happened at 7 days). We have made this clearer in the manuscript. We have also added data showing less cell death at 3 days in the cell model (New Supp Figure 2B).

      (7) It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity. (Reviewer 2)

      We did this and have added it to the manuscript (line 113).

      (8) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable. (Reviewer 2)

      All graphs have now been changed to SD rather than SEM.
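
      As a brief illustration of why the choice matters, the sketch below computes both statistics for one group of measurements. The SEM equals the SD divided by the square root of the sample size, so it shrinks as n grows and does not describe the spread of the data. The helper name `summarize` and the values are hypothetical and are not from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample SD, SEM) for a list of measurements."""
    n = len(values)
    sd = stdev(values)   # sample standard deviation (n - 1 denominator)
    sem = sd / sqrt(n)   # standard error of the mean
    return mean(values), sd, sem


# Hypothetical measurements for one experimental group (illustrative only).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m, sd, sem = summarize(values)
print(f"mean = {m:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Error bars drawn from the SD show how much individual samples vary, whereas SEM bars show only how precisely the mean is estimated.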

      (9) It would be useful for the reader to be told the number of over-lapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knock-out podocytes. This seems low and I am curious to know whether this is typical for this method; a reference could be helpful. (Reviewer 2)

      This is an excellent question. We had 30% overlap because the parameters used for analysis were very stringent. We suspect we could obtain more than 30% by being less stringent, with the additional events still being considered similar, and can do this if requested. Our methods were based on FLAIR analysis (PMID: 32188845). We have added this reference to the manuscript (Line 242 & 680).
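
      For readers unfamiliar with how such an overlap figure can arise, here is a minimal sketch of one way to quantify shared events between two experimental repeats using plain set operations. It is not the FLAIR workflow and not the authors' method. The function name `overlap_stats` and the event identifiers are hypothetical, and the response does not state which denominator underlies the reported 30%.

```python
def overlap_stats(rep_a, rep_b):
    """Summarize how many events two replicates share.

    Events are compared by exact identifier match, which is a simplifying
    assumption; stricter or looser matching rules change the result.
    """
    a, b = set(rep_a), set(rep_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
    }


# Hypothetical intron retention events called in two repeats.
rep1 = ["geneA:intron2", "geneB:intron1", "geneC:intron4", "geneD:intron3"]
rep2 = ["geneB:intron1", "geneC:intron4", "geneE:intron1"]

stats = overlap_stats(rep1, rep2)
print(stats)
```

      The same pair of repeats gives different percentages depending on whether the shared events are divided by the union or by one replicate's total, so the denominator should be stated alongside any overlap figure.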

      (10) With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism, the major limitations are the lack of information regarding the completeness of the KO's. If, for example, they can determine that in the mice, the KO is complete, that the GFR is relatively normal, then the phenotype they describe is relatively mild. (Reviewer 1)

      Thank you. The receptor knock-out (KO) in the mice is highly unlikely to be complete (please see comments above and Supplementary Figure 1C). There are many examples of “KO” animal models targeting other tissues showing that complete KO of these receptors seems difficult to achieve, particularly for the IGF1 receptor. In the brain, which also contains terminally differentiated cells, barely 50% of IGF1R knockdown was achieved in the target cells (PMID:28595357). In ovarian granulosa cells (PMID:28407051), several tissue-specific drivers were tried, but none achieved better than 80%. The paper states that 10% of IGF1R is sufficient for function in these cells, so they conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model we found that Cre levels were important for excision of a single homozygous floxed gene (PMID: 38706850); hence we were not surprised that trying to excise two homozygous floxed genes (insulin receptor and IGF1 receptor) was challenging. This was the rationale for making the double receptor knockout cell lines to understand processes / biology in more detail. As stated earlier, we have changed our description of the mice and cell lines from knock-out to knock-down throughout the revised manuscript as this is more accurate.

      (11) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full-time course of when the albuminuria was first seen and the rate of development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or post-natal development or number of glomeruli? (Reviewer 3)

      We have added further urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks to the manuscript. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age (new Supp Figure 1B). We did consider using a doxycycline-inducible model, but we know the excision efficiency is much lower than in the constitutive podocin-cre driven model (Author response image 1). This would likely give a very mild (if any) phenotype when attempting to knock out both receptors and would not reveal the biology adequately. We acknowledge the weaknesses of the animal model, and this was the rationale for generating the cell models.

      (12) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo. (Reviewer 3)

      This is a good point. We have now immunostained the DKD and WT mice for Sf3b4 (a spliceosomal change in our in vitro proteomics) and also find a significant reduction in this protein in podocytes of the DKD mice (New Figure 3F).

      (13) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions. (Reviewer 3)

      We have now performed proteomics and phospho-proteomics in all 4 cell types (wild-type, insulin receptor knockdown, IGF1R knockdown and double knockdown) at 3 days (New Figure 8 and supplementary Figure 6; see also the new results section, lines 425 to 450). This shows that both receptors contribute to the pathways (and hence there is a high level of compensation built into the system). For total proteins, we detected that spliceosomal tri-snRNP was only reduced when both receptors were lacking, but other proteins / pathways showed an incremental effect of losing the insulin or IGF1 receptor. Likewise, the spliceosomal phospho-signaling events can go predominantly through either the insulin or IGF1 receptor, or through both. We think this reflects the complexity of this system and how it has developed evolutionarily in mammals to protect against its loss.

      Finally, in revision we have rewritten the discussion, adding a “limitations of the study” section, and have hopefully made it easier to read.

      Author response image 1.

      (A) mT/mG reporter mouse crossed to constitutional podocin Cre heterozygous mouse. Illustrates podocyte specificity for Cre driver and excision of reporter. Figure shows GFP expression in Cre producing cells (top panel scale bar=250µm; bottom panel scale bar=50µm). Cre expression causes GFP to be switched on. (B) mT/mG reporter mouse crossed to podocin rtTA tet-o-cre heterozygous mouse shows podocyte specificity for driver and approximately 60% excision (top and bottom panels scale bar=250µm; middle panel scale bar=50µm). Doxycycline required for expression, showing not leaky.

    1. Capulet. When the sun sets, the air doth drizzle dew; But for the sunset of my brother's son It rains downright. How now! a conduit, girl? what, still in tears? Evermore showering? In one little body Thou counterfeit'st a bark, a sea, a wind; For still thy eyes, which I may call the sea, Do ebb and flow with tears; the bark thy body is, Sailing in this salt flood; the winds, thy sighs; Who, raging with thy tears, and they with them, Without a sudden calm, will overset Thy tempest-tossed body. How now, wife! Have you deliver'd to her our decree? Lady Capulet. Ay, sir; but she will none, she gives you thanks. I would the fool were married to her grave! Capulet. Soft! take me with you, take me with you, wife. How! will she none? doth she not give us thanks? Is she not proud? doth she not count her blest, Unworthy as she is, that we have wrought So worthy a gentleman to be her bridegroom? Juliet. Not proud, you have; but thankful, that you have: Proud can I never be of what I hate; But thankful even for hate, that is meant love. Capulet. How now, how now, chop-logic! What is this? 'Proud,' and 'I thank you,' and 'I thank you not;' And yet 'not proud,' mistress minion, you, Thank me no thankings, nor, proud me no prouds, But fettle your fine joints 'gainst Thursday next, To go with Paris to Saint Peter's Church, Or I will drag thee on a hurdle thither. Out, you green-sickness carrion! out, you baggage! You tallow-face! Lady Capulet. Fie, fie! what, are you mad? Juliet. Good father, I beseech you on my knees, Hear me with patience but to speak a word. Capulet. Hang thee, young baggage! disobedient wretch! I tell thee what: get thee to church o' Thursday, Or never after look me in the face: Speak not, reply not, do not answer me; My fingers itch. Wife, we scarce thought us blest That God had lent us but this only child; But now I see this one is one too much, And that we have a curse in having her: Out on her, hilding! Nurse. God in heaven bless her! You are to blame, my lord, to rate her so. Capulet. And why, my lady wisdom? hold your tongue, Good prudence; smatter with your gossips, go. Nurse. I speak no treason. Capulet. O, God ye god-den. Nurse. May not one speak? Capulet. Peace, you mumbling fool! Utter your gravity o'er a gossip's bowl; For here we need it not. Lady Capulet. You are too hot. Capulet. God's bread! it makes me mad: Day, night, hour, tide, time, work, play, Alone, in company, still my care hath been To have her match'd: and having now provided A gentleman of noble parentage, Of fair demesnes, youthful, and nobly train'd, Stuff'd, as they say, with honourable parts, Proportion'd as one's thought would wish a man; And then to have a wretched puling fool, A whining mammet, in her fortune's tender, To answer 'I'll not wed; I cannot love, I am too young; I pray you, pardon me.' But, as you will not wed, I'll pardon you: Graze where you will you shall not house with me: Look to't, think on't, I do not use to jest. Thursday is near; lay hand on heart, advise: An you be mine, I'll give you to my friend; And you be not, hang, beg, starve, die in the streets, For, by my soul, I'll ne'er acknowledge thee, Nor what is mine shall never do thee good: Trust to't, bethink you; I'll not be forsworn. [Exit]

      Lord Capulet enters and mocks Juliet's grief. However, after he learns that Juliet is rejecting the wedding, he becomes enraged, saying that he would drag her to the church himself. He then gives Juliet an ultimatum: if she doesn't marry Paris, he will disown her and leave her a beggar on the streets.

    2. Tybalt. Well, peace be with you, sir: here comes my man. Mercutio. But I'll be hanged, sir, if he wear your livery: Marry, go before to field, he'll be your follower; Your worship in that sense may call him 'man.' Tybalt. Romeo, the hate I bear thee can afford No better term than this,—thou art a villain. Romeo. Tybalt, the reason that I have to love thee Doth much excuse the appertaining rage To such a greeting: villain am I none; Therefore farewell; I see thou know'st me not. Tybalt. Boy, this shall not excuse the injuries That thou hast done me; therefore turn and draw. Romeo. I do protest, I never injured thee, But love thee better than thou canst devise, Till thou shalt know the reason of my love: And so, good Capulet,—which name I tender As dearly as my own,—be satisfied. Mercutio. O calm, dishonourable, vile submission! Alla stoccata carries it away. [Draws] Tybalt, you rat-catcher, will you walk? Tybalt. What wouldst thou have with me? Mercutio. Good king of cats, nothing but one of your nine lives; that I mean to make bold withal, and as you shall use me hereafter, drybeat the rest of the eight. Will you pluck your sword out of his pitcher by the ears? make haste, lest mine be about your ears ere it be out. Tybalt. I am for you.

      Tybalt spots Romeo and challenges him to a fight, but Romeo refuses, saying that they are closer than Tybalt thinks. Mercutio sees Romeo as a coward and decides to draw his sword, challenging Tybalt in Romeo's place.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the role of E2 ubiquitin enzyme, Uev1a in tissue resistance to oncogenic RasV12 in Drosophila melanogaster polyploid germline cells and human cancer cell lines. The incomplete evidence suggests that Uev1a works with the E3 ligase APC/C to degrade Cyclin A, and the strength of evidence could be increased by addressing the expression of CycA in the ovaries and the uev1a loss of function in human cancer cells. This work would be of interest to researchers in germline biology and cancer.

      Thank you for your valuable assessment. The requested data on CycA expression (Figure 4E-G) and uev1a loss-of-function in human cancer cells (Figure 8 and Figure 8-figure supplement 2) have been added to the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate your comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 5-figure supplement 2), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thank you for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries”. Normal nurse cell nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, they become disorganized and begin to condense and fragment (see the second panel in Figure 1D). In late-stage death, they are completely fragmented into small, spherical structures (see the third panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. We have added the description of this death phenotype and its quantification to the main text (Lines 104-108).

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor-suppressor, and so they over-express ubev1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human ubev1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograft growth.

      We apologize for any misleading description. We aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>Ras<sup>G12V</sup>+bam-RNAi” germline tumors, suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 is dependent on Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 135-137), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We have revised the manuscript to prevent misinterpretation. Furthermore, we have added data demonstrating that the combined knockdown of UBE2V1 and UBE2V2 significantly promotes the growth of KRAS-mutant human cancer cells, as suggested (Figure 8 and Figure 8-figure supplement 2).

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos>RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate your critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3 in this 1996 paper). However, their death induced by oncogenic Ras<sup>G12V</sup> is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 4C, D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. In the revised manuscript, we have provided the CycA-staining data, comparing its expression in normal nurse cells versus cells undergoing oncogenic Ras<sup>G12V</sup>-induced death (Figure 4E-G).

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thank you for your suggestion. In the revised introduction, we have provided a more detailed description of Uev1A (Lines 72-79). Additionally, we have introduced its human homologs, UBE2V1 and UBE2V2, in the main text (Lines 143-145).

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from RasG12V-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of RasG12V-overexpression in diploid germline cells, where RasG12V-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate your comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thank you for highlighting this issue. We have provided more details regarding the quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legends insufficiently describe the figures. One example is Figure 3, where there are no details in the figure legend about what conditions apply to each panel and each lane of the gels.

      For clarity and brevity, detailed experimental conditions are described in the Materials and Methods section. Figure legends therefore focus on summarizing the key findings. Thank you for your understanding!

      (2) The font size on the figure is too small.

      Thank you for your constructive suggestion. In response, we have enlarged all font sizes to improve readability.

      (3) There are places where the authors overstate their results, and there are issues with the clarity of the text:

      (3a) Lines 170: "excessive" is not appropriate. Their prior study showed a mild increase in proliferation.

      “Excessive” has been removed in the revised manuscript (Lines 215-216).

      (3b) Line 187-8: The authors should restate this sentence. Here's a possibility. Over-expression of Uev1a suppressed the phenotypes caused by CycA over-expression.

      This sentence has been restated as “Notably, this cell death was suppressed by co-overexpression of CycA and Uev1A, indicating a genetic interaction between them”. (Lines 229-231).

      (3c) Lines 266-7: The properties of Uev1a (ie, lacking a conserved Cys) should be in the introduction.

      This information has been added to the revised introduction (Lines 74-76).

      (3d) Line 318: "markedly" is an overstatement of the prior results.

      Our quantification data revealed that “nos>Ras<sup>G12V</sup>; bam<sup>-/-</sup>” ovaries are three times larger than “nos>GFP; bam<sup>-/-</sup>” control ovaries (see Figure 4A-C in Zhang et al., Stem Cell Reports 19, 1205-1216). Given this substantial difference, we think that using "markedly" is not an overstatement.

      (4) Data not shown occurs in a few places in the text. Given the ability to supply supplemental information in eLife preprints, these data should be shown.

      Thanks for your suggestion. All “not shown” data have been added to the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Cyclin A (CycA) is a key player in this study, but the authors do not provide evidence showing the upregulation of CycA following Ras overexpression in either polyploid or diploid cells. Data on CycA expression should be included.

      Thank you for your constructive suggestion. These data have been added to the revised manuscript (Figure 4E-G).

      (2) DNA replication stress, cellular senescence, and cell death should be assessed under Ras overexpression (RasOE) and RasOE + Uev1A RNAi conditions to support the model proposed in Figure 4F.

      We apologize for any confusion caused by our initial model. We do not have evidence that DNA replication stress and cellular senescence occur under these conditions. Cell death can be readily detected through the presence of fragmented nuclei and condensed DNA (see Figure 1D). The model has been updated accordingly (Figure 9E).

      (3) Appropriate controls should be performed alongside the experimental sets. The same nos>Ras+GFPi data set was repeatedly used in Figures 1I, 2B, 2H, and Figures 2, S2B, which is not ideal.

      All these experiments were performed under identical conditions. Therefore, we deem it appropriate to use the same control data across these analyses.

      (4) Overall, the microscopic images are too small and hard to see.

      Thank you for raising this important point. In the revised manuscript, all images and the font size on figures have been enlarged for improved clarity.

      (5) Figure 1H

      Why is the frequency of egg chamber degradation so much lower in nos>RasG12V+GFP-RNAi (about 40%) than in nos>RasG12V (about 80%)? The authors do not show whether there is a significant difference between these two conditions, although there should be one. The authors should explain why this difference arises.

      These overexpression experiments were conducted using the GAL4/UAS system. While both “nos>Ras<sup>G12V</sup>+GFP-RNAi” and “nos>Ras<sup>G12V</sup>” contain a single nos-GAL4 driver, they differ in UAS copy number: the former incorporates two UAS elements compared to only one in the latter (see the detailed genotypes in Source data 2). These results demonstrate that UAS copy number impacts experimental outcomes in our system.

      In the previous paper (Zhang et al., 2024), Figure 7H shows that the frequency of egg chamber degradation in nos>RasG12V is 33%, whereas this paper shows it as about 80%. There seems to be a difference in the flies' age (previous paper: 7d, this paper: 3d), but this raises the question of why nos>RasG12V shows more egg chamber degradation this time.

      We greatly appreciate your careful observation. The nurse-cell-death phenotype exhibits a spectrum from mild to severe manifestations [see Figure 1D and our response to weakness (2) in Reviewer #1’s public reviews]. While our 2024 paper exclusively quantified egg chambers with severe phenotypes as degrading, the current study included both mild and severe cases in this classification. We do not think fly age could account for this substantial phenotypic difference. A detailed description of the nurse-cell-death phenotype and its quantification have been added to the revised manuscript (Lines 104-108).

      In the following experiments, only nos>RasG12V+GFP-RNAi is used as a control (Figures 2B, H, S2B). I wonder if these results would give us a different conclusion if nos>RasG12V were used as a control.

      As explained above, the UAS copy number does matter in our analyses, so it is important to keep them identical for comparison.

      (6) In the abstract, the authors mention that uev1a is an intrinsic factor to protect cells from RasG12V-induced cell death. RasG12V does not induce much cell death of cystocytes with bam-gal4, whereas it induces a lot of nurse cells' death. Does it mean the intrinsic expression level of uev1a is low in nurse cells (or polyploid cells) compared to cystocytes (or diploid cells)?

      Overexpression of Ras<sup>G12V</sup> driven by bam-GAL4 exhibited only minimal nurse cell death (Figure 1D, E). Additionally, Uev1A exhibited low intrinsic expression levels in both cystocytes and nurse cells (Figure 3E and Figure 5-figure supplement 1).

      (7) Is uev1a-RNAi alone sufficient to induce egg chamber degradation? Or does it have any effect on ovarian development? (Related to question #1 in minor comments)

      While nos>uev1a-RNAi resulted in female sterility, it alone was insufficient to induce egg chamber degradation. However, simultaneous downregulation of Uev1A, Ben, and Cdc27 triggered significant egg chamber degradation (Figure 5-figure supplement 2).

      (8) Which stages of egg chambers get degraded with RasG12V induction?

      This is a good question. In our analyses, we noted that degrading egg chambers exhibited considerable size variability (Figure 1D). Because degradation disrupts normal morphological cues, precise staging of these egg chambers is nearly impossible.

      (9) I suggest testing the cellular senescence marker as well if the authors mention that CycA-degradation by Uev1a-APC/C complex prevents cellular senescence induced by RasG12V in a schematic image of Figure 4 (e.g., Dap/p21, SA-β-gal).

      As addressed in our response to your Major Comment (2), we lacked experimental evidence to support cellular senescence in this context. We have therefore revised the model accordingly (Figure 9E). While this study focuses specifically on cell death, investigating potential roles of cellular senescence remains an important direction for future research. Thank you for your suggestion!

      Minor Comments

      (1) Figure 1D: Df#7584

      It seems that the late-stage egg chamber is missing in this condition. Why does this occur without egg chamber degradation? Is it possible that we do not see egg chamber degradation because this deficiency line lacks properly developed egg chambers that could undergo degradation?

      While this image represents only a single sample, we have confirmed the presence of late-stage egg chambers in other samples. If “Df#7584/+” females were unable to support late-stage egg chamber development, complete sterility would be expected due to the lack of mature eggs. However, as shown in this image (Figure 1D), the ovary contains mature eggs, and the “Df#7584/+” fly strain remains fertile.

      (2) Given the result that DDR signaling keeps egg chambers from degrading, the authors should consider checking DNA-damage markers (e.g., γ-H2AX) in nos>RasG12V and nos>RasG12V +uev1a.

      Thank you for your constructive recommendation. These data have been added to the revised manuscript (Figure 3C).

    1. Author response:

      eLife Assessment

      Using genome databases, the authors performed solid bioinformatic analyses to trace the genomic history of the clinically relevant Staphylococcus aureus tetracycline resistance plasmid pT181 over the last seven decades. They discovered that this element has transitioned from a multicopy plasmid to a chromosomally integrated element, and the work represents a valuable demonstration of the use of publicly available data to investigate plasmid biology and inform clinical epidemiology. This work will appeal to researchers interested in staphylococcal evolution and plasmid biology.

      Thank you, we agree with this overview. We also think this work is interesting to people interested in antimicrobial resistance and bacterial genome structure.

      Public Reviews:

      Reviewer #1 (Public review):

      The study provides a robust bioinformatic characterization of the evolution of pT181. My main criticism of the work is the lack of experimental validation for the hypotheses proposed by the authors.

      Comments on the study:

      (1) One potential reason for the decline in pT181 copy number over time may be a high cost associated with the multicopy state. In this sense, it would be interesting if the authors could use (or construct) isogenic strains differing only in the state of the plasmid (multicopy/integrated). With this system, the authors could measure the fitness of the strains in the presence and absence of tetracycline, and they could be able to understand the benefit associated with the plasmid transition. The authors discuss these ideas, but it would be nice to test them.

      We agree that the relative fitness of integrated versus multicopy plasmids is interesting and a costly multicopy state could explain the transition of independent pT181 replicons to chromosomal integration. This is a project we are exploring for a future study. However, we think that this additional experimental work goes beyond the scope of the paper.

      (2) It would be interesting to know the transfer frequencies of the multicopy mobilizable pT181 plasmid, compared to the transfer frequency of the plasmid integrated into the SCCmec element (which can be co-transferred, integrated in conjugative plasmids, or by transduction).

      We agree with the reviewer that this is an interesting question. However, we think inferring these rates from natural sequence data is not feasible in this case given the low heterogeneity of the plasmid sequence. A laboratory-based experimental study could not address the real transfers we observe over the course of decades, as in vitro S. aureus transfer rates are often not good proxies for in vivo rates (McCarthy et al., 2014). In addition, we do not know what is moving the integrated plasmid. pT181 could be moved by a phage or plasmid, so we are uncertain what the correct experiment would be to explore this.

      (3) One important limitation of the study that should be mentioned is that inferring pT181 PCN from whole genome data can be problematic. For example, some DNA extraction methods may underestimate the copy number of small plasmids because the small, circular plasmids are preferentially depleted during the process (see, for example, https://www.nature.com/articles/srep28063).

      We will investigate this issue further in the revisions. The kits used to extract DNA for the earlier-collected samples may possibly yield more plasmid DNA relative to the chromosome compared to newer ones on average; however, we think this is not driving the decline that we observe in multicopy pT181 copy number. Multiple BioProjects find the same result, where earlier samples have higher copy number compared to later samples. We expect extraction methods to be consistent within a BioProject, suggesting that this decline is genuine and not technical. In revisions, we intend to evaluate the effect of date of sequencing and additional metadata on copy number.

      Reviewer #2 (Public review):

      Summary:

      The authors performed bioinformatic analyses to trace the genomic history of the clinically relevant pT181 plasmid. Specifically, they:

      (1) Tracked the presence of pT181 across different S. aureus strain backgrounds through time. It was first found in one, later multiple strains, though this may reflect changes in sampling over time.

      (2) Estimated the mutation rate of the chromosome and plasmid.

      (3) Estimated the plasmid copy number of pT181, and found that it decreased over time. The latter was supported by two sets of statistical analyses, first showing that the number of single-copy isolates increased over time, and second, that the multicopy isolates demonstrated a lower PCN over time.

      (4) Reported the different integration sites at which pT181 integrated into the genome.

      As a caveat, they mentioned that identical plasmid sequences have variable plasmid copy numbers across different genomes in their dataset.

      Strengths:

      This is a very solid, well-considered bioinformatic study on publicly available data. I greatly appreciate the thoughtful approach the authors have taken to their subject matter, neither over- nor underselling their results. It is a strength that the authors focused on a single plasmid in a single bacterial species, as it allowed them to take into account unique knowledge about the biology of this system and really dive deep into the evolution of this specific plasmid. It makes for a compelling case study. At the same time, I think the introduction and discussion can be strengthened to demonstrate what lessons might be drawn from this case study for other plasmids.

      Weaknesses:

      The finding that the pT181 copy number declined over time is the most interesting claim of the paper to me, and not something that I have seen done before. While the authors have looked at some confounders in this analysis, I think this could be strengthened further in a revision.

      In the revisions, we will further explore the impact that technical variation could have in contributing to copy number variation and update our claims for the decline in copy number of the independent replicon over time and variation for the same plasmid sequence accordingly. Multiple BioProjects show earlier samples have higher copy number compared to later samples; we expect extraction methods to be consistent within a BioProject, supporting our initial findings that this decline over time is not due to technical variation.

      For the flow of the storyline, I also think the estimation of mutation rates (starting L181) and integration into the chromosome (starting L255) could be moved to the supplement or a later position in the main text.

      We will revisit the text organization for flow and clarity of storyline.

      Clearly, the use of publicly available data prevents the authors from controlling the growth and sequencing conditions of the isolates. It is striking that they observe a clear signal in spite of this, but I would have loved to see more discussion of the metadata that came with the publicly available sequences and even more use of that metadata to control for confounding.

      In revisions, we will further investigate possible contributors to the observed decline in copy number of multicopy pT181 over time. We have incorporated the date of sample collection and BioProject in our analysis, but not the date of sequencing or extraction technique.

      References

      McCarthy, A. J., Loeffler, A., Witney, A. A., Gould, K. A., Lloyd, D. H., & Lindsay, J. A. (2014). Extensive horizontal gene transfer during Staphylococcus aureus co-colonization in vivo. Genome Biology and Evolution, 6(10), 2697–2708. https://doi.org/10.1093/gbe/evu214

    1. ord itself imilieu in which records are creattermined by all these factors: fustructures, as well as records-creaobservation I am not abandoninggrounding in the evidence, structuway. I am asserting, however, thatcircumstances of creation a

      When I think about this passage alongside the rise of artificial intelligence (AI), Cook’s emphasis on context feels even more urgent. Terry Cook argues that records are shaped by the functional and structural environments in which they are created. In an AI-driven world, where systems generate, sort, and analyze massive volumes of data automatically, understanding that broader context becomes essential. AI can process content at scale, but without contextual grounding, it risks misinterpreting records or reinforcing surface-level patterns. I see AI as both an opportunity and a challenge for appraisal theory. On one hand, AI tools can help identify patterns across enormous bureaucratic systems, making macro-level analysis more feasible. They can cluster records, detect trends, and even suggest appraisal priorities. This could strengthen Cook’s top-down approach by giving archivists analytical support in mapping institutional functions. On the other hand, AI systems are trained on existing data, which may already reflect institutional biases and power imbalances. If archivists rely too heavily on AI-driven selection, we risk automating those biases. Cook stresses that archivists must actively and consciously shape the archival record. AI does not remove that responsibility—it arguably heightens it. I cannot simply defer judgment to an algorithm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Abdelmageed et al. investigate age-related changes in the subcellular localization of DNA polymerase kappa (POLK) in the brains of mice. Although POLK has been actively investigated for its role in translesion DNA synthesis and involvement in other DNA repair pathways in proliferating cells, very little is known about POLK in a tissue-specific context, let alone in post-mitotic cells. The authors investigated POLK subcellular distribution in the brains of young, middle-aged, and old mice via immunoblotting of fractionated tissue extracts and immunofluorescence (IF). Immunoblotting revealed a progressive decrease in the abundance of nuclear POLK, while cytoplasmic POLK levels concomitantly increased. Similar findings were present when IF was performed on brain sections. Further, IF studies of the cingulate cortex (Cg1), the motor cortex (M1, M2), and the somatosensory (S1) cortical regions all showed an age-related decline in nuclear POLK. Nuclear speckles of POLK decrease in each region. Meanwhile, the number of cytoplasmic POLK granules decreases in all four regions, but granule size increases. The authors report similar findings for REV1, another Y-family DNA polymerase.

      The authors then investigate the colocalization of POLK with other DNA damage response (DDR) proteins in either pyramidal neurons or inhibitory interneurons. At 18 months of age, DNA damage marker gH2AX demonstrated colocalization with nuclear POLK, while strong colocalization of POLK and 8-oxo-dG was present in geriatric mice. The authors find that cytoplasmic POLK granules colocalize with stress granule marker G3BP1, suggesting that the accumulated POLK ends up in the lysosome.

      Brain regions were further stained to identify POLK patterns in NeuN+ neurons, GABAergic neurons, and other non-neuronal cell types present in the cortex. Microglia associated with pyramidal neurons or inhibitory interneurons were found to have a higher abundance of cytoplasmic POLK. The authors also report that POLK localization can be regulated by neuronal activity induced by Kainic acid treatment. Lastly, the authors suggest that POLK could serve as an aging clock for brain tissue, but POLK deserves further characterization and correlation to functional changes before being considered as a biomarker.

      Strengths:

      Investigation of TLS polymerases in specific tissues and in post-mitotic cells is largely understudied. The potential changes in sub-cellular localization of POLK and potentially other TLS polymerases open up many questions about DNA repair and damage tolerance in the brain and how it can change with age.

      Weaknesses:

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper draw upon findings from POLK antibody and its presumed specificity for POLK. However, this antibody has not been fully validated and needs further work. Further validation experiments using Polk-deficient or knocked-down cells to investigate antibody specificity for both immunoblotting and immunofluorescence should be performed. More mechanistic investigation is needed before POLK could be considered as a brain aging clock.

      We are thankful for the overall enthusiasm and positive comments.

      (a) Concern over POLK antibody characterization in mouse:

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99 kDa and 120 kDa bands recognized by the sc-166667 anti-POLK antibody (Figure 1 and Figure S1). We show that in IF, sc-166667 and A12052 (Figure S1G) show similar immunostaining patterns, and we used sc-166667 in all reported figures and western blots.

      (b) More mechanistic investigation is needed before POLK could be considered as a brain aging clock:

      We sincerely appreciate the valuable suggestion. We agree that, as a terminal assay, POLK nucleo-cytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal for determining tissue age that may be useful to "date" brain sections, since few such cell biological markers exist. We have added clarifying text to address this.

      Reviewer #2 (Public review):

      Summary:

      Abdelmageed et al., demonstrate POLK expression in nervous tissue and focus mainly on neurons. Here they describe an exciting age-dependent change in POLK subcellular localization, from the nucleus in young tissue to the cytoplasm in old tissue. They argue that the cytosolic POLK is associated with stress granules. They also investigate the cell-type specific expression of POLK, and quantitate expression changes induced by cell-autonomous (activity) and cell nonautonomous (microglia) factors.

      I think it is an interesting report but requires a few more experiments to support their findings in the latter half of the paper. Additionally, a more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol, what is POLK doing in the cytosol, and what is it interacting with; would greatly increase the impact of this report. However, additional mechanistic experiments are mostly not needed to support much of the currently presented results, again, it would simply increase the impact.

      (a) Concern on more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol:

      We sincerely appreciate the reviewer’s enthusiasm and valuable guidance in helping us better understand the mechanism of nuclear-cytoplasmic POLK dynamics. Previously, we developed a modified aniPOND (accelerated native isolation of proteins on nascent DNA) protocol, which we termed iPoKD-MS (isolation of proteins on Pol kappa synthesized DNA followed by mass spectrometry), to capture proteins bound to nascent DNA synthesized by POLK in human cell lines (bioRxiv https://www.biorxiv.org/content/10.1101/2022.10.27.513845v3). In this dataset, we identified potential candidates that may regulate nuclear/cytoplasmic POLK dynamics. These candidates are currently undergoing validation in human cell lines, and we are preparing a manuscript on these findings. Among these, some candidates, including previously identified proteins such as exportin and importin (Temprine et al., 2020, PMID: 32345725), are being explored further as potential POLK nuclear/cytoplasmic shuttles. We are also conducting tests on these candidates in mouse cortical primary neurons to assess their role in POLK dynamics. In the revised version of the manuscript, we have included a discussion of our current understanding.

      (b) Question on “… what is POLK doing in the cytosol, and what is it interacting with …”: Our data so far indicate that POLK accumulates in stress granules and lysosomes. We are very grateful for the reviewer’s insightful suggestions and will make every effort to incorporate them in the revised manuscript. We characterized POLK accumulation in the cytoplasm using six additional endo-lysosomal markers, as recommended by the reviewer. These data are now part of an entirely new Figure 3.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors show that DNA polymerase kappa POLK relocalizes in the cytoplasm as granules with age in mice. The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers. The cytoplasmic granules colocalize with stress granules and endo-lysosome. The study proposes that protein localization of POLK could be used to determine the biological age of brain tissue sections.

      Strengths:

      Very few studies focus on the POLK protein in the peripheral nervous system (PNS). The microscopy approach used here is also very relevant: it allows the authors to highlight a radical change in POLK localization (nuclear versus cytoplasmic) depending on the age of the neurons. 

      The conclusions of the study are strong. Several types of neurons are compared, the colocalization with several proteins from the NHEJ and BER repair pathways is tested, and microscopy images are systematically quantified.

      Weaknesses:

      The authors do not discuss the physical nature of POLK granules. There is a large field of research dedicated to the nature and function of condensates: in particular numerous studies have shown that some condensates but not all exhibit liquid-like properties (https://www.nature.com/articles/nrm.2017.7, https://pubmed.ncbi.nlm.nih.gov/33510441/ https://www.mdpi.com/2073-4425/13/10/1846). The change of physical properties of condensates is particularly important in cells undergoing stress and during aging. The authors should discuss this literature.

      We highly appreciate the reviewer bringing up the context of biomolecular condensates. Our iPoKD-MS data referenced above suggest candidates from various biomolecular condensates that we are currently investigating. We appreciate the reviewer providing important literature; we have cited these articles in the text, and potential biomolecular condensates are discussed in the revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper rely upon the POLK antibody and its specificity for POLK, which is not fully characterized and needs further work (validation of antibodies using immunoblots of Polk KO cells or siRNA KD of POLK in murine cells) to provide confidence in the authors' findings.

      Points

      siRNA knockdown of Polk in primary neurons showed a dramatic reduction in signal by IF even though qPCR analysis showed a reduction of only ~35% at the transcript level. Typically many DNA repair genes need to be knocked down by 80% or more to see discernable differences at the protein level. siRNA knockdown in a murine cell line (MEFs, neurons, or some other easily transfectable cell type) needs to be performed with immunoblotting with whole cell and fractionated (nuclear/cytoplasmic) lysates in order to better validate the anti-POLK antibodies and which bands that are visualized during immunoblotting are specific to POLK.

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99 kDa and 120 kDa bands recognized by the sc-166667 anti-POLK antibody (Figure 1 and Figure S1). We show that in IF, sc-166667 and A12052 (Figure S1G) show similar immunostaining patterns, and we used sc-166667 in all reported figures and western blots.

      Figure 1B and C, it is not clear which antibody(ies) are used for the immunoblotting of nuclear and cytoplasmic fractions and for a blot with whole tissue lysates. Please place the antibody vendor or clone next to the corresponding blot or describe it in the figure legend. Bands of varying sizes are present in 1B (and Figure S1) but only a band at 99 kDa was shown in 1C. Because there are no bands of equivalent size present in the nuclear and cytoplasmic fractions in Figure 1B, please describe or denote which bands were used for quantification purposes for nuclear and cytoplasmic POLK.

      This has been clarified by using only one antibody, sc-166667, throughout the manuscript. In whole-cell lysate we observed an intense ~99 kDa band and a faint ~120 kDa band; the latter becomes intense in the nuclear fraction and is absent from the cytoplasmic fraction. We have noted this in multiple human cell lines and hiPSC-derived neurons, which is our ongoing work. We do not yet know whether the ~120 kDa band is a modification or an isoform of POLK. We have hints from our proteomics data that it may be SUMOylated, ubiquitinylated, or carry other post-translational modifications. We added this in the discussion section.

      Figure 1I, is there a quantification beyond just the representative image? There is no green staining pattern outside the cytoplasm in the 1-month-old M1 images that is present in all the other images in the panel.

      Fig 1I is now Fig S1G in the revised manuscript. Since REV1 and POLH were not central to the study, which focused on POLK, they were meant to be exploratory data panels, and as such we did not quantify beyond the qualitative evaluation, which broadly resembled POLK’s disposition with age. We have noted that there is some sample-to-sample variability in the background signal. In general, the signal outside the cytoplasm, as subcellularly segmented by fluorescent Nissl expression, tends to vary by brain area but is also higher in older brains.

      "Association with PRKDC further suggests POLK's role in the "gap-filling" step in the NHEJ repair pathway in neurons." There is no strong evidence in the literature for mammalian POLK playing a role in NHEJ. Some description of a role in HR has been described, however. The reference regarding the iPoKD-MS data set that provides evidence of POLK associating with BER and NHEJ factors is listed as Paul, 2022 but is in the reference list as Shilpi Paul 2022.

      We removed this speculative statement and fixed the citation.

      Figure 4A, what is the age of the mouse for the representative images?

      19 months; this is now mentioned in the figure legend.

      Figure 4C, Could the data from the different ages be plotted side by side to better evaluate the differences for each cell type/region?

      The data are now plotted side by side.

      Why was the one-month time point chosen as this could still represent the developing and not mature murine brain? 

      The reviewer correctly notes that a 1-month-old brain is still developing, but mostly from the standpoint of behavioral and circuit maturation. From the perspective of cell division and neurogenesis, however, development is considered complete by the first postnatal month, with neuron production thereafter largely restricted to specialized adult niches in the dentate gyrus and subventricular zone–olfactory bulb pathway; these adult neurogenic stem cells are embryonically derived and are regulated in ways that are distinct from the early, expansionary developmental waves of neurogenesis. In our study we performed our measurements in cortical areas only (Caviness et al., 1995, PMID: 7482802; Ansorg et al., 2012, PMID: 22564330; Ming & Song, 2011, PMID: 21609825; Bond et al., 2015, PMID: 26431181; Bond et al., 2021, PMID: 33706926; Bartkowska et al., 2022, PMID: 36078144). Also, Figure 6A incorrectly stated that the young group was just 1 month old; we rechecked our metadata and found that the young brains comprised 1- and 2-month-old brains, and this has now been corrected.

      Furthermore, can the authors describe which sex of mice was used in these experiments and the justification if a single sex was used? If both sexes were used, were there any dimorphic differences in POLK localization patterns?

      This is an important aspect, but initially, to keep mouse numbers within manageable limits, we focused more on the age component. Although both male and female brains were assayed, the uneven distribution of samples between sexes meant that we could not estimate whether there were any statistically significant sexually dimorphic differences in INs, PNs and NNs. Future studies will investigate the sex component as a function of age.

      The suggestion of POLK as a brain aging clock may be a bit premature as the functional and behavioral consequences of cytoplasmic POLK sequestration are not fully known. Furthermore, investigation of POLK levels in other genetic models of neurodegeneration or with gerotherapeutics would be needed to establish if the POLK brain clock is responsive to changes that shift brain aging. Lastly, this clock may be impractical and not useful for longitudinal studies due to the terminal nature of assessing POLK levels.

      We agree that, as a terminal assay, POLK nucleo-cytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal for determining tissue age that may be useful to "date" brain sections, since not many such cell biological markers exist. We have added clarification text.

      Some discussion of the Polk-null mice is warranted, as they only have a slightly shortened lifespan, and any disease phenotypes were not reported. This stands in contrast to other DNA repair-deficient mice that mimic premature aging and show behavioral and motor deficits. This calls into question the role of POLK in brain aging.

      Discussion statements on Polk-null mice have been added.

      Please correct the catalog number for the SCBT anti-POLK antibody to sc-166667

      The typographical error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Results:

      Figure by figure 

      (1) A progressive age-associated shift in subcellular localization of POLK The authors state that POLK has not been studied in nervous tissue before and they want to see if it is expressed, and if it changes subcellular location as a function of age. The authors argue age = stress like that seen in previous models using genotoxic agents and cancer cells. Indeed, POLK seems to convincingly change subcellular location from the nucleus to larger cytosolic puncta. 

      (2) Nuclear POLK co-localizes with DNA damage response and repair proteins This was a difficult dataset for me to decipher. To me, it appears as though POLK colocalizes with these examined proteins in the CYTOSOL, not the nucleus. Especially, in the oldest mice.

      We added in the discussion that DNA repair proteins were observed to be present in the cytoplasm and biomolecular condensates citing relevant reviews and primary references.

      (3) POLK in the cytoplasm is associated with stress granules and lysosomes in old brains LAMP1 has some issues as a lysosome marker. The authors even state it can be on endosomes. It would be nice to use a marker for mature lysosomes, some fluorescent reporter that is activated only by lysosomal proteases or pH. It is also of interest if POLK is localized to the membrane or the inside of these structures. The authors have access to an airyscan which is sufficient to examine luminal vs membrane localization on larger organelles like lysosomes.

      We thank the reviewer for pushing us to investigate the nature of cytoplasmic POLK in endo-lysosomal compartments. We have now added a full-page figure (Figure 3) on the cell biological results from six different markers, a subset of which (Cathepsin B and D) are known to be present in the lumens of endo-lysosomes. Higher-resolution analysis of membrane versus luminal localization was not pursued, as it is perhaps better suited to cultured neurons than to thick fixed tissues.

      (4) Differentially altered POLK subcellular expression amongst excitatory, inhibitory, and nonneuronal cells in the cortex.

      This seems fine. I don't see anything wrong with the author's statement that there is more POLK in neurons vs non-neuronal cells. 

      (5) Microglia associated with IN and PN have significantly higher levels of cytoplasmic POLK I don't see really any convincing evidence of the author's claim here. They find a difference at early-old age, but not at old-old, or other ages. This is explained by "However, this effect is lost in late-old age (Figure 5D), likely due to the MG-mediated removal of the INs.". But no trend being observed, no experiment to show sufficiency, and no experiment to uncover a directional relationship; this is a tough claim to stand by.

      Changes have been made in the text to reflect the speculative nature of this observation.

      (6) Subcellular localization of POLK is regulated by neuronal activity

      Interesting and fairly difficult experiment. Can the authors talk more about what these values mean? I am confused as to why there is a decline in nuclear puncta at 80 min. Also, why are POLK counts in 6c similar at baseline between young and early-old? In Figures 5 and 6 I also worry about statistical analysis. Are all assumptions checked to use t-tests? Why not always use a test that has fewer assumptions?

      We have explained in the text that acute slice preparations lasting a few hours are artificial and inherently stressful environments, especially for old brains, and are very different from the vascular-perfused, PFA-fixed brain tissues compared between young and old ages.

      We don’t have a proper explanation for the initial dip in nuclear puncta at 80 min, which is of very similar magnitude in young and old brains. It could be a separate biological phenomenon that occurs at much shorter time scales, would not otherwise be captured in a fixed-tissue assay, and needs careful investigation using live-tissue fluorescence imaging, which is beyond the scope of this manuscript.

      We apologize for the typographical error in the figure legend. We rechecked our R code and the tests were all Wilcoxon rank-sum (Mann–Whitney U) two-sided nonparametric.

      Figure 6B & E had absurdly small p values due to the large sample numbers. We therefore implemented random sampling of 100 cells, repeated 200 times, presented the distributions of p values and Cohen’s d in the supplement, and reported the median p value and median Cohen’s d in the main plot.
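
      The repeated subsampling described in this response can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names, the synthetic data, and the choice of drawing 100 cells from each group are hypothetical, and the rank-sum p value uses a normal approximation with tie correction and no continuity correction.

```python
import math
import random
import statistics


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p value, normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j for a tie group
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if labelled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def subsampled_stats(x, y, n_cells=100, n_repeats=200, seed=0):
    """Draw n_cells per group without replacement, n_repeats times (assumed design)."""
    rng = random.Random(seed)
    p_values, effect_sizes = [], []
    for _ in range(n_repeats):
        sx, sy = rng.sample(x, n_cells), rng.sample(y, n_cells)
        p_values.append(rank_sum_p(sx, sy))
        effect_sizes.append(cohens_d(sx, sy))
    return p_values, effect_sizes


# Synthetic stand-in data (hypothetical; not the study's measurements).
_rng = random.Random(1)
young = [_rng.gauss(10.0, 2.0) for _ in range(5000)]
old = [_rng.gauss(9.0, 2.0) for _ in range(5000)]

p_values, effect_sizes = subsampled_stats(young, old)
median_p = statistics.median(p_values)
median_d = statistics.median(effect_sizes)
```

      Reporting the median over many fixed-size subsamples keeps the p value on a scale that does not shrink simply because thousands of cells were measured, while the effect size stays close to its full-sample value.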

      (7) POLK as an endogenous "aging clock" for brain tissue

      Trainable model. What are the criteria for the model, and how does it work? The cutoffs it uses to classify each age group might be interesting in that the model may have identified a trait the researchers were unaware of. Otherwise, it is not especially useful. Maybe as an independent 'blind' analysis of the data?

      We have added a better description of the models, assumptions and how two different unsupervised approaches converge on the same set of features with high AUROCs.

      Minor questions:

      The cartoons (1a, 2a-b, 5a, 6a) help a lot. However, I still had to work a bit to understand some of the graphs (e.g., 5d, 6b-e, fig 7). Is there a simpler way to present them? Maybe simply additional labelling? I'm not sure.

      A more thorough discussion of statistical tests is warranted I think. I am not very clear why some were chosen (t-test vs nonparametric with fewer assumptions). Infinitesimally small p values also make me think maybe incorrect tests were done or no power analysis was performed beforehand. A fix for this is just discussing what went into the testing methods and why they were chosen.

      The statistical analyses for Fig. 2 (now using Generalized Estimating Equations) and Fig. 6 (now using random repeated subsampling; the method is explained in the text, the figure legend is updated, and supplementary data on the distributions of p values and Cohen’s d are added) were revised to address the very small p values. The descriptions have been rewritten in the relevant text.

      In the absence of further mechanistic experiments, it would still be interesting to hear what the authors think is going on and what the significance of this altered subcellular location means. How do the authors think this is occurring? I think they are arguing that cytosolic localization of POLK is 100% detrimental to the neuron. ("The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers") Do they have any idea what the 'bug' is in the POLK system then?

      Statements have been added to the discussion.

      Reviewer #3 (Recommendations for the authors):

      POLK is detected as small "speckles" inside the nucleus at a young age (1-2 months) and larger "granules" can be seen in the cytoplasm at progressively older time points (>9 months). In the nucleus, is POLK bound to DNA? In the cytoplasm, how are the POLK molecules organized: are they bound to a substrate or are they just organized as a protein condensate without DNA?

      In the human U2OS cell line, DNase I treatment leads to loss of POLK from the nucleus as well as of its activity, as reported in Fig. 5 of Paul, S. et al. 2023 bioRxiv. While we haven’t reproduced these results in mouse primary neurons, we anticipate a similar situation, which will be tested in the future. We have addressed limited aspects of POLK in the cytoplasm in the all-new Fig. 3 with six endo-lysosomal markers, and added text.

      When POLK proteins accumulate in the cytoplasm in aging cells, do they also repair condensates in the cytoplasm? What is the function of cytoplasmic POLK granules? More generally, is it known if other granules or foci, such as repair foci are found in the cytoplasms in aging cells, or in cells under stress?

      Six markers for endo-lysosomes were tested to characterize the cytoplasmic granules now shown in Fig3.

      While the authors quantify the number and sizes of the POLK signal, they don't discuss their physical nature. Some membrane-less condensates exhibit liquid-like properties, such as stress granules, P-bodies, or in the nucleus some repair condensates. In some diseased tissues, some condensates lose their liquid properties and become solid-like. Is it known if POLK condensates behave like liquid condensates or they are simply formed by bound molecules on DNA? Since they are larger and fewer in the cytoplasm, is it because several small puncta fused together to form a larger one? It would be worthwhile to discuss these points.

      Discussion statements on the nature of condensates in the context of the POLK cytoplasmic signal have been added.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled, "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease", is about a study of sleep/wake phenomena in a knockin mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and response to sleep loss are employed to provide evidence, "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations, including age-related hyperactivity onset that is typically associated with increased arousal, a normal response to loss of sleep and to multiple sleep latency testing, and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on state bout frequency and duration analyses, it is suggested that the AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.

      Strengths:

      The study appears to have been, technically, rigorously conducted with high quality, in-depth traditional assessment of both state and EEG characteristics, with the concordant addition of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex specific, "stronger" insomnia-like effect of the knockin in females.

      Weaknesses:

      The weaknesses stem from the study's impact being limited due to its being largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The rationale for this conclusion is stated by the authors as based on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition (it is not clear what mechanisms are referenced here) cannot be ruled out, other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure that is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.

      Reviewer 1 succinctly summarizes the advances of this study beyond the ground-breaking Maezono et al (2020) study of this “humanized” mouse model exhibiting amyloid deposition. Whereas Maezono et al. conducted sleep/wake studies on male App<sup>NL-G-F</sup> mice at 6 and 12 months of age, we had the unusual opportunity to study both sexes of homozygous App<sup>NL-G-F</sup> mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months. In addition to baseline sleep/wake and EEG spectral analyses, we (1) measured subcutaneous body temperature and activity to obtain a broader picture of the physiology and behavior of this strain at advanced ages; (2) assessed baseline sleepiness in this strain using the murine version of the clinically-relevant Multiple Sleep Latency Test (MSLT); (3) evaluated the response of App<sup>NL-G-F</sup> mice and WT littermates to a perturbation of the sleep homeostat; (4) compared the sleep/wake characteristics of male vs. female App<sup>NL-G-F</sup> mice at 18-22 months and, (5) to assess the stability of the phenotypes, analyzed these data over a continuous 14-d recording rather than the conventional 24h recordings typical of most sleep/wake studies including Maezono et al. We found that a long wake/short sleep phenotype was characteristic of homozygous App<sup>NL-G-F</sup> mice at these advanced ages which is also evident in the Maezono et al. (2020) study at 12 months of age (but not at 6 months), although the authors do not comment on this phenotype and instead focus on the reduced REM sleep which is particularly evident in female App<sup>NL-G-F</sup> mice in our study. Remarkably, despite being awake ~20% longer per day, we find that App<sup>NL-G-F</sup> mice are no sleepier than WT mice as determined by the MSLT and that their sleep homeostat is intact when challenged by 6-h sleep deprivation. 
At both advanced ages, the long wake/short sleep phenotype is due primarily to longer Wake bouts and shorter bouts of both NREM and REM sleep during the dark phase. Moreover, hyperactivity develops in older App<sup>NL-G-F</sup> mice, particularly females, which contributes to this phenotype. We agree with Reviewer 1 that “hyperactivity would be expected to result in increased duration of W bouts during the active phase” and that this could result in more consolidated NREM bouts, and we will modify the manuscript to discuss this alternative. However, the suggestion of greater sleep pressure is not borne out by the MSLT studies, as we did not observe the shorter sleep latencies and increased sleep during the nap opportunities on the MSLT that we have observed in other mouse strains. Moreover, due to their short sleep phenotype, App<sup>NL-G-F</sup> mice would be entering the sleep deprivation study with a greater sleep debt than WT mice, yet we did not observe greater EEG Slow Wave Activity in this strain during recovery from sleep deprivation. Thus, we have suggested that App<sup>NL-G-F</sup> mice are unable to transition from Wake to sleep as readily as their WT littermates. Our observations summarized above set the stage for subsequent mechanistic studies in aged App<sup>NL-G-F</sup> mice, although realistically, mice of this age and genotype are a rare commodity.
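
      Because the argument above turns on bout number and bout duration rather than total time in each state, a small sketch of how bouts are derived from epoch-by-epoch scoring may help. It is an illustration only: the 10 s epoch length, the state labels and the toy hypnograms are assumptions, not details taken from the manuscript.

```python
from itertools import groupby

EPOCH_S = 10  # assumed scoring epoch in seconds (hypothetical; not stated in the text)


def bouts(states, epoch_s=EPOCH_S):
    """Collapse a per-epoch state sequence into (state, duration_s) bouts."""
    return [(state, sum(1 for _ in run) * epoch_s) for state, run in groupby(states)]


def bout_summary(states, state, epoch_s=EPOCH_S):
    """Count, mean duration and total time of all bouts of one state."""
    durations = [d for s, d in bouts(states, epoch_s) if s == state]
    count = len(durations)
    mean_s = sum(durations) / count if count else 0.0
    return {"count": count, "mean_s": mean_s, "total_s": sum(durations)}


# Two toy hypnograms with identical total Wake ("W") and NREM ("N") time.
consolidated = ["W"] * 12 + ["N"] * 6 + ["W"] * 12 + ["N"] * 6
fragmented = (["W"] * 4 + ["N"] * 2) * 6

wake_consolidated = bout_summary(consolidated, "W")
wake_fragmented = bout_summary(fragmented, "W")
```

      The two toy recordings spend the same time awake, yet one does so in a few long bouts and the other in many short ones, which is why bout frequency and duration carry information that daily totals do not.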

      Reviewer #2 (Public review):

      Summary:

      The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.

      The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature, both effects predominate in the dark period. Interestingly, the sleep patterns do not quite follow the sleep changes. Wake time is increased in NLGF mice, and there is no progression in increased wake over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature and on sleep. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer bout durations in the dark period (their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences, with females showing no gene difference in wake bout number, while males show a gene effect. Similarly, gene effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, there was normalization of sleep-wake activity after sleep deprivation.

      Strengths:

      Approach (model extent of sleep phenotyping), analysis.

      Weaknesses:

      The weaknesses are summarized below and are viewed as "addressable".

      (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The findings across baseline sleep in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue where NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period, and/or increased drive to hypocretin neurons from PVH.

      (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.

      Reviewer 2 suggests a provocative hypothesis to test. Curiously, although a recent Science paper suggests that hyperexcitable hypocretin/orexin neurons in aging mice result in greater sleep/wake fragmentation, hyperexcitability of this system could result in hyperactivity and longer wake bouts in aged App<sup>NL-G-F</sup> mice.

      Reviewer #3 (Public review):

      Summary:

      In this study, Tisdale et al. studied the sleep/wake patterns in the biological mouse model of Alzheimer's disease. The results in this study, together with the established literature on the relationship of sleep and Alzheimer's disease progression, guided the authors to propose this mouse model for the mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.

      Strengths:

      The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.

      Weaknesses:

      The study seems to lack a clear hypothesis-driven approach and relies mostly on explorative investigations. Moreover, lack of quantitative analytical methods as well as shaky logical conclusions, possibly not supported by data in its current form, leaves room for major improvement.

      Since this paper studied sleep states, the "Methods" section is quite unclear on what specific criteria were used to classify sleep states. There is no quantitative description of classifying sleep based on clear, reproducible procedures. There are many reasonably well-characterized sleep scoring systems used in rat electrophysiological literature, which could be useful here. The authors are generally expected to describe movement speed and/or EMG and/or EEG (theta/delta/gamma) criteria used to classify these epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.

      One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260s) and fewer short wake bouts (<60s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests the primary defect is not in the transition mechanism itself, but possibly in a pathological increase in arousal drive (hyper-arousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Also, a point to note is that this finding is not new.

      Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined and methods well-explained in the Methods section. Without these, the reader cannot discern if the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color (not clear what range and what the interpretation is) mean shorter sleep bout in female mice? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges in the plot, such that readers can draw their own conclusions.

      Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.

      In Figure 9, LFP power shown and compared in percentages is problematic, as LFP power distribution is known to be skewed (follows power law). This is particularly problematic here because all the frequencies above ~20 Hz seem to be totally flattened or nonexistent, which makes this comparison of power severely limited and biased towards the relative frequency in the highly skewed portion of the LFP power spectrum, i.e., very low frequency ranges like delta, theta, and possibly beta. This ignores low, mid, and high gamma as well as ripple band frequencies. NREM sleep is known to have relatively greater ripple band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.
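
      The concern about percentage normalization can be illustrated with a toy spectrum. The following is a minimal sketch, assuming a hypothetical 1/f² power spectrum and illustrative band edges; neither is taken from the manuscript under review. It shows how relative (percent-of-total) power is dominated by the lowest frequencies, whereas a log-scaled spectrum keeps the higher bands visible.

```python
import math

def relative_band_power(freqs, power, bands):
    """Return each band's share of total power, in percent."""
    total = sum(power)
    shares = {}
    for name, (lo, hi) in bands.items():
        band = sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        shares[name] = 100.0 * band / total
    return shares

# Hypothetical 1/f^2 spectrum sampled in 1 Hz steps from 1 to 250 Hz.
freqs = list(range(1, 251))
power = [1.0 / f**2 for f in freqs]

# Band edges are illustrative assumptions, not the manuscript's definitions.
bands = {"delta": (1, 4), "theta": (4, 9), "gamma": (30, 100), "ripple": (100, 251)}
shares = relative_band_power(freqs, power, bands)

# Alternative presentation: absolute power in decibels, which does not
# compress the high-frequency bands toward zero.
log_power_db = [10.0 * math.log10(p) for p in power]

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of total power")
```

      In this toy case the ripple band contributes well under 1% of total power, so differences there would be invisible on a linear percentage axis even if they were large in absolute terms.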

      We agree with the reviewer that the “Classification of arousal states” section was missing the key description of how we scored the recordings into arousal states based on EEG, EMG and locomotor activity; this was an oversight, as the corresponding text exists in all our previous sleep/wake studies published over several decades. Reviewer 1 also points out the alternative interpretation that “the wake state has become too stable.” However, we think we are using different words to say the same thing: that the transition from wake to sleep is impaired, whether due to hyperarousal or to a defect in the flip-flop switch that results in greater Wake stability. We will revise Fig. 3 (Reviewer 2 suggests combining with Fig. 14) but note that the X-axis is labelled 0-25 Hz and that this figure was intended to be descriptive -- illustrating how unusual the female App<sup>NL-G-F</sup> mice are relative to WT -- rather than a quantitative analysis of spectral power as in Fig. 14. Both Reviewer 2 and 3 suggest that we are using “insomnia” incorrectly, which we have simply used to describe less sleep per 24h period. Reviewer 2 states that “Insomnia is defined as a subjective dissatisfaction with sleep” and Reviewer 3 suggests a narrow definition of insomnia as due only to “a failure of the homeostatic process or a debilitating accumulation of sleep debt.” In a revised manuscript, we will define “insomnia” as an operational term to succinctly mean “less sleep”. Regarding the problem of presenting spectral power in percentages, we completely agree with the reviewer. However, we intentionally presented spectral power density, a measure of relative power, as in Figure 3A and 3B of Maezono et al. (2020). At the risk of making Fig. 9 even more busy, we will revise Fig. 9 to add labels for all Y-axes.

      In addition to a revised Fig. 9, in the revised manuscript, we will reformat Tables 1-3, Figs. S1 and S2 for legibility and correct an error in Fig. 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of BRP clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of BRP expression levels on site, using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels were associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure profiling can be inferred from only assessing BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP10 construct that is tagged to BRP in all BRP-expressing cells, independent of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that plastic changes are made visible in this study after an associative learning experiment with BRP intensity and cluster volume as read-out, but I would be reassured by direct measurement of synaptic plasticity with splitGFP directly connected to BRP, maybe at a different synapse that is more accessible.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that Brp is an important, but not the only, player in the active zone. We have included new data to demonstrate that split-GFP tagging does not severely affect the localization and plasticity of Brp or the function of synapses by showing: (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs; (3) activity-dependent Brp remodeling in R8 photoreceptors; (4) no defect in memory performance when labeling Brp::rGFP in KCs. These four lines of additional evidence further corroborate our approach to characterize endogenous Brp as a proxy of active zone structure.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type specific fluorescence-tagging approach using a CRISPR/Cas9 induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing for the comparison of synapse profiles within single cells, cell types, MB compartments, and between different individuals. The aim is to analyse in more detail neuronal connectivity and circuits in this centre of associative learning. These are notoriously difficult to investigate due to the density of cells and structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. Next to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not contain any cloning strategy (gRNA, primer, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP-ORF, it needs a general validation of this line, including controls (Western Blot and correlative antibody staining against BRP) showing that overall BRP expression is not compromised due to the GFP insertion and localizes as BRP in wild type flies, that flies are viable, have no defects in locomotion and learning and memory formation and MB morphology is not affected compared to wild type animals.

      We thank the reviewer for suggesting these important validations. We included details of the design of the construct and insertion site to the Methods section, performed several new experiments to validate the split-GFP tagging of Brp, and present the data in the revision.

      First, to examine whether the transcription of the brp gene is unaffected by the insertion of GFP<sub>11</sub>, we conducted qRT-PCR to compare the brp mRNA levels between flies carrying brp::GFP<sub>11</sub> together with UAS-GFP1-10 and flies carrying UAS-GFP1-10 alone, and found no difference (Figure 1 - figure supplement 1A).

      To further verify the effect of GFP<sub>11</sub> tagging at the protein level, we performed anti-Brp (nc82) immunohistochemistry of brains where GFP is reconstituted pan-neuronally. We found unaltered neuropile localization of nc82 signals (Figure 1 - figure supplement 1C). In presynaptic terminals of the mushroom body calyx, we found integration of Brp::rGFP to nc82 accumulation (Figure 1D). We performed super-resolution microscopy to verify the configuration of Brp::rGFP and confirmed the donut-shape arrangement of Brp::rGFP in the terminals of motor neurons (see Wu, Eno et al., 2025 PLOS Biology), corroborating the nanoscopic assembly of Brp::rGFP at active zones (Kittel et al., 2006 Science).

      Furthermore, co-expression of RFP-tagged voltage-gated calcium channel alpha subunit Cacophony (Cac) and Brp::rGFP in PAM-γ5 dopaminergic neurons revealed strong presynaptic colocalization of their punctate clusters (Figure 1E), suggesting that rGFP tagging of Brp did not damage key protein assembly at active zones (Kawasaki et al., 2004 J Neuroscience; Kittel et al., Science).

      These lines of evidence suggest that the localization of endogenous Brp is barely affected by the C-terminal GFP<sub>11</sub> insertion or GFP reconstitution therewith. This is in line with a large body of studies confirming that the N-terminal region and coiled-coil domains, but not the C-terminal region, of Brp are necessary and sufficient for active zone localization (Fouquet et al., 2009 J Cell Biol; Oswald et al., 2010 J Cell Biol; Mosca and Luo, 2014 eLife; Kiragasi et al., 2017 Cell Rep; Akbergenova et al., 2018 eLife; Nieratschker et al., 2009 PLoS Genet; Johnson et al., 2009 PLoS Biol; Hallermann et al., 2010 J Neurosci). We nevertheless report homozygous lethality and found decreased immunoreactive signals in flies carrying the GFP<sub>11</sub> insertion (Figure 1 - figure supplement 1B).

      For these reasons, we used heterozygotes for all experiments; in these flies we saw no conspicuous locomotion defect of the kind reported in the original study (Wagh et al., 2005 Neuron). To functionally validate the heterozygotes, we measured the aversive olfactory memory performance of flies where GFP reconstitution was induced in Kenyon cells using R13F02-GAL4. We found that these transgenes did not alter mushroom body morphology (Figure 7 - figure supplement 1) or memory performance as compared to wild-type flies (Figure 7 - figure supplement 2), suggesting that the synapse function required for short-term memory formation is not affected by split-GFP tagging of Brp.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      (a) For BRP cluster segmentation, the Materials and Methods state that intensity threshold and noise tolerance were "set" - this setting has a large effect on the quantification, and it should be specified and setting criteria named and justified (if set manually (how and why) or automatically (to what)). Additionally, if Python was used for "Nearest Neighbor" analysis, the code should be made available within this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      (b) To better evaluate the quality of both the imaging analysis and image presentation, it would be important to state, if presented and analysed images are deconvolved and if so, at least one proof of principle example of a comparison of original and deconvoluted file should be shown and quantified to show the impact of deconvolution on the output quality as this is central to this study.

      We thank the reviewer for suggesting these clarifications. We have added more description to the revised manuscript to clarify the segmentation settings, which were manually adjusted to optimize the F-score (previous Figure 1D, now moved to Figure 1 - figure supplement 5). We have included the code used for analyzing nearest neighbor distance, AZ density and local Brp density in the revised manuscript (Supplementary file 1), together with a pre-processed sample data sheet (Supplementary file 2).
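
      For readers without access to the supplementary files, the kind of computation involved can be sketched as follows. This is a minimal illustration, not the authors' Supplementary file 1: the function names, the brute-force search, the 1 µm radius and the example centroids are all assumptions made for the sketch.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each 3D centroid to its closest other centroid."""
    nnd = []
    for i, p in enumerate(points):
        best = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nnd.append(best)
    return nnd

def local_density(points, radius):
    """Number of other centroids within `radius`, divided by the sphere volume."""
    volume = 4.0 / 3.0 * math.pi * radius**3
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        densities.append(count / volume)
    return densities

# Hypothetical cluster centroids in micrometres (x, y, z).
centroids = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.4, 0.0), (2.0, 2.0, 2.0)]

nnd = nearest_neighbor_distances(centroids)
density = local_density(centroids, radius=1.0)  # radius is an assumed value
```

      The brute-force search scales quadratically with the number of clusters; for the tens of thousands of clusters typical of a dense neuropil, a spatial index such as a k-d tree would be the usual choice.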

      Regarding image deconvolution, we have clarified the differential use of deconvolved and not-deconvolved images in the revised manuscript. We have also included a quantitative evaluation of Richardson-Lucy iterative deconvolution (Figure 1 - figure supplement 4). We used 20 iterations due to only marginal FWHM improvement beyond this point (Figure 1 - figure supplement 4).

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      (a) It would be important for the understanding and validation of the learning induced changes, if not (only) a ratio (of AZ density/local intensity) would be presented, but both values on their own, especially to allow a comparison to the quoted, previous AZ remodelling analysis quantifying BRP intensities (ref. 17, 18). It should be elucidated in more detail why only the ratio was presented here.

      We thank the reviewer for the suggestion on the presentation of learning-induced Brp remodeling. The reported values in Figure 7C are the correlation coefficient of AZ density and local intensity in each compartment, but not the ratio. These results suggest that subcompartment-sized clusters of AZs with high Brp accumulation (Figure 6) undergo local structural remodeling upon associative learning (Figure 7). For clarity, we have included a schematic of this correlation and an example scatter plot to Figure 6. Unlike the previous studies (refs 17 and 18), we did not observe robust learning-dependent changes in the Brp intensity, possibly due to some confounding factors such as overall expression levels and conditioning protocols as described in the previous and following points, respectively.
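
      The distinction between a ratio and a correlation coefficient can be made concrete with a short sketch. It assumes the Pearson coefficient is meant, which the response does not state explicitly, and uses made-up values for AZ density and local Brp intensity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical per-AZ measurements within one compartment (arbitrary units).
az_density = [1.0, 2.0, 3.0, 4.0, 5.0]
local_intensity = [10.0, 12.0, 15.0, 19.0, 24.0]

r = pearson_r(az_density, local_intensity)
print(f"r = {r:.3f}")
```

      A high coefficient indicates that AZs in dense neighborhoods also tend to have high Brp intensity; unlike a ratio, it is unchanged by a uniform scaling of either variable, such as an overall shift in expression level.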

      (b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      (c) Additionally, "controls" for the unpaired values, that is, flies receiving neither shock nor odour, would help to evaluate the unpaired control values in the different MB compartments.

      We use single odor conditioning because it is the simplest way to examine the effect of odor-shock association by comparing the paired and unpaired group. Standard differential conditioning with two odors contains unpaired odor presentation (CS-) even in the ‘paired’ group. We now show that single-odor conditioning induces memory that lasts one day as in differential conditioning (Figure 7B; Tully and Quinn, J Comp Phys A 1985).

      (d) The temporal resolution of the effect is very interesting (Figure 7D), and more time points, especially between 90 and 270 min, might yield interesting results.

      The sampling time points after training were chosen at approximately logarithmic intervals, as the memory decay is roughly exponential (Figure 7B). This transient remodeling is consistent with the previous studies reporting that the Brp plasticity was short-lived (Zhang et al., 2018 Neuron; Turrel et al., 2022 Current Biol).

      (e) Additionally, it would be very interesting and rewarding to have at least one additional assay, relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging or on a functional level by additional learning assays.

      We thank the reviewer for raising this important point. We have performed calcium imaging of KC presynaptic terminals to correlate the structure and function in another study (see Figure 2 in Wu, Eno et al., 2025 PLOS Biology for more detail). The basal presynaptic calcium pattern along the γ compartments is strikingly similar to the compartmental heterogeneity of Brp accumulation (see also Figure 2 in this study). Considering colocalization of other active-zone components, such as Cac (Figure 1E), we propose that the learning-induced remodeling of local Brp clusters should transiently modulate synaptic properties.

      As a response to other reviewers’ interest, we used Brp::rGFP to measure different forms of Brp-based structural plasticity upon constant light exposure in the photoreceptors and upon silencing rab3 in KCs. Since these experiments nicely reproduced the results of previous studies (Sugie et al., Neuron 2013; Graf et al., Neuron 2009), we believe the learning-induced plasticity of Brp clustering in KCs has a transient nature.

      Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4-expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment-specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

      We performed additional validation on the tool, including (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs (Figure 1D-E); (3) activity-dependent active zone remodeling in R8 photoreceptors (Figure 1F). These will be detailed in our point-by-point response below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors keep stating, they profile or assess synaptic structure by analyzing BRP localization, cluster volume, and intensity. However, I do not think that BRP cluster volume and intensity warrant an educated statement about presynaptic structure as a whole. I do not challenge the usefulness of BRP cluster analysis for synapse evaluation, but as there are so many more players involved in synaptic function, BRP analysis certainly cannot explain it all. This should at least be discussed.

      It is correct that Brp is not the only player in the active zone. We have included more discussion on the specific role of Brp (line 84 to 89) and other synaptic markers (line 250) and edited potentially misleading text.

      (2) I do see that changes in BRP expression were observed following associative learning, but is it certain, that synaptic plasticity is generally unaffected by the large GFP fluorophore? BRP is grabbing onto other proteins, both with its C- and N-termini. As the GFP is right before the stop codon, it should be at the N-terminus. How far could BRP function be hampered by this? Is there still enough space for other proteins to interact?

      We thank the reviewer for sharing these concerns. Here we provide three lines of evidence to demonstrate that the Brp assembly at active zones required for synaptic plasticity is unaffected by split-GFP tagging.

      First, we assessed olfactory memory of flies that have Brp::rGFP labeled in Kenyon cells and found the performance comparable to wild-type (Figure 7 - figure supplement 2), suggesting the Brp function required for olfactory memory (Knapek et al., J Neurosci 2011) is unaffected by split-GFP tagging.

      Second, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors. We have included this result into the revised manuscript (Figure 1F).

      To further validate protein interaction of Brp::rGFP, we focused on Rab3, as it was previously shown to control Brp allocation at active zones (Graf et al., 2009 Neuron). To this end, we silenced rab3 expression in Kenyon cells using RNAi and measured the intensity of Brp::rGFP clusters in γ Kenyon cells. As previously reported in the neuromuscular junction, we found that rab3 knock-down increased Brp::rGFP accumulation at the active zones, suggesting that Brp::rGFP retains the interaction with Rab3. We have included all the new data in the revised manuscript (Figure 1 - figure supplement 3).

      (3) It may well be that not only active-zone-associated BRP is labeled but possibly also BRP molecules elsewhere in the neuron. I would like to see more validation, e.g., the percentage of tagged endogenous BRP associated with other presynaptic proteins.

      To determine to what extent Brp::rGFP clusters represent active zones, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters showed co-localization with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting most Brp::rGFP clusters represent functional AZs.
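
      A percentage of this kind depends on how co-localization is scored, which the response does not specify. The sketch below shows one simple possibility, assuming a centroid-distance criterion; the threshold and coordinates are hypothetical, and the authors may instead have used voxel overlap or another measure.

```python
import math

def colocalized_fraction(brp_centroids, cac_centroids, max_dist):
    """Fraction of Brp clusters with at least one Cac cluster within max_dist."""
    hits = sum(
        1 for b in brp_centroids
        if any(math.dist(b, c) <= max_dist for c in cac_centroids)
    )
    return hits / len(brp_centroids)

# Hypothetical centroids in micrometres (x, y, z).
brp = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
cac = [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (2.1, 0.0, 0.0)]

# 0.2 um is an assumed threshold chosen only for illustration.
fraction = colocalized_fraction(brp, cac, max_dist=0.2)
print(f"{100 * fraction:.0f}% of Brp clusters have a Cac partner")
```

      Reporting the criterion alongside the percentage matters because a looser threshold in a dense terminal will inflate the apparent co-localization.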

      (4) Z-size is ~200 nm, while x/y pixel size is ~75 nm during acquisition. How far down does the resolution go after deconvolution?

      The Z-step was 370 nm and XY pixel size was 79 nm for image acquisition. We performed 20 iterations of Richardson-Lucy deconvolution using an empirical point spread function (PSF). We found that the full-width at half maximum (FWHM) of Brp::rGFP clusters improves only marginally beyond 20 iterations, at which point the XY FWHM is around 200 nm and the XZ FWHM is around 450 nm (Figure 1 - figure supplement 4).
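
      The relationship between iteration count and FWHM can be illustrated with a one-dimensional toy version of Richardson-Lucy deconvolution. This is a sketch under stated assumptions, not the authors' pipeline: it uses a hypothetical Gaussian PSF (sigma of 3 pixels) instead of the empirical PSF, a noise-free point source, the blurred profile as the starting estimate, and only the 79 nm pixel size from the response.

```python
import math

def convolve_same(signal, kernel):
    """Centered same-size convolution with zero padding (odd-length kernel)."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k
            if 0 <= j < n:
                acc += signal[j] * w
        out[i] = acc
    return out

def richardson_lucy(observed, psf, iterations):
    """Multiplicative Richardson-Lucy updates, starting from the observed data."""
    psf_mirror = psf[::-1]
    estimate = list(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_mirror)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

def fwhm(profile, step):
    """Full width at half maximum of a single peak, by linear interpolation."""
    peak = max(profile)
    half = peak / 2.0
    i_peak = profile.index(peak)
    left = i_peak
    while left > 0 and profile[left] > half:
        left -= 1
    right = i_peak
    while right < len(profile) - 1 and profile[right] > half:
        right += 1

    def crossing(i0, i1):
        y0, y1 = profile[i0], profile[i1]
        return i0 + (half - y0) / (y1 - y0) * (i1 - i0)

    return (crossing(right - 1, right) - crossing(left, left + 1)) * step

# Hypothetical Gaussian PSF (sigma = 3 pixels), normalized to unit sum.
sigma = 3.0
psf = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-12, 13)]
psf_sum = sum(psf)
psf = [w / psf_sum for w in psf]

# A single point source blurred by the PSF stands in for one Brp cluster.
truth = [0.0] * 101
truth[50] = 1.0
observed = convolve_same(truth, psf)

pixel_um = 0.079  # XY pixel size stated in the response
widths = {
    n: fwhm(richardson_lucy(observed, psf, n), pixel_um) for n in (0, 5, 20)
}
for n, width in widths.items():
    print(f"{n:2d} iterations: FWHM = {width:.3f} um")
```

      Extending the tuple of iteration counts shows how the returns diminish in this toy setting; with real, noisy data, additional iterations also amplify noise, which is a common reason to stop early.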

      (5) Figure Legend 7: What is a "cytoplasm membrane marker"? Does this mean membrane-bound tdTom is sticking into the cytoplasm?

      We apologize for the typo and have corrected it to “plasma membrane marker”.

      (6) At the end of the introduction: "characterizing multiple structural parameters..." - which were these parameters? I was under the assumption that BRP localization, cluster volume, and intensity were assessed. I do not see how these are structural parameters. Please define what exactly is meant by "structural parameters".

      We apologize for the confusion. By "structural parameters”, we indeed referred to the volume, intensity and molecular density of Brp::rGFP clusters. We have revised the sentence to “Characterizing the distinct parameters and localization of Brp::rGFP cluster.”

      (7) Next to last sentence of the introduction: "Characterizing multiple structural parameters revealed a significant synaptic heterogeneity within single neurons and AZ distribution stereotypy across individuals." What do the authors mean by "significant synaptic heterogeneity"?

      By “synaptic heterogeneity”, we refer to the intracellular variability of active zone cytomatrices reported by Brp clusters. For instance, the intensities of Brp::rGFP clusters within Kenyon cell subtypes were variable among compartments (Figure 2). Intracellular variability of the Brp concentration of individual active zones was higher in DPM and APL neurons than Kenyon cells (Figure 3). These variabilities demonstrate intracellular synaptic heterogeneity. We have revised the sentence to be more specific to the different characters of Brp clusters.

      (8) I do not understand the last sentence of the introduction. "These cell-type-specific synapse profiles suggest that AZs are organized at multiple scales, ranging from neighboring synapses to across individuals." What do the authors mean by "ranging from neighboring synapses to across individuals"? Does this mean that even neighboring synapses in the same cell can be different?

      We have revised the sentence to “These cell-type-specific synapse profiles suggest that AZs are spatially organized at multiple scales, ranging from interindividual stereotypy to neighboring synapses in the same cells.”

      By “neighboring synapses", we refer to the nearest neighbor similarity in Brp levels in some cell-types (Figure 6A-C), and also the sub-compartmental dense AZ clusters with high Brp level in Kenyon cells (Figure 6D-H). By “across individuals”, we refer to the individually conserved active zone distribution patterns in some neurons (Figure 5).

      (9) The title talks about cell-type-specific spatial configurations. I do not understand what is meant by "spatial configurations"? Do you mean BRP cluster volume? I think the title is a little misleading.

      By “spatial configuration”, we refer to the arrangement of Brp clusters within individual mushroom body neurons. This statement is based on our findings on the intracellular synaptic heterogeneity (see also response to comment #7). We have streamlined the text description in the revised manuscript for clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 3A: exemplary two AZs are compared here, a histogram comparing more AZs would aid in making the point that in general, AZ of similar size have different BRP level (intensities) and how much variation exists.

      We have included histograms for Brp::rGFP intensity and cluster volumes to Figure 3 in the revised manuscript.

      (2) Line 52: "endogenous synapses" is a confusing term; it's probably meant that the protein levels within the synapse are endogenous and not overexpressed. 

      We apologize for the confusion and have revised the term to “endogenous synaptic proteins.”

      (3) It is not clear from the Materials and Methods section, whether and where deconvolved or not-deconvolved images were used for the quantification pipeline. Please comment on this. 

      We have now revised the Method section to clarify how deconvolved or not-deconvolved images were differently used in the pipeline.

      (4) Line 664 (C) not bold.

      We have corrected the error.

      (5) 725 "Files" should be Flies.

      We have corrected the error.

      (6) 727 two times "first".

      We have corrected the error.

      (7) Figure 7. All (A) etc., not bold - there should be consistent annotation. 

      We thank the reviewer for the detailed proofreading and have corrected all the errors spotted.

      Reviewer #3 (Recommendations for the authors):

      (1) Has there been an expression of the construct in a non-neuronal cell? Astrocyte-like cell? Any glia? As some sort of control for background and activity?

      As the reviewer suggested, we verified the neuronal expression specificity of Brp::rGFP. Using R86E01-GAL4 and Amon-GAL4, we compared Brp::rGFP in astrocyte-like glia and neuropeptide-releasing neurons. We found no Brp::rGFP puncta in the neuropils in astrocyte-like glia compared to neurons, suggesting Brp::rGFP is specific to neurons. We have included this new dataset to the revised manuscript (Figure 1 - figure supplement 2).

      (2) Similarly, expression of the construct co-expressed with a channelrhodopsin, and induction of a 'learning'-like regime of activity, similarly in a control type of experiment, expression of an inwardly rectifying channel (e.g. Kir2.1) to show that increases in size of the BRP puncta are truly activity dependent? The NMJ may be an optimal neuron to use to see the 'donut' structures of the AZs and their increase with activity. Also, are these truly AZs we are seeing here? Perhaps try co-expressing cacophony-dsRed? If the GFP Puncta are active zones, then they should be surrounded by cacophony.

      We would like to clarify that we did not find Brp::rGFP size increase upon learning. Instead, we demonstrated that associative training transiently remodelled sub-compartment-sized AZ “hot spots” in Kenyon cells, indicated by the correlation of local intensity and AZ density (Figure 6-7).

      To demonstrate split-GFP tagging does not affect activity-dependent plasticity associated with Brp, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors (Figure 1F).

      As the reviewer suggested, we performed the STED microscopy for the larval motor neuron and confirmed the donut-shape arrangement of Brp::rGFP (Wu, Eno et al., PLOS Biol 2025).

      Also following the reviewer’s suggestion, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters showed co-localization with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting most Brp::rGFP clusters represent functional AZs.

      (3) In the introduction: Intro, a sentence about BRP - central organiser of the active zone, so a key regulator of activity.

      We have added a few more sentences about the role of Brp in active zones to the revised manuscript.

      (4) Figure 1 E, line 650 'cite the resource here'. 

      We thank the reviewer for pointing out the error and we have corrected it.

      (5) Many readers may not be MB aficionados, and to make the data more accessible, perhaps use a cartoon of an MB with the cell bodies of the neurons around the MB expressing the constructs highlighted so that the reader can have a wider idea of the anatomy in relation to the MB.

      We appreciate these comments and have appended cartoons of the MB to figures to help readers understand the anatomy.

    1. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. Two types of learned associations are characterized, one being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify time-resolved SC and SR correlates, which are used to test properties of their dynamics.

      The conclusion is reached that SC and SR associations can independently and simultaneously guide behavior. This conclusion is based on results showing SC and SR correlates are: (1) not entirely overlapping in cross-decoding; (2) simultaneously observed on average over trials in overlapping time bins; (3) independently correlate with RT; and (4) have a positive within-trial correlation.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      I still have my concern from the first round that the decoders are overfit to temporally structured noise. As I wrote before, the SC and SR classes are highly confounded with phase (chunk of session). I do not see how the control analyses conducted in the revision adequately deal with this issue.

      In the figures, there are several hints that these decoders are biased. Unfortunately, the figures are also constructed in such a way that hides or diminishes the salience of the clues of bias. This bias and lack of transparency discourage trust in the methods and results.

      I have two main suggestions:

      (1) Run a new experiment with a design that properly supports this question.

      I don't make this suggestion lightly, and I understand that it may not be feasible to implement given constraints; but I feel that this suggestion is warranted. The desired inferences rely on successful identification of SC and SR representations. Solidly identifying SC and SR representations necessitates an experimental design wherein these variables are sufficiently orthogonalized, within-subject, from temporally structured noise. The experimental design reported in this paper unfortunately does not meet this bar, in my opinion (and the opinion of a colleague I solicited).

      An adequate design would have enough phases to properly support "cross-phase" cross-validation. Deconfounding temporal noise is a basic requirement for decoding analyses of EEG and fMRI data (see e.g., leave-one-run-out CV that is effectively necessary in fMRI; in my experience, EEG is not much different, when the decoded classes are blocked in time, as here). In a journal with a typical acceptance-based review process, this would be grounds for rejection.
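
      To make the "cross-phase" cross-validation requirement concrete, the sketch below holds out one phase at a time and reports which classes would be missing from the training fold. The phase and label lists are a hypothetical toy session for a single item (MI in phase 2, MC in phases 1 and 3), not the study's data, and the function names are illustrative.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out each group once."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idxs in by_group.items() if g != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def classes_missing_from_training(labels, train_idx, test_idx):
    """Classes present in the test fold but absent from the training fold."""
    return {labels[i] for i in test_idx} - {labels[i] for i in train_idx}


# Hypothetical toy session: trial-wise phase and ISPC label for one item.
phases = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
labels = ["MC", "MC", "MI", "MI", "MI", "MI", "MC", "MC", "MC", "MC"]

folds = list(leave_one_group_out(phases))
missing = {
    held: classes_missing_from_training(labels, train, test)
    for held, train, test in folds
}
```

      In this toy layout, holding out phase 2 leaves no MI trials of the item to train on, which is the sense in which a design needs more phases per class before cross-phase validation becomes possible.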

      Please note that this issue of decoder bias would seem to weaken the rest of the downstream analyses that are based on the decoded values. For instance, if the decoders are biased, in the within-trial correlation analysis, how can we be sure that co-fluctuations along certain dimensions within their projected values are driven by signal or noise? A similar issue clouds the LMM decoding-RT correlations.

      (2) Increase transparency in the reporting of results throughout main text.

      Please do not truncate stimulus-aligned timecourses at time=0. Displaying the baseline period is very useful to identify bias, that is, to verify that stimulus-dependent conditions cannot be decoded pre-stimulus. Bias is most expected to be revealed in the baseline interval when the data are NOT baseline-corrected, which is why I previously asked to see the results omitting baseline correction. (But also note that if the decoders are biased, baseline-correcting would not remove this bias; instead, it would spread it across the rest of the epoch, while the baseline interval would, on average, be centered at zero.)

      Please use a more standard p-value correction threshold, rather than Bonferroni-corrected p<0.001. This threshold is unusually conservative for this type of study. And yet, despite this conservativeness, stimulus-evoked information can be decoded from nearly every time bin, including at t=0. This does not encourage trust in the accuracy of these p-values. Instead, I suggest using permutation-based cluster correction, with corrected p<0.05. This is much more standard and would therefore allow for better comparison to many other studies.
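
      For readers unfamiliar with the suggested procedure, here is a minimal sketch of a one-sided, sign-flip cluster-based permutation test on a subjects × time-bins array (for example, decoding accuracy minus chance). The cluster-forming threshold of 2.0, the number of permutations, and the toy data are assumptions chosen for illustration; established toolboxes implement more complete versions.

```python
import math
import random
import statistics


def t_values(data):
    """One-sample t-value against zero per time bin; data[subject][time]."""
    n = len(data)
    out = []
    for t in range(len(data[0])):
        col = [row[t] for row in data]
        m, sd = statistics.mean(col), statistics.stdev(col)
        if sd == 0:
            # Degenerate bin: no variance across subjects.
            out.append(0.0 if m == 0 else math.copysign(math.inf, m))
        else:
            out.append(m / (sd / math.sqrt(n)))
    return out


def cluster_masses(tvals, threshold):
    """Return (start, end, summed t) for each run of bins with t > threshold."""
    clusters, start, mass = [], None, 0.0
    for i, t in enumerate(tvals):
        if t > threshold:
            if start is None:
                start, mass = i, 0.0
            mass += t
        elif start is not None:
            clusters.append((start, i - 1, mass))
            start = None
    if start is not None:
        clusters.append((start, len(tvals) - 1, mass))
    return clusters


def cluster_permutation_test(data, threshold=2.0, n_perm=500, seed=0):
    """Compare observed cluster masses with the null of maximum cluster masses."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(data), threshold)
    null_max = []
    for _ in range(n_perm):
        # Flip the sign of each subject's whole time course at random.
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        masses = [m for _, _, m in cluster_masses(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    return [
        (s, e, m, (1 + sum(nm >= m for nm in null_max)) / (1 + n_perm))
        for s, e, m in observed
    ]


# Toy data: 12 subjects, 20 bins, an effect of 1.0 in bins 8-13 plus a
# subject-specific offset of +/-0.5 (purely illustrative).
data = [
    [(1.0 if 8 <= t <= 13 else 0.0) + (0.5 if s % 2 else -0.5) for t in range(20)]
    for s in range(12)
]
results = cluster_permutation_test(data)
```

      Each entry of `results` is a cluster's first bin, last bin, mass, and corrected p-value. Because inference is made on clusters rather than on single bins, the threshold choice affects sensitivity but not the validity of the family-wise correction.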

      I don't think these things should be done as control analyses, tucked away in the supplemental materials, but instead should be done as a part of the figures in the main text -- including decoding, RSA, cross-trial correlations, and RT correlations.

      Other issues:

      Regarding the analysis of the within-trial correlation of RSA betas, and "Cai 2019" bias:

      The correction that authors perform in the revision -- estimating the correlation within the baseline time interval and subtracting this estimate from subsequent timepoints -- assumes that the "Cai 2019" bias is stationary. This is a fairly strong assumption, however, as this bias depends not only on the design matrix, but also on the structure of the noise (see the Cai paper), which can be non-stationary. No data were provided in support of stationarity. It seems safer and potentially more realistic to assume non-stationarity.

      This analysis was included in the supplemental material. However, given that the correlation analysis presented in the Results is subject to the "Cai 2019" bias, it would seem to be more appropriate to replace that analysis, rather than supplement it.

      Regardless, this seems to be a moot issue, given that the underlying decoders seem to be overfit to temporally structured noise (see point above regarding weakening of downstream analyses based on decoder bias).

      Outliers and t-values:

      More outliers with beta coefficients could be because the original SD estimates from the t-values are influenced more by extreme values. When you use a threshold on the median absolute deviation instead of mean +/-SD, do you still get more outliers with beta coefficients vs t-values?
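
      The difference between the two outlier rules can be sketched as follows. The cutoff multiplier `k=5.0` mirrors the mean ± 5 SD rule mentioned in the response; the 1.4826 scaling (which makes the MAD comparable to an SD under normality) and the toy values are assumptions for illustration.

```python
import statistics


def sd_outliers(values, k=5.0):
    """Flag values more than k sample SDs from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mu) > k * sd for v in values]


def mad_outliers(values, k=5.0):
    """Flag values more than k scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scaled_mad = 1.4826 * mad  # consistent with the SD for normal data
    return [abs(v - med) > k * scaled_mad for v in values]


# Toy coefficients: 21 well-behaved values plus one extreme value.
values = [0.1 * i for i in range(-10, 11)] + [50.0]

n_sd = sum(sd_outliers(values))
n_mad = sum(mad_outliers(values))
```

      Here the extreme value inflates the SD enough to escape the mean ± 5 SD rule, while the MAD rule still flags it. This is why outlier counts based on the SD can differ between betas and t-values for reasons unrelated to their stability.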

      Random slopes:

      Were random slopes (by subject) for all within-subject variables included in the LMMs? If not, please include them, and report this in the Methods.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

      We thank the editors and the reviewers for their positive assessment of our work and for providing us with an opportunity to strengthen this manuscript. Please see below our responses to each comment raised in the reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations. Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Thank you for acknowledging the strength in EEG decoding and design. We have addressed all your concerns raised below point by point.

      Weaknesses:

      (1a) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during phase 2, and is MC primarily only during Phase 3 (since phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.

      Thank you for raising this important issue. To test whether decoding might be confounded by temporally structured noise, we performed a control decoding analysis. As the reviewer correctly pointed out, cross-phase decoding is not possible due to the experimental design. Instead, to maximize temporal separation between the training and test data, we divided the EEG data in phase 2 and in phases 1 and 3 chronologically into first and second halves. Phases 1 and 3 were combined because they share the same MC and MI assignments. We then trained the decoders on one half and tested them on the other half. Finally, we averaged the decoding results across all possible assignments of training and test data. The similar patterns observed (Supplementary Fig. 1) suggest that the decoding results are unlikely to be driven by temporally structured noise in the EEG data. This clarification has been added to page 13 of the revised manuscript.
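
      One way to read the split-half control described above is sketched below. It assumes that "all possible assignments" means every choice of which chronological half of each phase group is used for training; that reading, the group names, and the trial indices are assumptions, not details from the manuscript.

```python
from itertools import product


def chronological_halves(indices):
    """Split chronologically ordered trial indices into first and second half."""
    mid = len(indices) // 2
    return indices[:mid], indices[mid:]


def split_half_assignments(groups):
    """Yield (train, test) index lists for every choice of training half.

    groups maps a phase-group name to its trial indices in chronological order.
    """
    halves = {name: chronological_halves(idx) for name, idx in groups.items()}
    names = sorted(halves)
    for choice in product((0, 1), repeat=len(names)):
        train, test = [], []
        for name, c in zip(names, choice):
            train += halves[name][c]
            test += halves[name][1 - c]
        yield sorted(train), sorted(test)


# Hypothetical session of 30 trials: phase 1 = 0-9, phase 2 = 10-19, phase 3 = 20-29.
groups = {
    "phase2": list(range(10, 20)),
    "phase1and3": list(range(0, 10)) + list(range(20, 30)),
}
assignments = list(split_half_assignments(groups))
```

      Decoding accuracy would then be averaged over the entries of `assignments`. Note that the training and test halves of each class still come from the same phase group, so this maximizes temporal distance within a phase group rather than across phases.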

      (1b) The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      As detailed in the response to your next comment, some new results using data without baseline correction show a narrower time window of above-chance decoding. We speculate that the remaining results of long-lasting above-chance decoding could be attributed to trials with slow responses (some responses were made near the response deadline of 1500 ms). Additionally, as shown in Figure 6a, the long-lasting above-chance decoding seems to be driven by color and congruency representations. Thus, it is also possible that the binding of color and congruency contributes to decoding. This interpretation has been added to page 17 of the revised manuscript.

      (1c) Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      As the reviewer mentioned, baseline correction may introduce bias into the decoding results. We therefore cited van Driel et al. (2021) in the revised manuscript to justify the use of EEG data without baseline correction in the decoding analyses (Page 27 of the revised manuscript), and re-ran all decoding analyses accordingly. The new results were largely similar (Fig. 2, 4, 6 and 8 in the revised manuscript), with the following exceptions: a narrower time window for separable SC and SR subspaces (Fig. 4b), a narrower time window for concurrent representations of SC and SR (Fig. 6a-b), and a wider time window for the correlations of SC/SR representations with RTs (Fig. 8).

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncracies could contribute: 10.1016/j.neuroimage.2013.03.039).

      Thank you for this question. To test which dimensions are shared between SC and SR subspaces, we first identified which factors can be shared across SC and SR subspaces. For SC, the eight conditions are the four colors × ISPC. Thus, the possible shared dimensions are color and ISPC. Additionally, because the four colors and words are divided into two groups (e.g., red-blue and green-yellow, counterbalanced across subjects, see Methods), the group is a third potential shared dimension. Similarly, for SR decoders, potential shared dimensions are word, ISPC and group. Note that each class in SC and SR decoders has both congruent and incongruent trials. Thus, congruency is not decodable from SC/SR decoders and hence unlikely to be a shared dimension in our analysis. To test the effect of sharing for each of the potential dimensions, we performed RSA on decoding results of the SC decoder trained on SR subspace (SR | SC) (Supplementary Fig. 4a) and the SR decoder trained on SC subspace (SC | SR) (Supplementary Fig. 4b), where the decoders indicated the decoding accuracy of shared SC and SR representations. In the SC classes of SR | SC, the words red and blue were mixed within the same class, as were the words yellow and green. The similarity matrix for “Group” of SR | SC (Supplementary Fig. 4a) shows the comparison between two word groups (red & blue vs. yellow & green). The similarity matrix for “Group” of SC | SR (Supplementary Fig. 4b) shows the comparison between two color groups (red & blue vs. yellow & green).

      The RSA results revealed that the contributions of group to the SC decoder (Supplementary Fig. 5a) and the SR decoder (Supplementary Fig. 5b) were significant. Meanwhile, a wider time window showed significant effect of color on the SC decoder (approximately 100 - 1100 ms post-stimulus onset, Supplementary Fig. 5a) and a narrower time window showed significant effect of word on SR decoder (approximately 100 - 500 ms post-stimulus onset, Supplementary Fig. 5b). However, we found no significant effect of ISPC on either SC or SR decoders. We also performed the same analyses on response-locked data from the time window -800 to 200 ms. The results showed shared representation of color in the SC decoder (Supplementary Fig. 5c) and group in both decoders (Supplementary Fig. 5c-d). Overall, the above results demonstrated that color, word and group information are shared between SC and SR subspaces.

      Lastly, we would like to stress that our main hypothesis for the cross-subspace decoding analysis is that SR and SC subspaces are not identical. This hypothesis was supported by lower decoding accuracy for cross-subspace than for within-subspace decoders, and it enables the subsequent analyses that treat SC and SR as separate representations.

      We have added the interpretation to page 13-14 of the revised manuscript.

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.

      Thank you for raising this important issue. Because the bias comes from the covariance between the regressors and the same GLM was applied to all time points in our analysis, we assume that the inflation would be similar at different time points. Therefore, we calculated the correlation of SC and SR betas from -200 to 0 ms relative to stimulus onset as a baseline (i.e., no SC or SR representation is expected before stimulus onset) and compared the post-stimulus correlation coefficients against this baseline. We hypothesized that if the positive within-trial correlation of SC and SR betas resulted from simultaneous representation rather than inflation, we should observe a significantly higher correlation than at baseline. To examine this hypothesis, we first performed the linear discriminant analysis (Supplementary Fig. 7a) and RSA regression (Supplementary Fig. 7b) on the -200 to 0 ms window relative to stimulus onset. We then calculated the average r<sub>baseline</sub> of SC and SR betas in that time window for each participant (group results at each time point are shown in Supplementary Fig. 7c) and computed the relative correlation at each post-stimulus time point using (fisher-z (r) - fisher-z (r<sub>baseline</sub>)). Finally, we performed a t test at the group level on the baseline-corrected correlation coefficients with Bonferroni correction. The results (Fig. 6c) showed a significantly more positive correlation from 100 to 500 ms post-stimulus onset compared with baseline, supporting our hypothesis that the positive within-trial correlation of SC and SR betas arises from simultaneous representation rather than inflation. The related interpretation was added to page 17 of the revised manuscript.
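
      The baseline-relative correlation described above can be written out as a short sketch. The function names and the example correlation values are hypothetical; only the formula, fisher-z(r) minus fisher-z(r<sub>baseline</sub>), comes from the response.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient (|r| < 1)."""
    return math.atanh(r)


def baseline_corrected(r_post, r_baseline):
    """Fisher-z difference between each post-stimulus r and the baseline r."""
    z_base = fisher_z(r_baseline)
    return [fisher_z(r) - z_base for r in r_post]


# Hypothetical single-participant values: mean pre-stimulus correlation and
# correlations at three post-stimulus time points.
r_baseline = 0.2
r_post = [0.2, 0.35, 0.5]
delta_z = baseline_corrected(r_post, r_baseline)
```

      The per-participant `delta_z` values at each time point are what would enter the group-level test against zero. The subtraction removes only the part of the inflation that is constant over time, which is the stationarity assumption the reviewer questions in the second round.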

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms poststim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      Thanks for this question. We now pair each stimulus-locked EEG analysis in the manuscript with a response-locked analysis. To control for RT variations among trial types, when using the linear mixed model (LMM) to predict RTs from trial-wise RSA results, we included a separate intercept for each of the eight trial types in SC or SR. Furthermore, at each time point, we only included trials that had not yet generated a response (for stimulus-locked analysis) or had already started (for response-locked analysis). All the results (Fig. 3, 5, 7, 9 in the revised manuscript) support our hypothesis. We added these details to page 31 of the revised manuscript.
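
      The trial-inclusion rule can be sketched as a pair of filters over single-trial RTs. The function names, the RT values, and the exact handling of boundary cases (strict versus non-strict inequality) are assumptions for illustration.

```python
def stimulus_locked_trials(rts_ms, t_ms):
    """Indices of trials with no response yet at t_ms after stimulus onset."""
    return [i for i, rt in enumerate(rts_ms) if rt > t_ms]


def response_locked_trials(rts_ms, t_ms):
    """Indices of trials that have already started at t_ms relative to response.

    t_ms is negative before the response; a trial has started if the time
    point falls at or after its stimulus onset, i.e. rt + t_ms >= 0.
    """
    return [i for i, rt in enumerate(rts_ms) if rt + t_ms >= 0]


# Hypothetical RTs in ms for four trials.
rts = [450, 620, 900, 1400]
kept_stim = stimulus_locked_trials(rts, 600)
kept_resp = response_locked_trials(rts, -800)
```

      With these RTs, the stimulus-locked analysis at 600 ms keeps trials 1 to 3, and the response-locked analysis at -800 ms keeps trials 2 and 3. The set of contributing trials therefore changes across time points, which is worth keeping in mind when comparing effect sizes between early and late bins.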

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trialtypes exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

      To improve clarity, we revised the first paragraph of the SC and SR association subspace analysis to list the conditions for each of the SC and SR decoders and to explain in more detail how separability can be tested by cross-decoding between SC and SR subspaces. The revised paragraph now reads:

      “Prior to testing whether controlled and non-controlled associations were represented simultaneously, we first tested whether the two representations were separable in the EEG data.

      In other words, we reorganized the 16 experimental conditions into 8 conditions for SC (4 colors × MC/MI, while collapsing across SR levels) and SR (4 words × 2 possible responses per word, while collapsing across SC levels) associations separately. If SC and SR associations are not separable, it follows that they encode the same information, such that both SC and SR associations can be represented in the same subspace (i.e., by the same information encoded in both associations). For example, because (1) the word can be determined by the color and congruency and (2) the most-likely response can be determined by color and ISPC, the SR association (i.e., association between word and most-likely response) can in theory be represented using the same information as the SC association. On the other hand, if SC and SR associations are separable, they are expected to be represented in different subspaces (i.e., the information used to encode the two associations is different). Notably, if some, but not all, information is shared between SC and SR associations, they are still separable by the unique information encoded. In this case, the SC and SR subspaces will partially overlap but still differ in some dimensions. To summarize, whether SC and SR associations are separable is operationalized as whether the associations are represented in the same subspace of EEG data. To test this, we leveraged the subspace created by the LDA (see Methods). Briefly, to capture the subspace that best distinguishes our experimental conditions, we trained SC and SR decoders using their respective aforementioned 8 experimental conditions. We then projected the EEG data onto the decoding weights of the LDA for each of the SC and SR decoders to obtain its respective subspace. We hypothesized that if SC and SR subspaces are identical (i.e., not separable), SC/SR decoding accuracy should not differ by which subspace (SC or SR) the decoder is trained on. 
      For example, SC decoders trained in SC subspace should show similar decoding performance as SC decoders trained in SR subspace. On the other hand, if SC and SR association representations are in different subspaces, the SC/SR subspace will not encode all information for SR/SC associations. As a result, decoding accuracy should be higher using its own subspace (e.g., decoding SC using the SC subspace) than using the other subspace (e.g., decoding SC using the SR subspace). We used cross-validation to avoid artificially higher decoding accuracy for decoders using their own subspace (see Methods).” (Page 11-12).

      We also explicitly tested what information is shared between SC and SR representations (see response to comment #2). Lastly, to help the readers navigate the EEG results, we added a section “Overview of EEG analysis” to summarize the EEG analysis and their relations in the following manner:

      “EEG analysis overview. We started by validating that the 16 experimental conditions (8 unique stimuli × MC/MI) were represented in the EEG data. Evidence of representation was provided by above-chance decoding of the experimental conditions (Fig. 2-3). We then examined whether the SC and SR associations were separable (i.e., whether SC and SR associations were different representations of equivalent information). As our results supported separable representations of SC and SR association (Fig. 4-5), we further estimated the temporal dynamics of each representation within a trial using RSA. This analysis revealed that the temporal dynamics of SC and SR association representations overlapped (Fig. 6a-b, Fig. 7a-b). To explore the potential reason behind the temporal overlap of the two representations, we investigated whether SC and SR associations were represented simultaneously as part of the task representation, independently from each other, or competitively/exclusively (e.g., on some trials only SC association was represented, while on other trials only SR association was represented). This was done by assessing the correlation between the strength of SC and SR representations across trials (Fig. 6c, Fig. 7c). Lastly, we tested how SC and SR representations facilitated performance (Fig.8-9).” (Page 8-9).

      Minor suggestions:

      (6) I'd suggest using single-trial RSA beta coefficients, not t-values, as they can be more stable (it's a t-value based on 16 observations against 9 or so regressors.... the SE can be tiny).

      Thank you for your suggestion. To choose between betas and t-values, we calculated the proportion of outliers (defined as values beyond mean ± 5 SD) for each predictor of the design matrix and each subject. We found that outliers were less frequent for t-values than for beta coefficients (t-values: mean = 0.07%, SD = 0.009%; beta-values: mean = 0.19%, SD = 0.033%). Thus, we decided to stay with t-values.

      (7) Instead of prewhitening the RTs before the HLM with drift terms, try putting those in the HLM itself, to avoid two-stage regression bias.

      Thank you for your suggestion. Because our current LMM included each of the eight trial types in SC or SR as separate predictors with their own intercepts (as mentioned above), adding regressors of trial number and mini blocks (1-100 blocks) introduced collinearity (as ISPC flipped during the experiment). We therefore excluded these regressors from the current LMM (Page 31).

      (8) The text says classical MDS was performed on decoding *accuracy* - is this accurate?

      We now clarify in the manuscript that it is the decoders’ probabilistic classification results (Page 28).

      (9) At a few points, it was claimed that a negative correlation between SC and SR would be expected within single trials, if the two were temporally dissociable. Wouldn't it also be possible that they are not correlated/orthogonal?

      We agree with the reviewer and revised the null hypothesis in the cross-trial correlation analysis to include no correlation as SC and SR association representations may be independent from each other (Page 17, 22).

      Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Thank you for your positive feedback. We hope our responses below address your concerns.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations would be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other one to the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors will be valuable to raising confidence in the findings.

      Thank you for this important point. The proportion of variability explained, shown in Author response table 1 below, indicated relatively high correlations of SC/SR with Color and Identity. We agree that it is impossible to fully orthogonalize them. To address the issue of collinearity, we performed a control RSA after removing predictors that were highly correlated with others. Specifically, we calculated the variance inflation factor (VIF) for each predictor. The Identity predictor had a high VIF of 5 and was removed from the RSA. All other predictors had VIFs < 4 and were kept in the RSA. The results (Supplementary Fig. 6) showed patterns similar to the results with the Identity predictor, suggesting that the findings are not significantly influenced by collinearity. We have added the interpretation to page 17 of the revised manuscript.
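
      As a reference for how a VIF is obtained, the sketch below uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The three toy predictor vectors stand in for vectorized model RDMs and are hypothetical; the helper names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(predictors):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in predictors] for a in predictors]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(predictors))]


# Toy predictors: x1 and x2 are strongly correlated; x3 is orthogonal to both.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
vifs = variance_inflation_factors([x1, x2, x3])
```

      A VIF of 1 means a predictor shares no variance with the others, and larger values mean its coefficient's variance is inflated by that factor. Which cutoff justifies dropping a predictor is a convention rather than a fixed rule.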

      Author response table 1.

      Proportion of variability explained (r<sup>2</sup>) of RSA predictors.

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.

      Thank you for pointing out this important issue with how our operationalization captures the concept in questions. In the revised manuscript, we clarified the stimulus-response (SR) association is the link between the word and the most-likely response (i.e., not necessarily the actual response on the current trial). This association is likely to be encoded based on statistical learning over several trials. On each trial, the association is updated based on the stimulus and the actual response. Over multiple trials, the accumulated association will be driven towards the most-common (i.e., most-likely) response. In our ISPC manipulation, a color is presented in mostly congruent/incongruent (MC/MI) trials, which will also pair a word with a most-likely response. For example, if the color blue is MC, the color blue, which leads to the response blue, will co-occur with the word blue with high frequency. In other words, the SR association here is between the word blue and the response blue. As the actual response is not part of the SR association, in the RDM two trial types with different responses may share the same SR association, as long as they share the same word and the same ISPC manipulation, which, by the logic above, will produce the same most-likely response. These clarifications have been added to page 4 and 29 of the revised manuscript.

      In the revised manuscript (Page 17), we addressed how much the correlated SC and SR predictors in the RDM could affect the correlation analysis between SC and SR association representation strength. Specifically, we conducted the RSA using the same GLM on EEG data prior to stimulus onset (Supplementary Fig. 7a-b). As no SC and SR associations are expected to be present before stimulus onset, the correlation between SC and SR representation would serve as a baseline of inflation due to correlated predictors in the GLM (Supplementary Fig. 7c, also see comment #3 of R1). The SC-SR correlation coefficients following stimulus onset were then compared to the baseline to control for potential inflation (Fig. 6c). Significantly above-baseline correlation was still observed between ~100-500 ms post-stimulus onset, providing support for the hypothesis that SC and SR are encoded in the same task representation.

      Minor suggestions:

      (3) Overall, I find calling the representations SC-controlled and SR-uncontrolled unwarranted. How is the level of controlledness defined? Both are essentially types of statistical expectation that provide contextual information for the block of tasks. Is one really more automatic, requiring less conscious processing than the other? More background/justification could be provided if the authors would like to use these terms.

      Following your advice, we have added more discussion on how controlledness is conceptualized in this work and in the literature, which reads:

      “We consider SC and SR as controlled and uncontrolled respectively based on the literature investigating the mechanism of the ISPC effect. The SC account posits that the ISPC effect results from conflict and involves conflict adaptation, which requires the regulation of attention or control (Bugg & Hutchison, 2013; Bugg et al., 2011; Schmidt, 2018; Schmidt & Besner, 2008). On the other hand, the SR account argues that the ISPC effect does not require conflict adaptation but instead reflects contingency learning. That is, the response can be directly retrieved from the association between the stimulus and the most-likely response without top-down regulation of attention or control. As more empirical evidence emerged, researchers advocating the control view began to acknowledge the role of associative learning in cognitive control regarding the ISPC effect (Abrahamse et al., 2016). SC association has been thought to include both automatic processes that are fast and resource-saving and controlled processes that are flexible and generalizable (Chiu, 2019). Overall, we do not intend to claim that SC is entirely controlled or SR is completely automatic. We use SC-controlled and SR-uncontrolled representations to align with the original theoretical motivation and to highlight the conceptual difference between SC and SR associations.” (Pages 24-25)

      (4) Figures 3c and d: the figures could benefit from more explanation of what they try to show to the readers. Also for 3d, the dimensions were aligned with color sets and congruencies, but word identities were not linearly separable, at least for the first 3 axes. Shouldn't one expect that words can be decoded in the SR subspace if word-response pairs were decodable (e.g., Figure 3b)?

      Thank you for the insightful observation. We have now clarified that Fig. 3c and d in the original manuscript (Fig. 4c and d in the current manuscript) aim to show how each of the 8 trial types is represented in the SC and SR subspaces. The MDS approach we used for visualization tries to preserve dissimilarity between trial types when projecting data from a high-dimensional to a low-dimensional space. However, such a projection may also make patterns that are linearly separable in the high-dimensional space not linearly separable in the low-dimensional space. For example, if the word blue has two points (-1, -1) and (1, 1) and the word red has two points (-1, 1) and (1, -1), they are not linearly separable in the 2D space. Yet, if they are projected from a 3D space with coordinates of (-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1) and (1, -1, 0.1), the two words are linearly separable using the 3<sup>rd</sup> dimension. Thus, a better way to test whether words can be linearly separated in the SR subspace is to perform RSA on the original high-dimensional space. We performed the RSA with word (Supplementary Fig. 2) on the SR decoder trained on the SR subspace. Note that in Fig. 3c and d of the original manuscript (Fig. 4c and d in the current manuscript) there are two pairs of words that are not linearly separable: red-blue and yellow-green. Thus, we specifically tested the separability within the two pairs using one predictor for each pair, as shown in Supplementary Fig. 2. The results showed that within both word pairs individual words were represented above chance level (Supplementary Fig. 3). Considering that the decoders are linear, this finding indicates linear separability of the word pairs in the original SR subspace. The clarification has been added to page 13 (the end of the second paragraph) of the revised manuscript.
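
      The blue/red example above can be checked numerically. The following is a minimal sketch, not part of the reported analysis: it uses a simple perceptron as a stand-in for a linear decoder, and the function name `perceptron_separates` and the epoch cap are illustrative assumptions. A perceptron that fails to converge within the cap is only suggestive of non-separability in general, although for this XOR-like 2D layout non-separability is known.

```python
def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if a perceptron finds a separating hyperplane within max_epochs.

    Hypothetical helper for illustration only; labels must be +1 or -1.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: strictly separated
            return True
    return False


# Coordinates taken from the example in the response: blue = +1, red = -1.
labels = [+1, +1, -1, -1]
points_2d = [(-1, -1), (1, 1), (-1, 1), (1, -1)]
points_3d = [(-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1), (1, -1, 0.1)]

separable_2d = perceptron_separates(points_2d, labels)  # projection loses the 3rd axis
separable_3d = perceptron_separates(points_3d, labels)  # 3rd axis carries the word identity

print("2D separable:", separable_2d)
print("3D separable:", separable_3d)
```

      In this toy case the small offset on the third axis is the only dimension carrying word identity, which is why a low-dimensional visualization can hide separability that a linear decoder in the full space can still exploit.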

      References

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728. doi:10.1037/bul0000047.

      Bugg, J. M., & Hutchison, K. A. (2013). Converging evidence for control of color-word Stroop interference at the item level. Journal of Experimental Psychology: Human Perception and Performance, 39(2), 433-449. doi:10.1037/a0029145.

      Bugg, J. M., Jacoby, L. L., & Chanani, S. (2011). Why it is too early to lose control in accounts of item-specific proportion congruency effects. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 844-859. doi:10.1037/a0019957.

      Chiu, Y.-C. (2019). Automating adaptive control with item-specific learning. In Psychology of Learning and Motivation (Vol. 71, pp. 1-37).

      Schmidt, J. R. (2018). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin & Review, 26(3), 753-771. doi:10.3758/s13423-018-1520-z.

      Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(3), 514-523. doi:10.1037/0278-7393.34.3.514.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      General Response to Review

      We would like to thank all three reviewers for their encouraging comments on our manuscript. We now submit our revised study after considerable efforts to address each of the reviewers' concerns. I will first provide a response related to a major change we have made in the revision that addresses a concern common to all three reviewers, followed by a point-by-point response to individual comments.

      Replacing LRRK2ARM data with an LRRK2-specific type II kinase inhibitor: The most critical issue for all 3 reviewers was the use of our new CRISPR-generated truncation mutant of LRRK2 that we called LRRK2ARM. We had not provided direct evidence of the protein product of this truncation, which was a significant limitation. To address this we performed proteomics analysis of all clones, and to our surprise, we identified 7 peptides that were C-terminal to the "predicted" stop codon we had engineered into the CRISPR design. A repeat of the deep sequencing analysis in both directions then more clearly revealed site-specific mutations leading to 4 amino acid changes at the junction of exon 19, without introducing a stop codon. Given that we could not detect the protein by western blot (even though proteomics now indicated the region of LRRK2 recognized by our antibodies was present), we decided to remove this clone from the manuscript. In the meantime, we had compared the ineffectiveness of MLi-2 at blocking Rab8 phosphorylation during iron overload in the LRRK2G2019S cells with a type II kinase inhibitor called rebastinib. The data showed very clearly that treatment with rebastinib reversed the iron-induced phospho-Rab8 at the plasma membrane (and by western blot, in new Fig 3). Since this inhibitor is very broad spectrum, inhibiting ~30% of the kinome, we reached out to Sam Reck-Peterson and Andres Leschziner, experts in LRRK2 structure/function, who recently developed much more selective LRRK2-specific type II kinase inhibitors they called RN341 and RN277 (developed with Stefan Knapp; PMID: 40465731). These compounds effectively coupled the MLi-2 compound through an indole ring to a rebastinib type II compound to provide LRRK2 binding specificity to the efficient DYG "out" type II inhibitor. As with rebastinib, the new LRRK2-specific kinase inhibitors also effectively reversed the cell surface p-Rab8 seen in iron-loaded LRRK2G2019S cells.

      These new data provide the first biological paradigm where the kinase activity of LRRK2 is resistant to type I MLi-2, yet remains highly sensitive to type II inhibitors. While the loss of our LRRK2ARM clone marks a significant change in the manuscript, we believe the main message is stronger with the addition of the new LRRK2-specific type II kinase inhibitor. Our data show that it is indeed the active kinase function of LRRK2G2019S that is impacting the iron phenotypes we observe, but highlight the conformational specificity upon iron overload such that MLi-2 is ineffective. The overall phenotypes we observe in LRRK2G2019S macrophages remain unchanged and are now expanded within the manuscript. We hope reviewers will agree that our work provides important new insights into LRRK2 function in iron homeostasis while opening new avenues of research in future studies.

      Given this new information we have changed the title from "LRRK2G2019S acts as a dominant interfering mutant in the context of iron overload" to the more accurate "LRRK2G2019S interferes with NCOA4 trafficking in response to iron overload leading to oxidative stress and ferroptotic cell death."

      Response to Reviewer 1

      Reviewer 1 (R1): There are two major concerns with the data in their present form. In brief, first, the G2019S cells express much less LRRK2 and more Rab8 than the WT cells and this severely affects interpretability.

      Heidi McBride (HM): We agree that the LRRK2G2019S lines express lower levels of LRRK2 than wild type, which is a previously documented phenomenon, presumably as the cell attempts to downregulate the increased kinase activity by reducing protein expression. However, the levels of Rab8 across tens of experiments do not consistently show any differences between the wild type, G2019S and KO. We have provided more comprehensive quantifications of the blots in the revised version, and the Rab8 levels are consistent across all the blots presented in the manuscript (Figure 1A and 1B).

      R1: Second, the investigators used CRISPR to truncate the endogenous LRRK2 locus to produce a hypothetical truncated LRRK2-ARM polypeptide. This appears to have robust effects on NCOA4, in particular, which drives the overall interpretation of the data. However, the expression of this novel LRRK2 species is not confirmed nor compared to WT or G2019S in these cells (although admittedly the investigators did seek to address this with subsequent KO in the ARM cells). It would be premature to account for the changes reported without evidence of protein expression. This latter issue may be more easily addressed and could provide very strong support for a novel function/finding, see more detailed comments below, most seeking clarifications beyond the above.

      HM: As described in my common response above, we have removed the LRRK2ARM data from the manuscript.

      R1: Need to make clear in the results whether the G2019S CRISPR mutant is heterozygous or homozygous (presumably homozygous, same for ARM)

      HM: The RAW cell line we generated is homozygous for the G2019S and the KO alleles. We added this to the beginning of the results section and methods.

      R1: The text of the results implies that MLi2 was used in both WT and G2019S Raw cells, but it's only shown for G2019S. Given the premise for the use of RAW cells, it's important to show that there is basal LRRK2 kinase activity in WT cells to go along with its high protein expression. This is particularly important as the G2019S blot suggests minor LRRK2-independent phosphorylation of Rab8a (and other detected pRabs). One would imagine that pRab8 levels in both WT and G2019S would reduce to the same base line or ratio of total Rab in the presence of MLi2, but WT untreated is similar to G2019S with MLi2. This suggests no basal LRRK2 activity in the Raw cells, but I don't think that is the case.

      HM: We have included the data from MLi-2 treatment of wild type cells in Fig 3C, quantified in D. Again, the baseline levels of Rab8 are unchanged across the genotypes. However, the reviewer is correct that there is some baseline LRRK2 kinase activity that is sensitive to MLi2 in wild type cells. This is seen most clearly in the autophosphorylation of LRRK2 at S1292 in Fig 3C. The pRab8 blot is not as clear in wild type cells. It is likely that LRRK2 must be actively recruited to membranes (as seen by others with LLOME, etc.) to easily visualize p-Rabs in wild type cells. Nevertheless, we do clearly see the activity of autophosphorylation in wild type cells. Therefore, while we understand the reviewer's point that there should be some Rab8 phosphorylation in wild type cells, we don't see a significant, or very convincing, amount of it in our RAW macrophages.

      R1: Also, in terms of these cells, the levels of LRRK2 are surprisingly unmatched (Fig 1A, 1D, 1H, S1D, etc.) as are total levels of Rab8 (but in opposite directions) between the WT and G2019S. This is not mentioned in the Results text and is clearly reproducible and significant. Why do the investigators think this is? If Rab8 plays a role in iron, how do these differences affect the interpretation of the G2019S cells (especially given that MLi2 does not rescue)? Are other LRRK2-related Rabs affected at the protein (not phosphorylation level)? Could reduced levels of LRRK2 or increase Rab 8 alone or together account for some of these differences? Substantial further characterization is required as this seriously affects the interpretability of the data. Since pRab8 is not normalized to total Rab8, this G2019S model may not reflect a total increase in LRRK2 kinase activity, and could in fact have both less LRRK2 protein and less cellular kinase activity than WT (in this case).

      HM: In our hands, the RAW cells with homozygous LRRK2G2019S mutations show clearly that the total protein level of LRRK2 is reduced compared to wild type, which is likely a compensatory effect to reduce cellular kinase activity overall. We understand that some of our previous blots were not so clear on the total Rab8 levels across the different experiments. We have repeated many of these experiments and hope the reviewer can see in Figs 1A, 3C, 3E, 3J, and Sup3A that the total Rab8 levels are stable across the conditions. We also present quantifications from 3 independent experiments normalizing the pRab8/Rab8 levels in all three genotypes in untreated and iron-loaded conditions (Supp Fig 3A and B), and upon MLi2 treatment (Fig 3C). In 3C and D the data show the effectiveness of MLi-2 in reducing pRab8 in control conditions, but the resistance to MLi-2 in FAS-treated cells.

      R1: Presumably, the blots in 1H are whole cell lysates and account for the pooled soluble and insoluble NCOA4 (increased in G2019S), as there is no difference in soluble NCOA4 (Fig 2H). I suspect the prior difference is nicely reflected in the insoluble fraction (Fig 2H). This should be better explained in the Results text. This is a very interesting finding and I wonder what the investigators believe is driving this phenotype? Is the NCOA4 partitioning into a detergent-inaccessible compartment? Does this replicate with other detergents, those perhaps better at solubilizing lipid rafts? Is this a phenotype reversible with MLi2? Very interesting data.

      HM: We apologize for not being clearer in the text describing the behavior of NCOA4. The reviewer is correct that the major change in G2019S is the increased Triton X-100 insoluble NCOA4. Previous work has established that NCOA4 segregates into detergent-insoluble foci upon iron overload as a way to release it from ferritin cages, and this fraction is then internalized into lysosomes through a microautophagy pathway (see Mizushima's work PMID: 36066504). In Fig 1I we show that the elevation in NCOA4 and ferritin heavy chain seen in untreated G2019S cells can be cleared upon iron chelation with DFO, indicating that the canonical NCOA4-mediated ferritinophagy (macroautophagy) pathway remains intact to recycle the iron in conditions of iron starvation. However, in Figure 2 we show that in conditions of iron overload, when NCOA4 segregates from ferritin (to allow cytosolic storage of iron), this form of NCOA4 cannot be degraded within the lysosome through the microautophagy pathway, and begins to accumulate. We see this with our live and fixed imaging compared to wild type cells (Fig 2A,D), and by the lack of clearance seen by western blot (Fig 2E). As for the impact of MLi-2, we observe some reversal of NCOA4 accumulation in untreated cells at 4 and 8 hrs after MLi-2 treatment (Supp Fig 2F). However, in iron-loaded conditions the high NCOA4 levels in G2019S cells are MLi2 insensitive, while the elevated NCOA4 in wild type cells is reduced upon MLi2 addition (Fig. 2F, compare lanes 3 vs 4 in WT with lanes 7 vs 8 in G2019S). This is consistent with a block in the microautophagy pathway of phase-separated NCOA4 degradation in G2019S cells.

      R1: Figure 2 describes the increased NCOA4-positive iron structures after iron load, but does not emphasize that the G2019S cells begin preloaded with more NCOA4. How do the investigators account for differential NCOA4 in this interpretation? Is this simply a reflection of more NCOA4 available in G2019S cells? This seems reasonable.

      HM: The reviewer is correct. We showed that there is some turnover of NCOA4 in untreated conditions through canonical ferritinophagy, but in iron overload this appears to be blocked: the NCOA4 segregates from ferritin and remains within insoluble, phase-separated structures that cannot be degraded through microautophagy. We have rewritten the text to be clearer on these points.

      R1: These are very long exposures to iron, some as high as 48 hr which will then take into account novel transcriptomic and protein changes. Did the investigators evaluate cell death? Iron uptake would be trackable much quicker.

      HM: We agree that many things will change after our FAS treatments and now provide a full proteomics dataset on wild type and G2019S cells with and without iron overload, which is presented in Figure 4A-B. Indeed, Figure 4 is entirely new to this revised submission. The proteomics highlighted a series of cellular changes that reflect major cell stress responses, including the upregulation of HMOX1 (western blots to validate in Supp Fig 4A), an NRF2 transcriptional target, consistent with our observation that NRF2 is stabilized and translocated to the nucleus in G2019S iron-loaded cells (Sup Fig 4B,C). There are several interesting changes, and we highlighted three major nodes: changes in iron response proteins; changes in lysosomal proteins, particularly a loss of catalytic enzymes like lysozymes and granzymes, consistent with the loss of hydrolytic capacity we show in Fig. 4C,D; and changes in cytoskeletal proteins, which we suspect are consistent with the "blebbing" of the plasma membrane we see decorated with pRab8 in Fig 3. To test the activation of lipid oxidation likely resulting from the elevation in Fe2+ and oxidation signatures, we employed the C11-bodipy probe and observed strong signal specific to the G2019S iron-loaded cells, particularly labelling endocytic compartments and the cell surface (Fig. 4E-G).

      Lastly, an analysis of SYTOX green uptake experiments was done to monitor the uptake of the dye into cells that have died of cell membrane rupture, commonly used to examine ferroptotic cell death. We now show the G2019S cells are very susceptible to this form of death (Fig 4H,I). These data add new functional evidence for the consequence of the G2019S mutation in an increased susceptibility to iron stress.

      R1: The legend for 2F is awkward (BSADQRED)

      HM: We have changed this to BSA-DQRed, which is a widely used probe to monitor the hydrolytic capacity of the lysosome.

      R1: Why are WT cells not included in Fig 2G?

      HM: We have now included new panels in Fig 3C,D showing wild type and G2019S +/- FAS and +/- MLi-2 with quantifications of pRab8/Rab8.

      R1: The biochemical characterization of NCOA4 in the LRRK2-arm cells is a great experiment and strength of the paper. The field would benefit by a bit further interrogation, other detergents, etc.

      HM: We have removed all of the LRRK2ARM data given our confusion over the impact of the 4 amino acid changes in exon 19 and our inability to monitor this protein by western blot. The concept that NCOA4 enters into TX100 insoluble, phase separated compartments has been well established, so we didn't explore other detergents at this point.

      R1: Have the investigators looked for aberrant Rab trafficking to lysosomes in the LRRK2-arm cells? Is pRab8 mislocalized compared to WT? Other pRabs?

      HM: We did initially show that pRab8 was also at the plasma membrane in the LRRK2ARM cells, and we still focus on this finding for the G2019S, seen in Fig 3A,B,F,H. We did try to look at other p-Rabs known to be targets of LRRK2 but none of them worked in immunofluorescence so we couldn't easily monitor specific traffic and/or localization changes for them.

      R1: The expression levels and therefore stability of the ARM fragment is not shown. This is necessary for interpretation. While very intriguing, the data in Aim 3 rely on the assumption that the ARM fragment is expressed, and at comparable levels to G2019S to account for phenotypes. The generation of second clone is admirable, but the expression of the protein must be characterized. This is especially true because of the different LRRK2 levels between WT and G2019S. One could easily conceive of exogenous expression of a tagged-ARM fragment into LRRK2 KO cells, for example, as another proof-of-concept experiment. If it is truly dominant, does this effect require or benefit from some FL LRRK2? It seems easy enough to express the LRRK2-ARM in at least WT and KO RAW cells.

      HM: We agree and our attempts to understand this clone resulted in its removal from the manuscript. We did also express cDNA encoding our ARM domain (up to exon 19), but it didn't phenocopy the CRISPR clone, which of course made sense once we had better proteomics and repeated our deep sequencing.

      In our further efforts to understand why our phenotype was MLi-2 resistant upon iron overload, we expanded to examine the impact of pan-specific type II kinase inhibitors, and then reached out to the Reck-Peterson and Leschziner labs to obtain a newly developed LRRK2-selective type II kinase inhibitor. These all very efficiently reversed the pRab8 signals seen at the plasma membrane of G2019S cells upon iron overload (Fig 3E-K). Therefore the G2019S is not dominant negative, as we had initially supposed; rather, there is a specific conformation of LRRK2 in high iron that potentially opens the ATP binding pocket to bind the type II inhibitors, but not MLi2. We do not understand exactly what this conformation is, but it likely involves new protein interactions specific to high iron, or perhaps LRRK2 binds iron directly as a sensor in a way that ultimately leads to the differential sensitivity we observe between type I and type II kinase inhibitors. Our data indicate that MLi-2 treatment in the clinic will not be protective against iron toxicity phenotypes that may contribute to PD, whereas these newer selective type II LRRK2 kinase inhibitors would be effective in this conformation-specific context of iron toxicity.

      R1: Does iron overload induce Rab8a phosphorylation in a LRRK2 KO cell? This would be a solid extension on the ARM data and support the important finding that an additional kinase(s) can phosphorylate Rab8a under these conditions, and while not unexpected, this may not have been demonstrated by others as clearly. It also addresses whether the ARM domain is important to this other putative kinase(s), which may add value to the authors' model.

      HM: Iron overload does not induce pRab8 in LRRK2 KO cells, as seen by immunofluorescence in Fig 3A,B, and western blot in Supp Fig 3 A,B. With our new type II kinase inhibitor data we can confirm that the plasma membrane localized Rab8 is indeed phosphorylated by LRRK2.

      R1: Minor concern - the abstract but not the introduction emphasizes a hypothesis that loss of neuromelanin may promote cell loss in PD (through loss of iron chelation), while post mortem studies are by definition only correlative, early works suggested that the higher melanized DA neurons were preferentially lost when compared to poorly melanized neurons in PD. This speculation in the abstract is not necessary to the novel findings of the paper.

      HM: We appreciate that the links to iron in PD are correlative, we have maintained some of our discussion on this point within the manuscript given the lack of attention the field has paid to the cell biology of iron homeostasis in PD models. If there is a cell autonomous nature to the loss of DA neurons in PD, iron is very likely to be a part of this specificity in our opinion. Most of the newer MRI studies looking at iron levels in patient brains are showing higher free iron and working on this as potential biomarkers of disease. The precise timing of this relative to the stability/loss of neuromelanin is, I agree, not really clear.

      R1: (Significance (Required)): This study could shed light on a both novel and unexpected behavior of the LRRK2 protein, and open new insights into how pathogenic mutations may affect the cell. While studied in one cell line known for unusually high LRRK2 expression levels, data in this cell type have been broadly applicable elsewhere. Given the link to Parkinson's disease, Rab-dependent trafficking, and iron homeostasis, the findings could have import and relevance to a rather broad audience.

      HM: We are so very appreciative that reviewer 1 feels our work will be of interest to the PD and cell biology communities.

      Response to Reviewer 2

      Reviewer 2 (R2): Major: Please confirm that the observed phenotype is conserved within bone marrow-derived macrophages of LRRK2 G2019S mice. These mice are widely available within the community and frozen bone marrow could be sent to the labs. The main reason for this experiment is that CRISPR macrophage cell lines do sometimes acquire weird phenotypes (at least in our lab they sometimes do!) and it would strengthen the validity of the observations.

      HM: We did a series of experiments on primary BMDM derived from 3 pairs of wild type, LRRK2G2019S and LRRK2KO mice. We examined levels of ferritin heavy and light chains in steady state and with FAS treatment. Unfortunately, the data did not phenocopy the RAW macrophage lines we present here, since FTL and FTH were mostly unchanged. We did observe an increase in NCOA4 levels, consistent with potential issues with microautophagy as observed in our RAW system.

      While we understand the danger that our phenotypes are nonspecific and linked to a CRISPR-based anomaly, there are a number of arguments we would make that these data and pathways are potentially very important to our understanding of LRRK2 mutant phenotypes and pathology. The first point is that we now include a LRRK2-specific type II kinase inhibitor that reverses the iron-overload pRab8 accumulation at the plasma membrane in LRRK2G2019S cells, showing that this is at least directly linked to LRRK2 kinase activity, even though it is resistant to MLi2.

      Second, Suzanne Pfeffer recently published their single cell RNAseq datasets from brains of untreated LRRK2G2019S mice (PMID: 39088390). She reported major changes in Ferritin heavy chain (it is lost) in very specific cell types of the brain, astrocytes, microglia and oligodendrocytes, with no changes in other cell types at all (her Fig 6 included left). This is consistent with a very context specific impact of LRRK2 on iron homeostasis that we don't yet understand.

      Third, the labs of Cookson, Mamais and Lavoie have been working on the impact of LRRK2 mutations on iron handling in a few different model systems, including iPSCs, and see changes in transferrin recycling and iron accumulation. Those studies did not go into much detail on ferritin, NCOA4 and other readouts of iron homeostasis but are roughly in agreement with our work here. In the latest bioRxiv study, submitted after we sent this work for review, they concluded their phenotypes were reversed by MLi2 treatment; however, they required 7 days of treatment for a ~20% restoration in iron levels. Given our work, it would seem the impact of LRRK2G2019S in high iron conditions is also very resistant to MLi2 treatment. In all these studies we do not yet know for sure whether iron overload in the brain may be a precursor to DA neuron cell death, which could be exacerbated in G2019S carriers. But we hope the reviewer will agree that our approach and findings will be useful for the field to expand on these concepts within different models of PD.

      R2: Minor comments: Supplementary Fig 1: I don't think one should normalize all controls to 1 and then do a statistical test as obviously the standard deviation of control is 0.

      HM: We agree with the reviewer that statistical testing is not appropriate when the WT control is fixed to a value of 1, as this necessarily eliminates variance in that group; accordingly, we have removed both statistical comparisons and standard deviation from the WT control while retaining variability measures for all experimental conditions. Raw densitometry values could not be pooled across independent experiments due to substantial inter-blot variability, and therefore normalization to the WT control was used solely to allow relative comparison within experiments, acknowledging the inherent quantitative limitations of Western blot densitometry. Ultimately the magnitude of the changes relative to the control lanes in each biological replicate was consistent across experiments, even if the absolute density of the bands between experiments was not always the same.
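
      The normalization issue can be illustrated with a short sketch. The densitometry values below are hypothetical, chosen only to show the arithmetic, and the helper name `normalize_to_control` is an assumption rather than anything used in the manuscript. Dividing each blot by its own WT lane forces every WT value to 1, so the WT group has no variance to test against, while the fold change in the experimental lane stays comparable across blots of very different absolute intensity.

```python
from statistics import stdev

# Hypothetical raw densitometry values (arbitrary units) from three independent blots.
raw_blots = [
    {"WT": 1200.0, "G2019S": 2400.0},
    {"WT": 300.0, "G2019S": 630.0},
    {"WT": 5000.0, "G2019S": 9500.0},
]


def normalize_to_control(blot, control="WT"):
    """Express every lane of one blot relative to that blot's control lane."""
    return {lane: value / blot[control] for lane, value in blot.items()}


normalized = [normalize_to_control(blot) for blot in raw_blots]
wt_values = [blot["WT"] for blot in normalized]
mutant_values = [blot["G2019S"] for blot in normalized]

wt_sd = stdev(wt_values)          # zero by construction: every WT value is 1
mutant_sd = stdev(mutant_values)  # variability is retained for the experimental lane

print("WT fold values:", wt_values, "SD:", wt_sd)
print("G2019S fold values:", mutant_values, "SD:", mutant_sd)
```

      Under these assumed numbers the raw WT intensities differ by more than tenfold between blots, yet the within-blot fold changes are similar, which is the sense in which relative comparison within experiments remains informative.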

      R2: The raw data needs to be submitted to PRIDE or similar.

      HM: All of our data are being uploaded to the GEO databases, protocols to protocols.io, and raw data deposited on the Zenodo site, in compliance with our ASAP funding requirements and those of the journal.

      R2: Some of the western blots could be improved. If these are the best shown, I am a little concerned about the reproducibility. How often have they been done?

      HM: We now ensure there is quantification of all the blots for at least 3 independent experiments and have worked to improve the quality of them throughout the revision period.

      R2: (Significance (Required)): Considering the importance of LRRK2 biology in Parkinson's and the new biology shown, this paper will be of great interest to the community and wider research fields.

      HM: We are so very grateful that the reviewer appreciates that the LRRK2 and PD community will find our work of interest. We hope our revisions will prove satisfactory even in the absence of ferritin changes in primary G2019S BMDM.

      Response to Reviewer 3

      Reviewer 3 (R3): What is missing in the study is the physiological relevance of these findings, mainly whether this effect actually results in higher cell death during iron overload. Since iron overload is known to result in ferroptosis, it is surprising that the authors have not checked whether the LRRK2 G2019S and ARM cells undergo more ferroptosis relative to LRRK2 WT cells.

      HM: We thank the reviewer for pushing us to monitor the functional implications of the iron mishandling upon iron overload in the G2019S RAW cell system. We now add a completely new Figure 4 to get to these functional points. We employed two tools to look at established aspects of ferroptosis, first the C11-bodipy probe that labels oxidized lipids and we see significant signals specific to the G2019S iron loaded cells, where it labels endocytic membranes and the cell surface (Fig 4 E-G). This is consistent with the elevation of free iron 2+. We also used the SYTOX green death assay where the dye is internalized into cells when the cell surface is ruptured and show that G2019S cells die upon iron overload, but not the LRRK2KO or wild type cells (Fig 4 H,I). Lastly, we performed full proteomics analysis of the wt and G2019S RAW cells in iron overload conditions. These data provide a better view of the full stress response initiated in the G2019S cells, including the upregulation of HMOX1 (an NRF2 target gene), changes in lysosomal hydrolytic enzymes consistent with the reduction in BSA-DQRed signals, and in cytoskeleton, which is consistent with the plasma membrane blebbing phenotypes we see in G2019S (Fig. 4A-D and Supp. Fig 4 data). We hope these new data help to position the phenotype into a more physiological output.

      R3: Moreover, their conclusion of the findings as "resistant to LRRK2 kinase inhibitors" is not convincing, since in most of the studies, they have removed the kinase domain, and this description implies the use of pharmacological kinase inhibition which has not been done in this paper.

      HM: We took this comment to heart and, as explained in the general response, we removed the LRRK2ARM clones from the study. To understand the kinase function in the iron overload conditions we first explored the pan-specific type II kinase inhibitor rebastinib, shown to inhibit LRRK2. In contrast to MLi-2, this drug effectively blocked p-Rab8 in G2019S cells exposed to high iron. However, since it is not specific and likely inhibits about 30-40% of all kinases, we reached out to the Reck-Peterson and Leschziner labs, who have developed a LRRK2-specific type II kinase inhibitor (published in June 2025, PMID: 40465731). They provided these to us (along with a great deal of discussion) and the two drugs both blocked the effect of LRRK2G2019S on p-Rab8 at the plasma membrane. These data show that the phenotypes we observe are indeed linked to the increased kinase activity of LRRK2, even though they are fully resistant to MLi-2. This suggests that high iron results in some alteration in LRRK2 conformation that alters the ability of MLi-2 to block the kinase activity, while still allowing the type II kinase inhibitors, which bind deeper in the ATP-binding pocket, to functionally block activity. We believe that these new data remove a great deal of the confusion we had in the initial submission in explaining the MLi-2 resistance.

      R3: There is lower LRRK2 expression in LRRK2 G2019S cells, have the authors checked Rab phosphorylation to validate the mutation?

      HM: We agree that the G2019S mutation leads to a reduction in total LRRK2 levels in the cell, which is likely a compensatory effect to lower kinase activity in the cell. We do show that the G2019S mutation has clear activation of phosphorylation on both Rab8 and at the autophosphorylation site S1292 of LRRK2, as seen in Fig 1A, quantified in Fig 1B. In untreated conditions, these phosphorylation events are reversible upon treatment with MLi-2. We also provide the sequencing data in the supplement to confirm the presence of the G2019S mutation in this clone, shown in Supp Fig. 1A.

      R3: The authors should specify if their cells are heterozygous or homozygous since they are discussing a dominant interfering mutant.

      HM: The G2019S and LRRK2 KO are both homozygous. We state this early in the results section and the methods.

      R3: The transferrin phenotype validated through proteomics and western blot is solid. HM: We agree, thank you very much!

      R3: Quantification in figure 1F-G is problematic, not clear what they mean by "diffuse and lysosomal". Puncta is either colocalising with lysosomes or not colocalising. This needs to be clarified and re-analysed.

      HM: We apologize for the confusion. In control cells the Cherry tagged FTL is efficiently cycling through the lysosomes and we don't see a strong cytosolic (diffuse) pool, which likely reflects the relatively iron-poor culture conditions. However, in G2019S cells, there is a highly elevated amount of FTL, with a strong cytosolic/diffuse stain in steady state, with some flux into lysosomes. In this experiment we chelated iron to test whether this cytosolic pool of FTL was capable of clearing through the lysosomes (ferritinophagy). While there is a cytosolic (diffuse) pool that remains, the pool that fluxes into the lysosome increases in G2019S chelated cells. This is also seen by the reduction in total FTL seen by western blot (endogenous FTL). Our conclusion here is that the general ferritinophagy machinery remains functional in G2019S cells. We have changed the term "diffuse" to "cytosolic" and improved our description of this experiment in the text.

      R3: Text in the first results part called "LRRK2G2019S RAW macrophages have altered iron homeostasis" is very long. It could be divided into more sections to improve readability. HM: We have improved the text to be more descriptive of the conclusions and added new sections

      R3: If the effect is armadillo-dependent, where is LRRK2 G2019S implicated, since there is no kinase domain in these cells?

      HM: Our new data employing the LRRK2-specific type II kinase inhibitors now confirm that the effects of G2019S on iron overload are indeed kinase dependent; the kinase activity is simply insensitive to MLi-2.

      R3: The authors do not show any controls (PCR, sequencing) confirming knockout or truncation. HM: We did higher resolution proteomics and deep sequencing and learned that the "Arm" mutation was not a truncation but a series of 4 point mutations around exon 19. Therefore we removed all data referring to this clone and replaced it with the type II kinase inhibitor experiments. We feel this removes a lot of confusion and provides much clearer conclusions on the role of the kinase activity in iron overload. We may continue to explore why the 4 amino acid mutations created such strong phenotypes, as it could reflect a critical conformational change that impacts the kinase activity. But that is for future work. We now include the sequencing files of the G2019S and KO as Supplementary Data Files 1 and 2.

      R3: The data is interesting and the image quality with the insets is very high. HM: We thank the reviewer for their positive comments!

      R3: Mutant not clearly described in text, did the authors remove just the kinase and ROC-COR domains or all the domains downstream of the Armadillo domain? This is not clear. HM: We have removed the clone from the manuscript.

      R3: The authors cannot conclude that their phenotype is due to the independence of the kinase domain specifically as they are also interfering with the GTPase activity by removing the ROC-COR domains. HM: We agree and our new drugs allow us to confirm that the phenotypes are due to kinase activity, but there is a new conformation of LRRK2 induced in high iron that renders the kinase domain resistant to MLi-2 inhibition. We discuss this in the manuscript now.

      R3: In Figure 3E, is the difference between the "ARM CTRL" and the "ARM FAS" conditions significant? A trend appears to be there, but the p-value is not shown. HM: These data are now removed.

      R3: In figure 4A, it would have been important to check if Rab8 phosphorylation is also observed in LRRK2 KO cells after administration of FAS to further evaluate the mechanism through which this Rab8 phosphorylation is occurring.

      HM: We show that the pRab8 is specific to the G2019S lines and not seen in LRRK2 KO (Fig 3A,B, Supp. Fig. 3A,B).

      R3: The vinculin bands in figure 4A are misaligned with the rest of the bands.

      HM: We now provide new blots for all of these experiments (in Fig 3) as we removed the LRRK2ARM data from the manuscript and the appropriate loading controls are all included.

      R3: The authors do not have any controls to validate the pRab8 staining in IF. This is an important caveat and needs to be addressed. HM: We now include siRNA validation of Rab8 (vs Rab10) to confirm the specificity of the antibody to pRab8 in IF where it labels the plasma membrane in G2019S iron loaded cells.

      R3: The authors should have checked if FAS administration in the LRRK2 G2019S and the ARM cells is leading to ferroptotic cell death (or cell death in general). This is key to validate the link between the altered iron homeostasis in LRRK2 G2019S cells and increased cytotoxicity observed during neurodegeneration.

      HM: As mentioned above, we have added extensively to our new Fig 4 to include full proteomics analysis of the changes in iron loaded G2019S cells, we use C11-Bodipy probes to monitor lipid oxidation, and SYTOX green assays to monitor cell death through cell surface rupture (consistent with ferroptosis). We thank the reviewer for pushing us to do these experiments and provide further relevance to the potential for LRRK2 mutations to promote cell toxicity during neurodegeneration.

      R3: Regarding the literature, the authors are missing some important papers that are preprinted and these studies need to be discussed. This includes a report with opposite findings (https://www.biorxiv.org/content/10.1101/2025.09.26.678370v1.full) and a report showing kinase independent cell death in macrophages (https://www.biorxiv.org/content/10.1101/2023.09.27.559807v1.abstract).

      HM: We thank the reviewers for alerting us to the bioRxiv papers, one of which was submitted after we sent our manuscript to review. We are excited to see the growing interest in the impact of LRRK2 function in iron homeostasis and hope our work will contribute to this. Upon reading the study from the LaVoie lab, we note that they do show some sensitivity of the iron loaded phenotype in G2019S cells; however, they see a ~20% reduction in lysosomal iron after 7 days of MLi-2 treatment in astrocytes (their Fig 2L). To us, this is very likely an indication of a relatively high resistance to the drug. We expect that if they tried these new type II inhibitors the iron load would be much more rapidly reversed. The specificity of their phenotype to Rab8 is also very interesting considering the cell surface localization we see for pRab8 in our iron loaded system. Similar comments apply to the Guttierez study in macrophages. We have included the findings of these papers within the manuscript and thank the reviewer for pointing them out.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this study, Lamberti et al. investigate how translation initiation and elongation are coordinated at the single-mRNA level in mammalian cells. The authors aim to uncover whether and how cells dynamically adjust initiation rates in response to elongation dynamics, with the overarching goal of understanding how translational homeostasis is maintained. To this end, the study combines single-molecule live-cell imaging using the SunTag system with a kinetic modeling framework grounded in the Totally Asymmetric Simple Exclusion Process (TASEP). By applying this approach to custom reporter constructs with different coding sequences, and under perturbations of the initiation/elongation factor eIF5A, the authors infer initiation and elongation rates from individual mRNAs and examine how these rates covary.

      The central finding is that initiation and elongation rates are strongly correlated across a range of coding sequences, resulting in consistently low ribosome density (≤12% of the coding sequence occupied). This coupling is preserved under partial pharmacological inhibition of eIF5A, which slows elongation but is matched by a proportional decrease in initiation, thereby maintaining ribosome density. However, a complete genetic knockout of eIF5A disrupts this coordination, leading to reduced ribosome density, potentially due to changes in ribosome stalling resolution or degradation.

      Strengths:

      A key strength of this work is its methodological innovation. The authors develop and validate a TASEP-based Hidden Markov Model (HMM) to infer translation kinetics at single-mRNA resolution. This approach provides a substantial advance over previous population-level or averaged models and enables dynamic reconstruction of ribosome behavior from experimental traces. The model is carefully benchmarked against simulated data and appropriately applied. The experimental design is also strong. The authors construct matched SunTag reporters differing only in codon composition in a defined region of the coding sequence, allowing them to isolate the effects of elongation-related features while controlling for other regulatory elements. The use of both pharmacological and genetic perturbations of eIF5A adds robustness and depth to the biological conclusions. The results are compelling: across all constructs and conditions, ribosome density remains low, and initiation and elongation appear tightly coordinated, suggesting an intrinsic feedback mechanism in translational regulation. These findings challenge the classical view of translation initiation as the sole rate-limiting step and provide new insights into how cells may dynamically maintain translation efficiency and avoid ribosome collisions.

      We thank the reviewer for their constructive assessment of our work, and for recognizing the methodological innovation and experimental rigor of our study.

      Weaknesses:

      A limitation of the study is its reliance on exogenous reporter mRNAs in HeLa cells, which may not fully capture the complexity of endogenous translation regulation. While the authors acknowledge this, it remains unclear how generalizable the observed coupling is to native mRNAs or in different cellular contexts.

      We agree that the use of exogenous reporters is a limitation inherent to the SunTag system, for which there is currently no simple alternative for single-mRNA translation imaging. However, we believe our findings are likely generalizable for several reasons.

      As discussed in our introduction and discussion, there is growing mechanistic evidence in the literature for coupling between elongation (ribosome collisions) and initiation via pathways such as the GIGYF2-4EHP axis (Amaya et al. 2018, Hickey et al. 2020, Juszkiewicz et al. 2020), which might operate on both exogenous and endogenous mRNAs.

      As already acknowledged in our limitations section, our exogenous reporters may not fully recapitulate certain aspects of endogenous translation (e.g., ER-coupled collagen processing), yet the observed initiation-elongation coupling was robust across all tested constructs and conditions.

      We have now expanded the Discussion (L393-395) to cite complementary evidence from Dufourt et al. (2021), who used a CRISPR-based approach in Drosophila embryos to measure translation of endogenous genes. We also added a reference to Choi et al. 2025, who use an ER-specific SunTag reporter to visualize translation at the ER (L395-397).

      Additionally, the model assumes homogeneous elongation rates and does not explicitly account for ribosome pausing or collisions, which could affect inference accuracy, particularly in constructs designed to induce stalling. While the model is validated under low-density assumptions, more work may be needed to understand how deviations from these assumptions affect parameter estimates in real data.

      We agree with the reviewer that the assumption of homogeneous elongation rates is a simplification, and that our work represents a first step towards rigorous single-trace analysis of translation dynamics. We have explicitly tested the robustness of our model to violations of the low-density assumption through simulations (Figure 2 - figure supplement 2). These show that while parameter inference remains accurate at low ribosome densities, accuracy slightly deteriorates at higher densities, as expected. In fact, our experimental data do provide evidence for heterogeneous elongation: the waiting times between termination events deviate significantly from an exponential distribution (Figure 3 - figure supplement 2C), indicating the presence of ribosome stalling and/or bursting, consistent with the reviewer's concern. We acknowledge in the Limitations section (L402-406) that extending the model to explicitly capture transcript-dependent elongation rates and ribosome interactions remains challenging. The TASEP is difficult to solve analytically under these conditions, but we note that simulation-based inference approaches, such as particle filters to replace HMMs, could provide a path forward for future work to capture this complexity at the single-trace level.
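
      To make the homogeneous-rate baseline concrete, the sketch below simulates a TASEP with extended particles using a Gillespie algorithm. It is not the TASEP-HMM inference described in the response; the function name `simulate_tasep` and all parameter values (300 codons, a 10-codon footprint, 0.03 initiations per second, 3 codons per second) are assumptions chosen only for illustration. Under these assumptions the waiting times between termination events should be close to exponential (coefficient of variation near 1), which is the reference against which a deviation such as the one reported would be judged.

```python
import random
import statistics

def simulate_tasep(length=300, footprint=10, alpha=0.03, k=3.0,
                   t_end=10000.0, seed=0):
    """Gillespie simulation of a TASEP with extended particles.

    length    : coding sequence length (codons)
    footprint : codons covered by one ribosome (assumed value)
    alpha     : initiation attempt rate (1/s)
    k         : homogeneous elongation rate (codons/s)
    Returns (termination_times, time-averaged fraction of CDS covered).
    """
    rng = random.Random(seed)
    pos = []                 # ribosome positions, 3'-most ribosome first
    t = 0.0
    terminations = []
    ribosome_seconds = 0.0   # time integral of the number of ribosomes
    while True:
        # A ribosome can step if it leads or if the one ahead is far enough
        mobile = [i for i, p in enumerate(pos)
                  if i == 0 or pos[i - 1] - p > footprint]
        can_init = (not pos) or pos[-1] >= footprint
        total = k * len(mobile) + (alpha if can_init else 0.0)
        dt = rng.expovariate(total)
        if t + dt > t_end:
            ribosome_seconds += len(pos) * (t_end - t)
            break
        ribosome_seconds += len(pos) * dt
        t += dt
        if rng.random() * total < k * len(mobile):
            i = rng.choice(mobile)
            pos[i] += 1
            if pos[i] >= length:     # ribosome reached the stop codon
                pos.pop(i)
                terminations.append(t)
        else:
            pos.append(0)            # new ribosome loads at the start codon
    density = (ribosome_seconds / t_end) * footprint / length
    return terminations, density

terminations, density = simulate_tasep()
waits = [b - a for a, b in zip(terminations, terminations[1:])]
cv = statistics.stdev(waits) / statistics.mean(waits)
print(f"density = {density:.3f}, waiting-time CV = {cv:.2f}")
```

      Site-specific pauses could be explored in the same sketch by replacing the single rate `k` with a per-codon rate, at the cost of losing the simple analytical low-density limit.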

      Furthermore, although the study observes translation "bursting" behavior, this is not explicitly modeled. Given the growing recognition of translational bursting as a regulatory feature, incorporating or quantifying this behavior more rigorously could strengthen the work's impact.

      While we do not explicitly model the bursting dynamics in the HMM framework, we have quantified bursting behavior directly from the data. Specifically, we measure the duration of translated (ON) and untranslated (OFF) periods across all reporters and conditions (Figure 1G for control conditions and Figure 4G-H for perturbed conditions), finding that active translation typically lasts 10-15 minutes interspersed with shorter silent periods of 5-10 minutes. This empirical characterization demonstrates that bursting is a consistent feature of translation across our experimental conditions. The average duration of silent periods is similar to what was inferred by Livingston et al. 2023 for a similar SunTag reporter; while the average duration of active periods is substantially shorter (~15 min instead of ~40 min), which is consistent with the shorter trace duration in our system compared to theirs (~15 min compared to ~80 min, on average). Incorporating an explicit two-state or multi-state bursting model into the TASEP-HMM framework would indeed be computationally intensive and represents an important direction for future work, as it would enable inference of switching rates alongside initiation and elongation parameters. We have added this point to the Discussion (L415-417).
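
      The ON/OFF quantification described above can be sketched as a run-length computation on a thresholded trace. The snippet assumes each trace has already been converted to a per-frame translating/not-translating state; the thresholding step, the function name `on_off_durations` and the 2-minute frame interval are hypothetical. Runs touching either end of the trace are dropped because their true duration is unknown, which is one way in which short traces limit the longest active periods that can be observed.

```python
from itertools import groupby

def on_off_durations(is_on, frame_interval_min):
    """Return (on_durations, off_durations) in minutes.

    is_on              : per-frame state, truthy while the mRNA is translated
    frame_interval_min : time between frames in minutes (assumed known)
    Runs at the start or end of the trace are censored and excluded.
    """
    runs = [(bool(state), sum(1 for _ in group))
            for state, group in groupby(is_on)]
    interior = runs[1:-1]
    on = [n * frame_interval_min for state, n in interior if state]
    off = [n * frame_interval_min for state, n in interior if not state]
    return on, off

# Toy trace: 1 = translating, 0 = silent
trace = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
on, off = on_off_durations(trace, frame_interval_min=2.0)
print("ON durations (min):", on)
print("OFF durations (min):", off)
```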

      Assessment of Goals and Conclusions:

      The authors successfully achieve their stated aims: they quantify translation initiation and elongation at the single-mRNA level and show that these processes are dynamically coupled to maintain low ribosome density. The modeling framework is well suited to this task, and the conclusions are supported by multiple lines of evidence, including inferred kinetic parameters, independent ribosome counts, and consistent behavior under perturbation.

      Impact and Utility:

      This work makes a significant conceptual and technical contribution to the field of translation biology. The modeling framework developed here opens the door to more detailed and quantitative studies of ribosome dynamics on single mRNAs and could be adapted to other imaging systems or perturbations. The discovery of initiation-elongation coupling as a general feature of translation in mammalian cells will likely influence how researchers think about translational regulation under homeostatic and stress conditions.

      The data, models, and tools developed in this study will be of broad utility to the community, particularly for researchers studying translation dynamics, ribosome behavior, or the effects of codon usage and mRNA structure on protein synthesis.

      Context and Interpretation:

      This study contributes to a growing body of evidence that translation is not merely controlled at initiation but involves feedback between elongation and initiation. It supports the emerging view that ribosome collisions, stalling, and quality control pathways play active roles in regulating initiation rates in cis. The findings are consistent with recent studies in yeast and metazoans showing translation initiation repression following stalling events. However, the mechanistic details of this feedback remain incompletely understood and merit further investigation, particularly in physiological or stress contexts. 

      In summary, this is a thoughtfully executed and timely study that provides valuable insights into the dynamic regulation of translation and introduces a modeling framework with broad applicability. It will be of interest to a wide audience in molecular biology, systems biology, and quantitative imaging.

      We appreciate the reviewer's thorough and positive assessment of our work, and that they recognize both the technical innovation of our modeling framework and its potential broad utility to the translation biology community. We agree that further mechanistic investigation of initiation-elongation feedback under various physiological contexts represents an important direction for future research.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      (1) Ribosomal elongation rates and initiation rates are strongly coordinated.

      (2) Elongation rates were estimated between 1-4.5 aa/sec. Initiation rates were estimated between 0.5-2.5 events/min. These values agree with previously reported values.

      (3) Ribosomal density was determined below 12% for all constructs and conditions.

      (4) eIF5A-perturbations (KO and GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      (5) eIF5A perturbations resulted in increases in elongation and decreases in initiation rates.
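
      The relation between findings (1) to (3) can be illustrated with the low-density approximation, in which the fraction of the coding sequence covered by ribosomes is roughly the initiation rate times the ribosome footprint divided by the elongation rate. The sketch below uses the rate ranges quoted in (2); the 10-codon footprint and the pairing of the lowest and highest rates are assumptions made for illustration and are not taken from the manuscript.

```python
def ribosome_density(init_per_min, elong_aa_per_s, footprint_codons=10.0):
    """Approximate fraction of the coding sequence covered by ribosomes.

    At low density the mean spacing between ribosomes is
    elongation rate / initiation rate (in codons), so the covered
    fraction is footprint / spacing. The approximation ignores
    ribosome interference and is only meaningful for small values.
    """
    init_per_s = init_per_min / 60.0
    return init_per_s * footprint_codons / elong_aa_per_s

# Coupled extremes: slow initiation with slow elongation, fast with fast
slow = ribosome_density(0.5, 1.0)
fast = ribosome_density(2.5, 4.5)

# Uncoupled case: fast initiation on a slowly elongating mRNA
uncoupled = ribosome_density(2.5, 1.0)

print(f"slow/slow: {slow:.1%}, fast/fast: {fast:.1%}, fast/slow: {uncoupled:.1%}")
```

      With these assumed numbers the two coupled cases stay below 12% coverage while the uncoupled case does not, which is why a density bound across constructs is informative about the correlation between the two rates.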

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is highly relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      We thank the reviewer for their careful evaluation of our work and for recognizing the novelty of our quantitative methodology to extract both initiation and elongation rates from harringtonine run-off assays, extending beyond the traditional use of these experiments.

      Weaknesses:

      The conclusion of the strong coordination between initiation and elongation rates is interesting, but some results are unexpected, and further experimental validation is needed to ensure this coordination is valid. 

      We agree that some of our findings need further experimental investigation in future studies. However, we believe that the coordination between initiation and elongation is supported by multiple results in our current work: (1) the strong correlation observed across all reporters and conditions (Figure 3E), and (2) the consistent maintenance of low ribosome density despite varying elongation rates. While additional experimental validation would be valuable, we note that directly manipulating initiation or elongation independently in mammalian cells remains technically challenging. Nevertheless, our findings are consistent with emerging mechanistic understanding of collision-sensing pathways (GIGYF2-4EHP) that could mediate such coupling, as discussed in our manuscript.

      (1) eIF5a perturbations resulted in a non-significant effect on the fraction of translating mRNA, translation duration, and bursting periods. Given the central role of eIF5a, I would have expected a different outcome. I would recommend that the authors expand the discussion and review more literature to justify these findings.

      We appreciate this comment. This finding is indeed discussed in detail in our manuscript (Discussion, paragraphs 6-7). As we note there, while eIF5A plays a critical role in elongation, the maintenance of bursting dynamics and ribosome density upon perturbation can be explained by compensatory feedback mechanisms. Specifically, a coordinated decrease in initiation rates counterbalances slower elongation to maintain homeostatic ribosome density. We also discuss several factors that complicate interpretation: (1) potential RQC-mediated degradation masking stronger effects in proline-rich constructs, (2) differences between GC7 treatment and genetic knockout suggesting altered stalling resolution kinetics, and (3) the limitations of using exogenous reporters that lack ER-coupled processing, which may be critical for eIF5A function in endogenous collagen translation (as suggested by Rossi et al., 2014; Mandal et al., 2016; Barba-Aliaga et al., 2021). The mechanistic complexity and tissue-specific nature of eIF5A function in mammals, which differs substantially from the better-characterized yeast system, likely contribute to the nuanced phenotype we observe. We believe our Discussion adequately addresses these points.

      (2) The AAG construct leading to slow elongation is very surprising. It is the opposite of the field consensus, where codon-optimized gene sequences are expected to elongate faster. More information about each construct should be provided. I would recommend more bioinformatic analysis on this, for example, calculating CAI for all constructs, or predicting the structures of the proteins.

      We agree that the slow elongation of the AAG construct is counterintuitive and indeed surprising. Following the reviewer's suggestion, we have now calculated the Codon Adaptation Index (CAI) for all constructs (Renilla 0.89, Col1a1 0.78, Col1a1 mutated 0.74). It is therefore unlikely that codon bias explains the slow translation, particularly since we designed the mutated Col1a1 construct with alanine codons selected to respect human codon usage bias, thereby minimizing changes in codon optimality. As we discuss in the manuscript, we hypothesize that the proline-to-alanine substitutions disrupted co-translational folding of the collagen-derived sequence. Prolines are critical for collagen triple-helix formation (Shoulders and Raines, 2009), and their replacement with alanines likely generates misfolded intermediates that cause ribosome stalling (Barba-Aliaga et al., 2021; Komar et al., 2024). This interpretation is supported by the high frequency (>30%) of incomplete run-off traces for AAG, suggesting persistent stalling events. Our findings thus illustrate an important potential caveat: "optimizing" a sequence based solely on codon usage can be detrimental when it disrupts functionally important structural features or co-translational folding pathways.

      This highlights that elongation rates depend not only on codon optimality but also on the interplay between nascent chain properties and ribosome progression.
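
      For readers unfamiliar with the Codon Adaptation Index, the sketch below shows the standard definition: the geometric mean, over the codons of a sequence, of each codon's frequency relative to the most used synonymous codon. The usage table is a small illustrative subset with assumed frequencies, not an authoritative human codon table, and it is not the code used to obtain the CAI values quoted above.

```python
import math

# Illustrative synonymous-codon frequencies (assumed values, two amino acids only)
USAGE = {
    "Ala": {"GCC": 0.40, "GCT": 0.26, "GCA": 0.23, "GCG": 0.11},
    "Pro": {"CCC": 0.33, "CCT": 0.28, "CCA": 0.27, "CCG": 0.11},
}

def relative_adaptiveness(usage):
    """Weight of each codon: its frequency over that of the best synonym."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best
    return weights

def cai(seq, weights):
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs))

W = relative_adaptiveness(USAGE)
pro_run = "CCCCCTCCC"   # Pro-Pro-Pro with mostly preferred codons
ala_run = "GCCGCTGCC"   # Ala-Ala-Ala with mostly preferred codons
print(f"CAI proline run: {cai(pro_run, W):.2f}")
print(f"CAI alanine run: {cai(ala_run, W):.2f}")
```

      Because the index only scores codon choice, two sequences can have similar CAI values while encoding peptides with very different folding behavior, which is the caveat raised in the response.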

      (3) The authors should consider using their methodology to study the effects of modifying the 5'UTR, resulting in changes in initiation rate and bursting, such as previously shown in reference Livingston et al., 2023. This may be outside of the scope of this project, but the authors could add this as a future direction and discuss if this may corroborate their conclusions. 

      We thank the reviewer for this excellent suggestion. We agree that applying our methodology to 5'-UTR variants would provide a complementary test of initiation-elongation coupling, and we have now added this as a future direction in the Discussion (L417-420).

      (4) The mathematical model and parameter inference routines are central to the conclusions of this manuscript. In order to support reproducibility, the computational code should be made available and well-documented, with a requirements file indicating the dependencies and their versions. 

      We have added the GitHub link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Reviewer #3 (Public review):

      Disclaimer:

      My expertise is in live single-molecule imaging of RNA and transcription, as well as associated data analysis and modeling. While this aligns well with the technical aspects of the manuscript, my background in translation is more limited, and I am not best positioned to assess the novelty of the biological conclusions.

      Summary:

      This study combines live-cell imaging of nascent proteins on single mRNAs with time-series analysis to investigate the kinetics of mRNA translation.

      The authors (i) used a calibration method for estimating absolute ribosome counts, and (ii) developed a new Bayesian approach to infer ribosome counts over time from run-off experiments, enabling estimation of elongation rates and ribosome density across conditions.

      They report (i) translational bursting at the single-mRNA level, (ii) low ribosome density (~10% occupancy ± a few percent), (iii) that ribosome density is minimally affected by perturbations of elongation (using a drug and/or different coding sequences in the reporter), suggesting a homeostatic mechanism potentially involving a feedback of elongation onto initiation, although (iv) this coupling breaks down upon knockout of elongation factor eIF5A.

      Strengths:

      (1) The manuscript is well written, and the conclusions are, in general, appropriately cautious (besides the few improvements I suggest below).

      (2) The time-series inference method is interesting and promising for broader applications. 

      (3) Simulations provide convincing support for the modeling (though some improvements are possible). 

      (4) The reported homeostatic effect on ribosome density is surprising and carefully validated with multiple perturbations.

      (5) Imaging quality and corrections (e.g., flat-fielding, laser power measurements) are robust.

      (6) Mathematical modeling is clearly described and precise; a few clarifications could improve it further.

      We thank the reviewer for recognizing the novelty of the approach and its rigour, and for providing suggestions to improve it further.

      Weaknesses:

      (1) The absolute quantification of ribosome numbers (via the measurement of $i_{MP}$) should be improved. This only affects the finding that ribosome density is low, not that it appears to be under homeostatic control. However, if $i_{MP}$ turns out to be substantially overestimated (hence ribosome density underestimated), then "ribosomes queuing up to the initiation site and physically blocking initiation" could become a relevant hypothesis. In my detailed recommendations to the authors, I list points that need clarification in their quantifications and suggest an independent validation experiment (measuring the intensity of an object with a known number of GFP molecules, e.g., MS2-GFP-labeled RNAs, or individual GEMs).

      We agree with the reviewer that the estimation of the number of ribosomes is central to our finding that translation happens at low density on our reporters. This result derives from our measurement of the intensity of one mature protein (i<sub>MP</sub>), which we obtained by using a SunTag reporter with an RH1 domain at the C-terminus of the mature protein, allowing us to stabilise mature proteins via actin-tethering. In addition, in line with the reviewer's suggestion, we had already validated this result with an independent estimate of the mature protein intensity (Figure 5 - figure supplement 2B), which was obtained by adding the mature protein intensity directly as a free parameter of the HMM. The inferred value of mature protein intensity for each construct (10-15 a.u.) was remarkably close to the experimental calibration result (14 ± 2 a.u.). Therefore, we are confident that our absolute quantification of ribosome numbers is accurate.

      (2) The proposed initiation-elongation coupling is plausible, but alternative explanations, such as changes in abortive elongation frequency, should be considered more carefully. The authors mention this possibility, but should test or rule it out quantitatively. 

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new perturbation experiments is beyond the scope of the present work.

      (3) The observation of translational bursting is presented as novel, but similar findings were reported by Livingston et al. (2023) using a similar SunTag-MS2 system. This prior work should be acknowledged, and the added value of the current approach clarified.

      We did cite Livingston et al. (2023) in several places, but we recognize that a few additional citations in key places would make clear that the observation of bursting is not novel but agrees with previous results. We have now added them in the Results and Discussion sections.

      (4) It is unclear what the single-mRNA nature of the inference method is bringing since it is only used here to report _average_ ribosome elongation rate and density (averaged across mRNAs and across time during the run-off experiments - although the method, in principle, has the power to resolve these two aspects).

      While decoding individual traces, our model infers shared (population-level) rates. Inferring transcript-specific parameters would be more informative, but it is highly challenging due to the uncertainty on the initial ribosome distribution on single transcripts. Pooling multiple transcripts together allows us to use some assumptions on the initial distribution and infer average elongation and initiation-rate parameters, while revealing substantial mRNA-to-mRNA variability in the posterior decoding (e.g. Figure 3 - figure supplement 2C). Indeed, the inference still informs on the single-trace run-off time distribution (Figure 3A) and the waiting time between termination events (Figure 3 - figure supplement 2C), suggesting the presence of stalling and bursting. In addition, the transcript-to-transcript heterogeneity is likely accounted for by our model better than previous methods (linear fit of the average run-off intensity), as suggested by their comparison (Figure 3 - figure supplement 2A). In the future, the model could be refined by introducing transcript-specific parameters, possibly in a hierarchical way, alongside shared parameters.

      (5) I did not find any statement about data availability. The data should be made available. Their absence limits the ability to fully assess and reproduce the findings.

      We have added the GitHub link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Major Comments:

      (1) Lack of Explicit Bursting Model

      Although translation "bursts" are observed, the current framework does not explicitly model initiation as a stochastic ON/OFF process. This limits insight into regulatory mechanisms controlling burst frequency or duration. The authors should either incorporate a two-state/more-state (bursting) model of initiation or perform statistical analysis (e.g., dwell-time distributions) to quantify bursting dynamics. They should clarify how bursting influences the interpretation of initiation rate estimates.

      We agree with the reviewer that an explicit bursting model (e.g., a two-state telegraph model) would be the ideal theoretical framework. However, integrating such a model into the TASEP-HMM inference framework is computationally intensive and complex. As a robust first step, we have opted to quantify bursting empirically based on the decoded single-mRNA traces. As shown in Figure 1G (control) and Figure 4G (perturbed conditions), we explicitly measured the duration of "ON" (translated) and "OFF" (untranslated) periods. This statistical analysis provides a quantitative description of the bursting dynamics without relying on the specific assumptions of a telegraph model. We have clarified this in the text (L123-125) and, as suggested, added a discussion (L415-417) on the potential extensions of the model to include explicit switching kinetics in the Outlook section.
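
      The empirical ON/OFF quantification described above can be illustrated with a minimal sketch. It assumes each decoded trace has already been reduced to one 0/1 flag per frame; the function name `dwell_times`, the example trace, and the choice to drop runs that touch either end of the trace (their true duration is unknown) are illustrative assumptions and are not taken from the manuscript or its repository.

```python
from itertools import groupby


def dwell_times(states, dt):
    """Return (on_durations, off_durations) in the units of dt.

    states: one flag per frame, 1 = translated ("ON"), 0 = untranslated ("OFF").
    dt:     time between consecutive frames.
    Runs touching the first or last frame are censored, so they are dropped.
    """
    # Collapse the trace into (state, run length in frames) pairs.
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(states)]
    interior = runs[1:-1]  # discard the censored first and last runs
    on = [n * dt for state, n in interior if state == 1]
    off = [n * dt for state, n in interior if state == 0]
    return on, off


# Hypothetical decoded trace; dt is an illustrative frame interval in seconds.
trace = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
on, off = dwell_times(trace, dt=20.0)
print("ON durations (s):", on)
print("OFF durations (s):", off)
```

      Pooling such durations over many traces gives the dwell-time distributions the reviewer refers to. Whether censored runs should be dropped or handled with a survival-type estimator is a design choice that depends on trace length.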

      (2) Assumption of Uniform Elongation Rates

      The model assumes homogeneous elongation across coding sequences, which may not hold for stalling-prone inserts (e.g., PPG). This simplification could bias inference, particularly in cases of sequence-specific pausing. The authors should add simulations or a sensitivity analysis to assess how non-uniform elongation affects the accuracy of inferred parameters. They should also explicitly discuss how ribosome stalling, collisions, or heterogeneity might skew model outputs (see point 4).

      A strong stalling sequence that affects all ribosomes equally should not deteriorate the inference of the initiation rate, provided that the low-density assumption holds. The scenario where stalling events lead to higher density, and thus increased ribosome-ribosome interactions, is comparable to the conditions explored in Figure 2E. In those simulations, we tested the inference on data generated with varying initiation and elongation rates, resulting in ribosome densities ranging from low to high. We demonstrated that the inference remains robust at low ribosome densities (<10%). At higher densities, the accuracy of the initiation rate estimate decreases, whereas the elongation rate estimate remains comparatively robust. Additionally, the model tends to overestimate ribosome density under high-density conditions, likely because it neglects ribosome interference at the initiation site (Figure 2 - figure supplement 2C). We agree that a deeper investigation into the consequences of stochastic stalling and bursting would be beneficial, and we have explicitly acknowledged this in the Limitations section.
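
      To make the low-density regime concrete, here is a back-of-the-envelope sketch. It assumes non-interacting ribosomes at steady state, so that the mean number on an mRNA is the initiation rate times the transit time, and the covered fraction of the coding sequence follows from the footprint. The function names and all numerical values are hypothetical and are not the manuscript's fitted parameters.

```python
def mean_ribosome_number(alpha, lam, length):
    """Expected ribosomes per mRNA when ribosomes do not interact.

    alpha:  initiation rate (ribosomes per second)
    lam:    elongation rate (codons per second)
    length: coding sequence length (codons)
    Each ribosome spends length / lam seconds on the mRNA, so the mean
    number present is the arrival rate times that transit time.
    """
    return alpha * length / lam


def occupancy_fraction(alpha, lam, footprint):
    """Fraction of the coding sequence covered by ribosomes.

    footprint: codons occupied by one ribosome.
    Equals mean_ribosome_number * footprint / length, which simplifies
    to alpha * footprint / lam and is independent of the length.
    """
    return alpha * footprint / lam


# Hypothetical values chosen only for illustration.
alpha, lam, length, footprint = 0.03, 3.0, 1500, 10
n_ribosomes = mean_ribosome_number(alpha, lam, length)
occupancy = occupancy_fraction(alpha, lam, footprint)
print(f"{n_ribosomes:.1f} ribosomes, {occupancy:.0%} of the coding sequence covered")
```

      In this approximation the occupancy depends only on the ratio of initiation to elongation rate, which is why a proportional change in both rates would leave the density unchanged. The approximation ignores exclusion between ribosomes and therefore degrades as density rises.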

      (3) Interpretation of eIF5A Knockout Phenotype

      The observation that eIF5A KO reduces initiation more than elongation, leading to decreased ribosome density, is biologically intriguing. However, the explanation invoking altered RQC kinetics is speculative and not directly tested. The authors should consider validating the RQC hypothesis by monitoring reporter mRNA stability, ribosome collision markers, or translation termination intermediates.

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new experiments is beyond the scope of the present work.

      (4) To strengthen the manuscript, the authors should incorporate insights from three studies.

      - Livingston et al. (PMC10330622) found that translation occurs in bursts, influenced by mRNA features and initiation factors, supporting the coupling of initiation and elongation.

      - Madern et al. (PMID: 39892379) demonstrated that ribosome cooperativity enhances translational efficiency, highlighting coordinated ribosome behavior.

      - Dufourt et al. (PMID: 33927056) observed that high initiation rates correlate with high elongation rates, suggesting a conserved mechanism across cell cultures and organisms.

      Integrating these studies could enrich the manuscript's interpretation and stimulate new avenues of thought.

      We thank the reviewer for the valuable comment. We added citations of Livingston et al. in the context of translational bursting. We already cited Madern et al. in multiple places and, although its observations of ribosome cooperativity are very compelling, they cannot be linked with our observations of a feedback between initiation and elongation, and it would be very challenging to see a similar effect on our reporters. This is why we did not expressly discuss cooperativity. We also integrated Dufourt et al. in the Discussion about the possibility of designing genetically encoded reporters. We also added a sentence about the possibility of using an ER-specific SunTag reporter, as done recently in Choi et al., Nature (2025) (https://doi.org/10.1038/s41586-025-09718-0).

      Minor Comments:

      (1) Use consistent naming for SunTag reporters (e.g., "PPG" vs "proline-rich") throughout.

      Thank you for the comment. However, the term proline-rich always appears together with PPG, so we believe that the naming is clear and consistent.

      (2) Consider a schematic overview of the experimental design and modeling pipeline for accessibility.

      Thank you for the suggestion. We consider that the experimental design and modeling are now described clearly enough that an additional schematic is not needed.

      (3) Clarify how incomplete run-off traces are handled in the HMM inference.

      Incomplete run-off traces are treated identically to complete traces in our HMM inference. This is possible because our model relies on the probability of transitions occurring per time step to infer rates. It does not require observing the final "empty" state to estimate the kinetic parameters α and λ. The loss of signal (e.g., mRNA moving out of the focal volume or photobleaching) does not invalidate the kinetic information contained in the portion of the trace that was observed. We have clarified this in the Methods section.
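
      The point that a truncated trace still carries usable likelihood information can be sketched with a generic forward algorithm. This is a toy two-state HMM with discrete emissions, not the TASEP-based HMM of the manuscript; the function name `forward_loglik` and every number below are hypothetical.

```python
import math


def forward_loglik(obs, init, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : per-time-step probability of moving from state i to j
    emit[i][o]  : probability of observing symbol o in state i
    The sequence may stop at any frame; no terminal state is required.
    """
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one time step, then weight by the new observation.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        norm = sum(alpha)            # P(obs[t] | obs[0..t-1])
        loglik += math.log(norm)
        alpha = [a / norm for a in alpha]  # rescale to avoid underflow
    return loglik


# Toy model: state 0 = ribosome-free, state 1 = translating.
init = [0.2, 0.8]
trans = [[0.9, 0.1],
         [0.3, 0.7]]
emit = [[0.95, 0.05],   # state 0 mostly emits symbol 0 ("dark")
        [0.10, 0.90]]   # state 1 mostly emits symbol 1 ("bright")

full_trace = [1, 1, 1, 0, 0, 0]   # run-off observed down to the dark state
truncated = full_trace[:3]        # signal lost before the dark state is reached
ll_full = forward_loglik(full_trace, init, trans, emit)
ll_truncated = forward_loglik(truncated, init, trans, emit)
print(ll_full, ll_truncated)
```

      Both calls return a well-defined log-likelihood, so complete and incomplete traces can be summed in the same objective when fitting shared parameters.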

      Reviewer #2 (Recommendations for the authors):

      (1) Reproducibility:

      (1.1) The authors should use a GitHub repository with a timestamp for the release version.

      The code is available on GitHub (https://github.com/naef-lab/suntag-analysis).

      (1.2) Make raw images and data available in a figure repository like Figshare.

      The raw images (.ome.tif) are now available on Zenodo (https://zenodo.org/records/17669332).

      (2) Paper reorganization and expansion of the intensity and ribosome quantification:

      (2.1) Given the relevance of the initiation and elongation rates for the conclusions of this study, and the fact that the authors inferred these rates from the spot intensities, I recommend that the authors move Figure 1 Supplement 2 to the main text and expand the description of the process to relate spot intensity and number of ribosomes. Please also expand the figure caption for this image.

      We agree with the importance of this validation. We have expanded the description of the calibration experiment in the main text and in the figure caption.

      (2.2) I suggest the authors explicitly mention the use of HMM in the abstract.

      We have now explicitly mentioned the TASEP-based HMM in the abstract.

      (2.3) In line 492, please add the frame rate used to acquire the images for the run-off assays.

      We have added the specific frame rate (one frame every 20 seconds) to the relevant section.

      (3) Figures and captions:

      (3.1) Figure 1, Supplement 2. Please add a description of the colors used in plots B, C. 

      We have expanded the caption and added the color description.

      (3.2) In the Figure 2 caption. It is not clear what the authors mean by "traceseLife". Please ensure it is not a typo.

      Thank you for spotting this. We have corrected the typo.

      (3.3) Figure 1 A, in the cartoon N(alpha)->N-1, shouldn't the transition also depend on lambda?

      The transition probability was explicitly derived in the “Bayesian modeling of run-off traces” section (Eqs. 17-18), and does not depend on λ, but only on the initiation rate under the low-density assumption.

      (3.4) Figure 3, Supplement 2. "presence of bursting and stalling.." has a typo.

      Corrected.

      (3.5) Figure 5, panel C, the y-axis label should be "run-off time (min)."

      Corrected.

      (3.6) For most figures, add significance bars.

      (3.7) In the figure captions, please add the total number of cells used for each condition.

      We have systematically indicated the number of traces (n<sub>t</sub>) and the number of independent experiments (n<sub>e</sub>) in the captions in this format (n<sub>t</sub>, n<sub>e</sub>).

      (4) Mathematical Methods:

      We greatly thank the reviewer for their detailed attention to the mathematical notation. We have addressed all points below.

      (4.1) In lines 555, Materials and Methods, subsection, Quantification of Intensity Traces, multiple equations are not numbered. For example, after Equation (4), no numbers are provided for the rest of the equations. Please keep consistency throughout the whole document.

      We have ensured that all equations are now consistently numbered throughout the document.

      (4.2) In line 588, the authors mention "$X$ is a standard normal random variable with mean $\mu$ and standard deviation $s_0$". Please ensure this is correct. A standard normal random variable has a 0 mean and std 1. 

      Thank you for the suggestion; we have corrected the text (L678).

      (4.3) Line 546, Equation 2. The authors use mu(x,y) to describe a 2d Gaussian function. But later in line 587, the authors reuse the same variable name in equation 5 to redefine the intensity as mu = b_0 + I.

      We have renamed the 2D Gaussian function to $\mu_{2D}(x,y)$ in the spot tracking section.

      (4.4) For the complete document, it could be beneficial to the reader if the authors expand the definition of the relationship between the signal "y" and the spot intensity "I". Please note how the paragraph in lines 582-587 does not properly introduce "y".

      We have added an explicit definition of y and its relationship to the underlying spot intensity I in the text to improve readability and clarity.

      (4.5) Please ensure consistency in variable names. For example, "I" is used in line 587 for the experimental spot intensity, then line 763 redefines I(t) as the total intensity obtained from the TASEP model; please use "I_sim(t)" for simulated intensities. Please note that reusing the variable "I" for different contexts makes it hard for the reader to follow the text. 

      We agree that this was confusing. We have implemented the suggestion and now distinguish simulated intensities using the notation I<sub>S</sub>.

      (4.6) Line 555 "The prior on the total intensity I is an "uninformative" prior" I ~ half_normal(1000). Please ensure it is not "I_0 ~ half_normal(1000)."? 

      We confirm that “I” is the correct variable representing the total intensity in this context; we do not use an “I<sub>0</sub>” variable here.

      (4.7) In lines 595, equation 6. Ensure that the equation is correct. Shouldn't it be: s_0^2 = ln ( 1 + (sigma_meas^2 / ⟨y⟩^2) )? Please ensure that this is correct and it is not affecting the calculated values given in lines 598.

      Thank you for catching this typo. We have corrected the equation in the manuscript. We confirm that the calculations performed in the code used the correct formula, so the reported values remain unchanged.
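
      For reference, the formula quoted by the reviewer in point (4.7) is the standard moment-matching relation for a log-normal variable. The sketch below checks it numerically. The function names and input values are hypothetical, and the location parameter shown is the textbook moment-matching result, which may differ from the parameterisation used in the manuscript.

```python
import math


def lognormal_shape(mean, sd):
    """Log-scale standard deviation s_0 with s_0^2 = ln(1 + sd^2 / mean^2)."""
    return math.sqrt(math.log(1.0 + (sd / mean) ** 2))


def lognormal_location(mean, sd):
    """Log-scale mean that reproduces the requested linear-scale mean."""
    return math.log(mean) - 0.5 * lognormal_shape(mean, sd) ** 2


# Hypothetical measurement: temporal mean intensity and measurement noise.
mean_y, sigma_meas = 100.0, 20.0
s0 = lognormal_shape(mean_y, sigma_meas)
m0 = lognormal_location(mean_y, sigma_meas)

# Round trip through the log-normal mean and variance formulas.
recovered_mean = math.exp(m0 + 0.5 * s0 ** 2)
recovered_sd = math.sqrt((math.exp(s0 ** 2) - 1.0) * math.exp(2.0 * m0 + s0 ** 2))
print(s0, recovered_mean, recovered_sd)
```

      The round trip recovers the input mean and standard deviation, which is a quick way to confirm that code and written formula agree.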

      (4.8) In line 597, "the mean intensity square ^2". Please ensure it is not "the square of the temporal mean intensity."

      We have corrected the text to "the square of the temporal mean intensity."

      (4.9) In lines 602-619, Bayesian modeling of run-off traces, please ensure to introduce the constant "\ell". Used to define the ribosomal footprint?

      We have added the explicit definition of $\ell$ as the ribosome footprint size (length of transcript occupied by one ribosome) in the "Bayesian modeling of run-off traces" section.

      (4.10) Line 687 has a minor typo "[...] ribosome distribution.. Then, [...]"

      We have corrected the punctuation.

      (4.11) In line 678, Equation 19 introduces the constant "L_S", Please ensure that it is defined in the text.

      We have added the explicit definition of L<sub>S</sub> (the length of the SunTag) to the text surrounding Equation 19.

      (4.12) In line 695, Equation 22, please consider using a subscript to differentiate the variance due to ribosome configuration. For example, instead of "sigma (...)^2" use something like "sigma_c ^2 (...)". Ensure that this change is correctly applied to Equation 24 and all other affected equations.

      Thank you, we have implemented the suggestions.

      (4.13) In line 696, please double-check equations 26 and 27. Specifically, the denominator ^2. Given the previous text, it is hard to follow the meaning of this variable. 

      We have revised the notation in Equations 26 and 27 to ensure the denominator is consistent with the definitions provided in the text.

      (4.14) In lines 726, the authors mention "[...], but for the purposes of this dissertation [...]", it should be "[...], but for the purposes of this study [...]"

      Thank you for spotting this. We have replaced "dissertation" with "study."

      (4.15) Equations 5, 28, 37, and the unnumbered equation between Equations 16 and 17 are similar, but in some, "y" does not explicitly depend on time. Please ensure this is correct. 

      We have verified these equations and believe they are correct.

      (4.16) Please review the complete document and ensure that variables and constants used in the equations are defined in the text. Please ensure that the same variable names are not reused for different concepts. To improve readability and flow in the text, please review the complete Materials and Methods sections and evaluate if the modeling section can be written more clearly and concisely. For example, Equation 28 is repeated in the text.

      We have performed a comprehensive review of the Materials and Methods section. To improve conciseness and flow, we have merged the subsection “Observation model and estimation of observation parameters” with the “Bayesian modeling of run-off traces” section. This allowed us to remove redundant definitions and repeated equations (such as the previous Equation 28). We have also checked that all variables and constants are defined upon first use and that variable names remain consistent throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Data Presentation

      (1.1) In main Figures 1D and 4E, the traces appear to show frequent on-off-on transitions ("bursting"), but in supplementary figures (1-S1A and 4-S1A), this behavior is seen in only ~8 of 54 traces. Are the main figure examples truly representative?

      We acknowledge the reviewer's point. In Figure 1D, we selected some of the longest and most illustrative traces to highlight the bursting dynamics. We agree that the term "representative" might be misleading if interpreted as "average." We have updated the text to state "we show bursting traces" to more accurately reflect the selection.

      (1.2) There are 8 videos, but I could not identify which is which.

      Thank you for pointing this out. We have renamed the video files to clearly correspond to the figures and conditions they represent.

      (2) Data Availability:

      As noted above, the data should be shared. This is in accordance with eLife's policy: "Authors must make all original data used to support the claims of the paper, or that are required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). [...] eLife considers works to be published when they are posted as preprints, and expects preprints we review to meet the standards outlined here." Access to the time traces would have been helpful for reviewers.

      We have now added the GitHub link for the code (https://github.com/naef-lab/suntag-analysis) and deposited the raw data (.ome.tif files) on Zenodo (10.5281/zenodo.17669332).

      (3) Model Assumptions:

      (3.1) The broad range of run-off times (Figure 3A) suggests stalling, which may be incompatible with the 'low-density' assumption used on the TASEP model, which essentially assumes that ribosomes do not bump into each other. This could impact the validity of the assumptions that ribosomes behave independently, elongate at constant speed (necessary for the continuum-limit approximation), and that the rate-limiting step is the initiation. How robust are the inferences to this assumption?

      We agree that the deviation of waiting times from an exponential distribution (Figure 3 - figure supplement 2C) suggests the presence of stalling, which challenges the strict low-density assumption and constant elongation speed. We explicitly explored the robustness of our model to higher ribosome densities in simulations. As shown in Figure 2 - figure supplement 2, while the model accuracy for single parameters deteriorates at very high densities (overestimating density due to neglected interference), it remains robust for estimating global rates in the regime relevant to our data. We have expanded the discussion on the limitations of the low density and homogeneous elongation rate assumptions in the text (L404-408).

      (3.2) Since all constructs share the same SunTag region, elongation rates should be identical there and diverge only in the variable region. This would affect $\gamma (t)$ and hence possibly affect the results. A brief discussion would be helpful.

      This is a valid point. Currently, our model infers a single average elongation rate that effectively averages the behavior over the SunTag and the variable CDS regions. Modeling distinct rates for these regions would be a valuable extension but adds significant complexity. While our current "effective rate" approach might underestimate the magnitude of differences between reporters, it captures the global kinetic trend. We have added a brief discussion acknowledging this simplification (L408-412).

      (3.3) A similar point applies to the Gillespie simulations: modeling the SunTag region with a shared elongation rate would be more accurate.

      We agree. Simulating distinct rates for the SunTag and CDS would increase realism, though our current homogeneous simulations serve primarily to benchmark the inference framework itself. We have noted this as a potential future improvement (L413-414).

      (3.4) Equation (13) assumes that switching between bursting and non-bursting states is much slower than the elongation time. First, this should be made explicit. Second, this is not quite true (~5 min elongation time on Figure 3-s2A vs ~5-15min switching times on Figure 1). It would be useful to show the intensity distribution at t=0 and compare it to the expected mixture distribution (i.e., a Poisson distribution + some extra 'N=0' cells). 

      We thank the reviewer for this insightful comment. We have added a sentence to the text explicitly stating the assumption that switching dynamics are slower than the translation time. While the timescales are indeed closer than ideal (5 min vs. 5-15 min), this assumption allows for a tractable approximation of the initial conditions for the run-off inference. Comparing the intensity distribution at t=0 to a zero-inflated Poisson distribution is an excellent suggestion for validation, which we will consider for future iterations of the model.
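
      The mixture suggested by the reviewer can be written down in a few lines. This sketch assumes that a fraction `p_off` of mRNAs carries no ribosomes at t=0 and that the remainder carries a Poisson-distributed number; the function name, parameter names, and values are hypothetical.

```python
import math


def zip_pmf(n, mean_ribosomes, p_off):
    """Zero-inflated Poisson probability of finding n ribosomes on an mRNA.

    mean_ribosomes: Poisson mean for mRNAs in the translating state
    p_off:          extra probability mass at n = 0 (non-translating mRNAs)
    """
    poisson = math.exp(-mean_ribosomes) * mean_ribosomes ** n / math.factorial(n)
    return (p_off if n == 0 else 0.0) + (1.0 - p_off) * poisson


# Hypothetical parameters for illustration only.
mean_ribosomes, p_off = 2.0, 0.3
pmf = [zip_pmf(n, mean_ribosomes, p_off) for n in range(60)]
print("P(N=0) =", pmf[0])
```

      Comparing such a distribution with the measured t=0 intensities would additionally require converting ribosome numbers to intensity and adding measurement noise, which this sketch leaves out.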

      (4) Microscopy Quantifications:

      (4.1) Figure 1-S2A shows variable scFv-GFP expression across cells. Were cells selected for uniform expression in the analysis? Or is the SunTag assumed to be saturated, which would then need to be demonstrated?

      All cell lines used are monoclonal, and cells were selected via FACS for consistent average cytoplasmic GFP signal. We assume the SunTag is saturated based on the established characterization of the system by Tanenbaum et al. (2014), where the high affinity of the scFv-GFP ensures saturation at expression levels similar to ours.

      (4.2) As translation proceeds, free scFv-GFP may become limiting due to the accumulation of mature SunTag-containing proteins. This would be difficult to detect (since mature proteins stay in the cytoplasm) and could affect intensity measurements (newly synthesized SunTag proteins getting dimmer over time).

      This effect can occur with very long induction times. To mitigate this, we optimized the Doxycycline (Dox) incubation time for our harringtonine experiments to prevent excessive accumulation of mature protein. We also monitor the cytoplasmic background for granularity, which would indicate aggregation or accumulation.

      (4.3) The statements "for some traces, the mRNA signal was lost before the run-off completion" (line 195) and "we observed relatively consistent fractions of translated transcripts and trace duration distributions across reporters" (line 340) should be supported by a supplementary figure.

      The first statement is supported by Figure 2 - figure supplement 1, which shows representative run-off traces for all constructs, including incomplete ones.

      The second statement regarding consistency is supported by the quantitative data in Figure 1E and G, which summarize the fraction of translated transcripts and trace durations across conditions.

      (4.4) Measurements of single mature protein intensity $i_{MP}$:

      (4.4.1) Since puromycin is used to disassemble elongating ribosomes, calibration may be biased by incomplete translation products (likely a substantial fraction, since the Dox induction is only 20min and RNAs need several minutes to be transcribed, exported, and then fully translated).

      As mentioned in the “Live-cell imaging” paragraph, the imaging takes place 40 min after the end of Dox incubation. This provides ample time for mRNA export and full translation of the synthesized proteins. Consequently, the fraction of incomplete products generated by the final puromycin addition is negligible compared to the pool of fully synthesized mature proteins accumulated during the preceding hour.

      (4.4.2) Line 519: "The intensity of each spot is averaged over the 100 frames". Do I understand correctly that you are looking at immobile proteins? What immobilizes these proteins? Are these small aggregates? It would be surprising that these aggregates have really only 1, 2, or 3 proteins, as suggested by Figure 1-S2A.

      We are visualizing mature proteins that are specifically tethered to the actin cytoskeleton. This is achieved using a reporter where the RH1 domain is fused directly to the C-terminus of the Renilla protein (SunTag-Renilla-RH1). The RH1 domain recruits the endogenous Myosin Va motor, which anchors the protein to actin filaments, rendering it immobile. Since each Myosin Va motor interacts with one RH1 domain (and thus one mature protein), the resulting spots represent individual immobilized proteins rather than aggregates. We have now revised the text and Methods section to make this calibration strategy and the construct design clearer (L130-140).

      (4.4.3) Estimating the average intensity $i_{MP}$ of single proteins rests entirely on seeing discrete modes in the histogram of Figure 1-S2B, which is not very convincing. A complementary experiment, measuring *on the same microscope* the intensity of an object with a known number of GFP molecules (e.g., MS2-GFP labeled RNAs, or individual GEMs https://doi.org/10.1016/j.cell.2018.05.042 (only requiring a single transfection)) would be reassuring to convince the reader that we are not off by an order of magnitude.

      While a complementary calibration experiment would be valuable, we believe our current estimate is robust because it is independently validated by our model. When we inferred i<sub>MP</sub> as a free parameter in the HMM (Figure 5 - figure supplement 2B), the resulting value (10-15 a.u.) was remarkably consistent with our experimental calibration (14 ± 2 a.u.). We have clarified this independent validation in the text to strengthen the confidence in our quantification (L264-272).

      (4.4.4) Further on the histogram in Figure 1-S2B:

      - The gap between the first two modes is unexpectedly sharp. Can you double-check? It means that we have a completely empty bin between two of the most populated bins.

      We have double-checked the data; the plot is correct, though the sharp gap is likely due to the small sample size (n=29).

      - I am surprised not to see 3 modes or more, given that panel A shows three levels of intensity (the three colors of the arrows).

      As noted below, brighter foci exist but fall outside the displayed range of the histogram.

      - It is unclear what the statistical test is and what it is supposed to demonstrate.

      The Student's t-test compares the means of the two identified populations to confirm they are statistically distinct intensity groups.

      - I count n = 29, not 31. (The sample is small enough that the bars of the histogram show clear discrete heights, proportional to 1, 2, 3, 4, and 5 --adding up all the counts, I get 29). Is there a mistake somewhere? Or are some points falling outside of the displayed x-range?

      You are correct. Two brighter data points fell outside the displayed range. The total number of foci in the histogram is 29. We have corrected the figure caption and the text accordingly.

      (5) Miscellaneous Points: 

      (5.1) Panel B in Figure 2-s1 appears to be missing.

      The figure contains only one panel.

      (5.2) In Equation (7), $l$ is not defined (presumably ribosome footprint length?). Instead, $J$ is defined right after eq (7), as if it were used in this equation.

      Thank you for pointing this out; we have corrected it.

      (5.3) Line 703, did you mean to write something else than "Equation 26" (since equation 26 is defined after)?

      Yes, this was a typo. We have corrected the cross-reference.