  1. Jan 2026
    1. Reviewer #3 (Public review):

      Summary:

      In this work, the authors present a chromatin polymer model with a specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90 kb). Each colored bead is considered a transcription unit (TU). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This leads to clustering (condensation of beads) and correlated (or anti-correlated) "gene expression" patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, specialized and mixed clusters of different TFs emerge. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths:

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

    2. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

      We thank the editors for their overall positive assessment. In response to the Referees’ comments, we have addressed all technical points, including a more detailed explanation of the methodology used to extract gene transcription from our simulations and its analogy with real gene transcription. Regarding the potential comparison with experimental data and our mixing–demixing transition, we have added new sections discussing the current state of the art in relevant experiments. We also clarify the present limitations that prevent direct comparisons, which we hope can be overcome with future experiments using the emerging techniques.

      Reviewer #1 (Public Review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

      We thank the Referee for carefully reading our manuscript and recognizing its scientific value. As suggested, we have tried to better highlight our key findings and make the text more accessible, while also addressing the comments from the other Referees.

      Reviewer #2 (Public Review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript ”Cluster size determines morphology of transcription factories in human cells”.

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written, save a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the experiments suggested in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal such as eLife.

      We thank the Reviewer for their positive assessment of the soundness of our work and its contribution to the field. We have added a paragraph to the Conclusions highlighting the current state of experimental techniques and outlining near-term experiments that could be extended to test our predictions. We also emphasise that our analysis builds on state-of-the-art polymer models of chromatin and on quantitative experimental datasets, which we used both to construct the model and to validate its outcomes (gene activity). We hope this strengthened link to experiment will catalyse further studies in the field.

      Major points:

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend to instead talk about only their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      We agree with the Referee that the title could be misleading. In our study we characterized cluster size, which is a morphological descriptor, and cluster composition, which is not morphology per se but is described as such in the community in a broader sense. Nevertheless, to strengthen the message we have changed the title to: “Cluster size determines internal structure of transcription factories in human cells”

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a ”small” transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      This is a good point. However, in our case we can look at either clustering transcription factors or transcription units. In an experimental situation, transcription units could be “coloured”, or assigned different types, by looking at different cell types, so that they can be classified as housekeeping (cell-type independent) or cell-type specific. This is similar to how DHS can be clustered. In this way the mixing or demixing state can be identified by looking at the type of transcription unit, removing any ambiguity due to the fact that the same protein may participate in different TF complexes.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point (iii) are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units’ activation etc.

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo’s group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these contacts, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and data set is not even acknowledged? I also strongly suggest to analyze, or if they are already sufficiently analyzed, discuss these data in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors’ predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors’ key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to their full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      - RNA promotes the formation of spatial compartments in the nucleus https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      - Complex multi-enhancer contacts captured by genome architecture mapping https://www.nature.com/articles/nature21411

      - Cell-type specialization is encoded by specific chromatin topologies https://www.nature.com/articles/s41586-021-04081-2

      - Super-enhancer interactomes from single cells link clustering and transcription https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, especially the groups of Ben Sabari and Mustafa Mir have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      We appreciate the Reviewer’s insightful suggestion that a comparison between our simulation results and experimental data would strengthen the robustness of our model. In response, we have thoroughly reviewed the literature on multi-way chromatin contacts, with particular attention to SPRITE and GAM techniques. However, we found that the currently available experimental datasets lack sufficient statistical power to provide a definitive test of our simulation predictions, as detailed below.

      As noted by the Reviewer, SPRITE experiments offer valuable information on the composition of high-order chromatin clusters (k-mers) that involve multiple genomic loci. A closer examination of the SPRITE data (e.g., Supplementary Material from Ref. [1]) reveals that the majority of reported statistics correspond to 3-mers (three-way contacts), while data on larger clusters (e.g., 8-mers, 9-mers, or greater) are sparse. This limitation hinders our ability to test the demixing-mixing transition predicted in our simulations, which occurs for cluster sizes exceeding 10.

      Moreover, the composition of the k-mers identified by SPRITE predominantly involves genomic regions encoding functional RNAs—such as ITS1 and ITS2 (involved in rRNA synthesis) and U3 (encoding small nucleolar RNA)—which largely correspond to housekeeping genes. Conversely, there is little to no data available for protein-coding genes. This restricts direct comparison to our simulations, where the demixing-mixing transition depends critically on the interplay between housekeeping and tissue-specific genes.

      Similarly, while GAM experiments are capable of detecting multi-way chromatin contacts, the currently available datasets primarily report three-way interactions [2,3].

      In summary, due to the limited statistical data on higher-order chromatin clusters [4], a quantitative comparison between our simulation results and experimental observations is not currently feasible. Nevertheless, we have now briefly discussed the experimental techniques for detecting multi-way interactions in the revised manuscript to reflect the current state of the field, mentioning most of the references that the Reviewer suggested.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and ”stressed cells”. This distinction was initially established by Ibrahim Cisse’s lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      - Mediator and RNA polymerase II clusters associate in transcription-dependent condensates https://www.science.org/doi/10.1126/science.aar4199

      - Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering https://www.science.org/doi/10.1126/sciadv.aay6515

      - RNA polymerase II clusters form in line with surface condensation on regulatory chromatin https://www.embopress.org/doi/full/10.15252/msb.202110272

      - If ”morphology” should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper: Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin https://www.science.org/doi/10.1126/science.ade5308

      We thank the Reviewer for pointing out the discussion about small and large clusters observed in stressed cells. Our study aims to provide a broader mechanistic explanation of the formation of mixed and demixed TF clusters depending on their size. However, to avoid confusion between our terminology and the classification that is already used for transcription factories in stem and stressed cells, we have now added some comments and references in the revised text.

      (5) The statement scripts are available upon request is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

      We fully agree with the Reviewer. We have now included in the main text a link to an external repository containing all the code required to reproduce and analyze the simulations.

      Recommendations for the authors:

      Minor and technical points

      (6) Red, green, and yellow (mix of green and red) is a particularly bad choice of color code, seeing that red-green blindness is the most common color blindness. I recommend to change the color code.

      We appreciate the Reviewer’s thoughtful comment regarding color accessibility. We fully agree that red–green combinations can pose challenges for color-blind readers. In our figures, however, we chose the red–green–yellow color scheme deliberately because it provides strong contrast and intuitive representation for different TF/TU types. To ensure accessibility, we optimized brightness and saturation within red-green schemes and we carefully verified that the chosen hues are distinguishable under the most common forms of color vision deficiency, i.e. trichromatic color blindness, using color-blindness simulation tools (e.g., Coblis).

      How is the dispersing effect of transcriptional activation and ongoing transcription accounted for or expected to affect the model outcome? This affects both transcriptional clusters (they tend to disintegrate upon transcriptional activation) as well as the large scale organization, where dispersal by transcription is also known.

      We thank the Reviewer for this very insightful question. The current versions of both our toy model and the more complex HiP-HoP model do not incorporate the effects of RNA Polymerase elongation. Our primary goal was to develop a minimalistic framework that focuses on investigating TF cluster formation and composition. Nevertheless, we find that this straightforward approach provides good agreement between simulations and Hi-C and GRO-seq experiments, lending confidence to the reliability of our results concerning TF cluster composition.

      We fully agree, however, that the effects of transcription elongation are an interesting topic for further exploration. For example, modeling RNA Polymerases as active motors that continually drive the system out of equilibrium could influence the chromatin polymer conformation and the structure of TF clusters. Additionally, investigating how interactions between RNA molecules and nuclear proteins, such as SAF-A, might lead to significant changes in 3D chromatin organization and, consequently, transcription [5], is also an intriguing prospect. Although we do not believe that the main findings of our study, particularly regarding cluster composition and the mixed-demixed transition, would be impacted by transcription elongation effects, we recognize the importance of this aspect. As such, we have now included some comments in the Conclusions section of the revised manuscript.

      “and make the reasonable assumption that a TU bead is transcribed if it lies within 2.25 diameters (2.25σ) of a complex of the same colour; then, the transcriptional activity of each TU is given by the fraction of time that the TU and a TF:pol lie close together.” How is that justified? I do not see how this is reasonable or not, if you make that statement you must back it up.

      As pointed out by the Referee, we consider a TU to be active if at least one TF is within a distance of 2.25σ from that TU. This threshold is slightly larger than the TU-TF interaction cutoff distance, r<sub>c</sub> = 1.8σ. The rationale for this choice is to ensure that, in the presence of a TU cluster surrounded by TFs, TUs that are not directly in contact with a TF are still considered active. Nonetheless, we find that using slightly different thresholds, such as 1.8σ or 1.1σ, leads to comparable results, as shown in Fig. S11, demonstrating the robustness of our analysis.
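
      The activity rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `tu_activity`, the trajectory layout (one position per frame for the TU, one list of same-colour TF positions per frame), and the use of plain Euclidean distance without periodic boundaries are all hypothetical choices. Distances are in units of σ.

```python
import math


def tu_activity(tu_track, tf_tracks, threshold=2.25):
    """Fraction of frames in which at least one TF lies within `threshold` of the TU.

    tu_track:  list of (x, y, z) TU positions, one per frame (hypothetical layout).
    tf_tracks: list of frames; each frame is a list of (x, y, z) positions
               of TFs of the same colour as the TU.
    threshold: activity cutoff in units of sigma (2.25 in the text above).
    """
    if len(tu_track) != len(tf_tracks):
        raise ValueError("TU and TF trajectories must have the same number of frames")
    if not tu_track:
        return 0.0
    active_frames = 0
    for tu_pos, tfs in zip(tu_track, tf_tracks):
        # The TU counts as transcribed in this frame if any TF is close enough.
        if any(math.dist(tu_pos, tf_pos) <= threshold for tf_pos in tfs):
            active_frames += 1
    return active_frames / len(tu_track)
```

      Re-running the same function with `threshold=1.8` or `threshold=1.1` is one way to carry out the kind of robustness check the response refers to.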

      Clearly, close proximity in 1D genomic space favours formation of similarly-coloured clusters. This is not surprising, it is what you built the model to do. Should not be presented as a new insight, but rather as a check that the model does what is expected.

      We believed that this sentence already conveyed that the formation of single-color clusters driven by 1D genomic proximity is not a surprising outcome. However, we have now slightly rephrased it to better emphasize that this is not a novel insight.

      That said, we would like to highlight that while 1D genomic proximity facilitates the formation of clusters of the same color, the unmixed-to-mixed transition in cluster composition is not easily predictable solely from the TU color pattern. Furthermore, in simulations of real chromosomes, where TU patterns are dictated by epigenetic marks, the complexity of these patterns makes it challenging—if not impossible—to predict cluster composition based solely on the input data of our model.

      “…how closely transcriptional activities of different TUs correlate…” Please briefly state over what variable the correlation is carried out, is it cross correlation of transcription activity time courses over time? Would be nice to state here directly in the main text to make it easier for the reader.

      We have now included a brief description in the revised manuscript explaining how the transcriptional correlations were evaluated and how the correlation matrix was constructed.
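
      For readers who want a concrete picture, one common way to build such a matrix is sketched below. It assumes, as an illustration only, that each TU has a time series of activity values (for example 0/1 per frame) and that the Pearson coefficient is used; the response does not state which estimator the authors chose, and the names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a TU that is always on or always off carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(activity):
    """activity[k] is the activity time series of TU k; returns the TU-by-TU matrix."""
    return [[pearson(a, b) for b in activity] for a in activity]
```

      Entry (k, l) of the result is positive when TUs k and l tend to be transcribed in the same frames and negative when one tends to be active while the other is silent.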

      “The second concerns how expression quantitative trait loci (eQTLs) work. Current models see them doing so post-transcriptionally in highly-convoluted ways [11, 55], but we have argued that any TU can act as an eQTL directly at the transcriptional level [11].” This text does not actually explain what eQTLs do. I think it should, in concise words.

      We agree with the Referee’s suggestion. We have revised the sentence accordingly and now provide a clear explanation of eQTLs upon their first mention. The revised paragraph now reads as follows:

      “The second concerns how expression quantitative trait loci (eQTLs), genomic regions that are statistically associated with variation in gene expression levels, function. While current models often attribute their effects to post-transcriptional regulation through complex mechanisms [6,7], we have previously argued that any transcriptional unit (TU) can act as an eQTL by directly influencing gene expression at the transcriptional level [7]. Here, we observe individual TUs up-regulating or down-regulating the activity of other TUs, hallmark behaviors of eQTLs that can give rise to genetic effects such as “transgressive segregation” [8]. This phenomenon refers to cases in which alleles exhibit significantly higher or lower expression of a target gene, and can be, for instance, caused by the creation of a non-parental allele with a specific combination of QTLs with opposing effects on the target gene.”

      “In the string with 4 mutations, a yellow cluster is never seen; instead, different red clusters appear and disappear (Fig. 2Eii)…” How should it be seen? You mutated away most of the yellow beads. I think the kymograph is more informative about the general model dynamics, not the effects of mutations. Might be more appropriate to place a kymograph in Figure 1.

      We agree with the Referee that the kymograph is the most appropriate graphical representation for capturing the effects of mutations. Panel 2E already refers to the standard case shown in Figure 1. We have now clarified this both in the caption and in the main text. In addition, we have rephrased the sentence—which was indeed misleading—as follows:

      “From the activity profiles in Fig. 2C, we can observe that as the number of mutations increases, the yellow cluster is replaced by a red cluster, with the remaining yellow TUs in the region being expelled (Fig. 2B(ii)). This behavior is reflected in the dynamics, as seen by comparing panels E(i) and E(ii): in the string with four mutations, transcription of the yellow TUs is inhibited in the affected region, while prominent red stripes—corresponding to active, transcribing clusters—emerge (Fig. 2E(ii)).” We hope that the comparison is now immediately clear to the reader.

      “…but this block fragments in the string with 4 mutations…” I don’t know or cannot see what is meant by ”fragmentation” in the correlation matrix.

      With the sentence “this block fragments in the string with 4 mutations” we mean that the majority of the solid red pixels within the black box become light-red or white once the mutations are applied. We have now added a clarification of this point in the revised manuscript.

      “Fig. 3D shows the difference in correlation between the case with reduced yellow TFs and the case displayed in Fig. 1E.” Can you just place two halves of the different matrices to be compared into the same panel? Similar to Fig. S5. Will be much easier to compare.

      We thank the Referee for this suggestion. We tried to implement this modification, and report the modified figure below (Author response image 1). As we can see, in the new figure it is difficult to spot the details we refer to in the main text, therefore we prefer to keep the original version of the figure.

      Author response image 1.

      Heatmap comparing activity correlations of TUs in the random string under normal conditions (top half) and with reduced yellow-TF concentration (bottom half).
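
      A split heatmap of this kind can be assembled from two correlation matrices as sketched below. The sketch assumes that "top half" and "bottom half" mean the upper and lower triangles of two symmetric matrices of equal size; the function name `split_heatmap` is hypothetical and the authors' plotting code is not shown in the response.

```python
def split_heatmap(top, bottom):
    """Combine two equally sized square matrices into one.

    Entries above the diagonal come from `top`, entries below it from `bottom`;
    the diagonal is taken from `top` (for correlation matrices both are 1).
    """
    n = len(top)
    if len(bottom) != n or any(len(row) != n for row in top + bottom):
        raise ValueError("both inputs must be square matrices of the same size")
    return [
        [top[i][j] if j >= i else bottom[i][j] for j in range(n)]
        for i in range(n)
    ]
```

      Because each condition keeps only one triangle, features that sit close to the diagonal in both matrices end up side by side, which may be why fine details were harder to spot in the combined panel.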

      What is the omnigenic model? It is not introduced.

      We thank the Reviewer for highlighting this important point. The omnigenic model, first introduced by Boyle et al. in Ref. [6], was proposed to explain how complex traits, including disease risk, are influenced by a vast number of genes. According to this model, the genetic basis of a trait is not limited to a small set of core genes whose expression is directly related to the trait, but also includes peripheral genes. The latter, although not directly involved in controlling the trait, can influence the expression of core genes through gene regulatory networks, thereby contributing to the overall genetic influence on the trait. We have now added a few lines in the revised manuscript to explain this point.

      “Additionally, blue off-diagonal blocks indicate repeating negative correlations that reflect the period of the 6-pattern.” How does that look in a kymograph? Does this mean the 6 clusters of same color steal the TFs from the other clusters when they form?

      The intuition of the Referee is indeed correct. The finite number of TFs leads to competition among TUs of the same colour, resulting in anticorrelation: when a group of six nearby TUs of a given colour is active, other, more distant TUs of the same colour are not transcribing due to the lack of available TFs. As the Referee suggested, this phenomenon is visible in the kymograph showing TU activity. In Author response image 2, it can be observed that typically there is a single TU cluster for each of the three colours (yellow, green, and red). These clusters can be long-lived (e.g., the yellow cluster at the center of the kymograph) or may dissolve during the simulation (e.g., the red cluster at the top of the kymograph, which dissolves at t ∼ 600 × 10<sup>5</sup> τ<sub>B</sub>). In the latter case, TFs of the corresponding colour are released into the system and can bind to a different location, forming a new cluster (as seen with the red cluster forming at the bottom of the kymograph for t > 600 × 10<sup>5</sup> τ<sub>B</sub>). This point is further discussed at point 2.30 of this Reply, where additional graphical material is provided.

      Author response image 2.

      Kymograph showing the TU activity during a typical run in the 6-pattern case. Each row reports the transcriptional state of a TU during one simulation. Black pixels correspond to inactive TUs, red (yellow, green) pixels correspond to active red (yellow, green) TUs.

      “Conversely, negative correlations connect distant TUs, as found in the single-color model…” But at the most distal range, the negative correlation is lost again! Why leave this out? Your correlation curves show the same equilibration towards no correlation at very long ranges.

      As highlighted in Figure 5Ai, long-range negative correlations (grey segments) predominantly connect distant TUs of the same colour. This is quantified in Figure 5Bi: restricting to same-colour TUs shows that at large genomic separations the correlation is almost entirely negative, with small fluctuations at distances just below 3000 kbp where sampling is sparse; we therefore avoid further interpretation of this regime.

      “These results illustrate how the sequence of TUs on a string can strikingly affect formation of mixed clusters; they also provide an explanation of why activities of human TUs within genomic regions of hundreds of kbp are positively correlated [60].” This is a very nice insight.

      We thank the Reviewer for the very supportive comment.

      “To quantify the extent to which TFs of different colours share clusters, we introduce a demixing coefficient, θ<sub>dem</sub> (defined in Fig. 1).” This is not defined in Fig. 1 or anywhere else here in the main text.

      We thank the Referee for pointing this out. For a given cluster, the demixing coefficient is defined as

      θ<sub>dem</sub> = (n x<sub>i,max</sub> − 1) / (n − 1),

      where n is the number of colors, i indexes each color present in the model, and x<sub>i,max</sub> is the largest fraction of TFs of the same i-th color in a single TF cluster.

      The demixing coefficient is defined in the Methods section; therefore, we have replaced “defined in Fig. 1” with “see Methods for definition”.
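
      As a worked illustration, the coefficient for a single cluster can be computed as below. The sketch assumes the normalised form θ<sub>dem</sub> = (n x<sub>max</sub> − 1) / (n − 1), which equals 1 for a cluster containing a single colour and 0 for an equal mix of all n colours; this form is an assumption here, and the authoritative definition is the one in the Methods of the manuscript. The function name and input layout are hypothetical.

```python
def demixing_coefficient(counts):
    """Demixing coefficient of one TF cluster (assumed normalised form).

    counts: number of TFs of each colour in the cluster, one entry per colour
            in the model (colours absent from the cluster are entered as 0).
    Returns 1.0 for a single-colour cluster and 0.0 for an equal mix.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2:
        raise ValueError("at least two colours are needed to talk about mixing")
    if total == 0:
        raise ValueError("the cluster contains no TFs")
    x_max = max(counts) / total  # largest same-colour fraction in the cluster
    return (n * x_max - 1) / (n - 1)
```

      For example, in a three-colour model a cluster of four red and two green TFs has x<sub>max</sub> = 2/3 and therefore a coefficient of 0.5, halfway between fully mixed and fully demixed.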

      “Mixing is facilitated by the presence of weakly-binding beads, as replacing them with non-interacting ones increases demixing and reduces long-range negative correlations (Figure S3). Therefore, the sequence of strong and weak binding sites along strings determines the degree of mixing, and the types of small-world network that emerge. If eQTLs also act transcriptionally in the way we suggest [11], we predict that down-regulating eQTLs will lie further away from their targets than up-regulating ones.” Going into these side topics and minor points here is super distracting and waters down the message. Maybe first deal with the main conclusions on mixed vs demixed clusters in dependence on the strong and specific binding site patterns, before dealing with other additional points like the role of weak binding sites.

      Thank you for the suggestion. We have now changed the paragraph to highlight the main results. The new paragraph is as follows: “These results on activity correlation and TF cluster composition suggest that, if eQTLs act transcriptionally as expected [7], down-regulating eQTLs are likely to be located further from their target genes than up-regulating ones. In addition, it is important to note that mixing is promoted by the presence of weakly binding beads; replacing these with non-interacting ones leads to increased demixing and a reduction in long-range negative correlations (Figure S3). More generally, our findings indicate that the presence of multiple TF colors offers an effective mechanism to enrich and fine-tune transcriptional regulation.”

      “…provides a powerful pathway to enrich and modulate transcriptional regulation.” Before going into the possible meaning and implications of the results, please discuss the results themselves first.

      See previous point.

      Figure 5B. Does activation typically coincide with spatial compaction of the binding sites into a small space or within the confines of a condensate? My guess would be that colocalization of the other color in a small space is what leads to the mixing effect?

      As the Reviewer correctly noted, the activity of a given TU is indeed influenced by the presence of nearby TUs of the same color, since their proximity facilitates the recruitment of additional TFs and enhances the overall transcriptional activity. In this context, the mixing effect is certainly affected by the 1D arrangement of TUs along the chromatin fiber. As emphasized in the revised manuscript, when domains of same-color TUs are present (as in the 6-pattern string), the degree of demixing is greater compared to the case where TUs of different colors alternate and large domains are absent (as in the 1-pattern string). This difference in the demixing parameter as a function of the 1D TU arrangement is clearly visible in Fig. S2B.

      “…euchromatic regions blue, and heterochromatic ones grey.” Please also explain what these color monomers mean in terms of non specific interactions with the TFs.

      Generally, in our simulation approach we assume euchromatin regions to be more open and accessible to transcription factors, whereas heterochromatin corresponds to more compacted chromatin segments [9]. To reflect this, we introduce weak, non-specific interactions between euchromatin and TFs, while heterochromatin interacts with TFs only through steric effects. To clarify this point, we have now slightly revised the caption of Fig. 6.

      “More quantitatively, Spearman’s rank correlation coefficient is 3.66 × 10<sup>−1</sup>, which compares with 3.24 × 10<sup>−1</sup> obtained previously using a single-colour model [11].” This comparison does not tell me whether the improvement in model performance justifies an additional model component. There are other, likelihood-based approaches to assess whether a model fits better to a relevant extent by adding a free model parameter. Can these be used for a more conclusive comparison? Besides, a correlation of 0.36 does not seem so good?

      We understand the Reviewer’s concern that the observed increase in the activity correlation may not appear to provide strong evidence for the improvement of the newly introduced model. However, within the context of polymer models developed to study realistic gene transcription and chromatin organization, this type of correlation analysis is a widely accepted approach for model validation. Experimental data commonly used for such validation include Hi-C maps, FISH experiments, and GRO-seq data [10,11]. The first two are typically employed to assess how accurately the model reproduces the 3D folding of chromatin; a comparison between experimental and simulated Hi-C maps is provided in the Supplementary Information (Fig. S5), showing a Pearson correlation of 0.7. GRO-seq or RNA-seq data, on the other hand, are used to evaluate the model’s ability to predict gene transcription levels. To date, the highest correlation for transcriptional activity data has been achieved by the HiP-HoP model at a resolution of 1 kbp [10], reporting a Spearman correlation of 0.6. Therefore, the correlation obtained with our 2-color model represents a good level of agreement when compared with the more complex HiP-HoP model. In this context, the observed increase in correlation—from 0.324 to 0.366—can be regarded as a modest yet meaningful improvement.

      “…consequently, use of an additional color provides a statistically significant improvement (p-value < 10<sup>−6</sup>, 2-sided t-test).” I do not follow this argument. Given enough simulation repeats, any improvement, no matter how small, will lead to statistically significant improvements.

      We agree that this sentence could be misleading. We have now rephrased it to state that each of the two correlation values is statistically significant on its own; the earlier wording wrongly referred to the significance of the improvement.
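
      The Reviewer's point can be made concrete with a small sketch. For a correlation coefficient r estimated from N paired values, the usual test statistic is t = r √((N − 2)/(1 − r²)), so at fixed r the p-value shrinks as N grows. The value r = 0.05 and the sample sizes below are hypothetical, and the p-value uses a normal approximation to the t distribution, which is adequate only for large N.

```python
import math
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """t statistic for testing a correlation r from n paired values against zero."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(r, n):
    """Two-sided p-value using a normal approximation (an assumption; large n only)."""
    t = correlation_t_statistic(r, n)
    return 2 * (1 - NormalDist().cdf(abs(t)))


# A weak, fixed correlation becomes "significant" once n is large enough.
for n in (100, 1_000, 10_000):
    print(n, round(correlation_t_statistic(0.05, n), 2), approx_two_sided_p(0.05, n))
```

      The sketch shows why a small p-value alone says little about effect size: the same weak correlation moves from clearly non-significant to highly significant purely through the number of samples.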

      “Additionally, simulated contact maps show a fair agreement with Hi-C data (Figure S5), with a Pearson correlation r ∼ 0.7 (p-value < 10<sup>−6</sup>, 2-sided t-test).” Nice!

      We thank the Reviewer for the positive comment.

      “Because we do not include heterochromatin-binding proteins, we should not however expect a very accurate reproduction of Hi-C maps: we stress that here instead we are interested in active chromatin, transcription and structure only as far as it is linked to transcription.” Then why do you not limit your correlation assessment to only these regions to show that these are very well captured by your model?

      We thank the Reviewer for this insightful comment. Indeed, we could have restricted our investigation to active chromatin regions, as done in our previous works [11,12]. However, our intention in this section of the manuscript was to clarify that the current model is relatively simple and therefore not expected to achieve a very high level of agreement between experimental and simulated Hi-C maps. Another important limitation of the two-color model described in the section is the absence of active loop extrusion mediated by SMC proteins, which is known to play a central role in establishing TAD boundaries. Consequently, even if our analysis were limited to active chromatin regions, the agreement with experimental Hi-C maps would still remain lower than that obtained with more comprehensive models, such as HiP-HoP, that we use later in the last section of the paper. We have now added a comment in the revised manuscript explicitly noting the lack of active loop extrusion in our 2-color model.

      “We also measure the average value of the demixing coefficient, θ<sub>dem</sub> (Materials and Methods). If θ<sub>dem</sub> = 1, this means that a cluster contains only TFs of one colour and so is fully demixed; if θ<sub>dem</sub> = 0, the cluster contains a mixture of TFs of all colors in equal number, and so is maximally mixed.” Repetitive.

      We have now rephrased the sentence in a more concise way.

      “…notably, this is similar to the average number of productively transcribing pols seen experimentally in a transcription factory [6].” That seems a bit fast and loose. The number of Polymerases can differ depending on state, type of factory, gene etc. and vary between anything from to a few hundreds of Polymerase complexes depending on definition of factory, and what is counted as active. Also, one would think that polymerases only make up a small part of the overall protein pool that constitutes a condensate, so it is unclear whether this is a pertinent estimate.

      Here we refer to the average size of what is normally referred to as a PolII factory, not a generic nuclear condensate. These are the clusters which arise in our simulations. These structures emerge through microphase separation and have been well characterised; see for instance [13] for a recent review. For these structures, while there is a distribution of sizes, the average is well defined and corresponds to a size of about 100 nm, which is very much in line with the size of the clusters we observe, both in terms of 3D diameter and number of participating proteins. Because of the size, the number of active complexes which can contribute cannot be significantly more than ∼ 10. These estimates are, we note, very much in line with super-resolution measurements of SAF-A clusters [14], which are associated with active transcription, and hence it is reasonable to assume they colocalise with RNA and polymerase clusters.

      “Conversely, activities of similar TUs lying far from each other on the genetic map are often weakly negatively correlated, as the formation of one cluster sequesters some TFs to reduce the number available to bind elsewhere.” This point is interesting, and I strongly suspect that this indeed happening. But I don’t think it was shown in the analysis of the simulation results in sufficient clarity. We need direct assessment of this sequestration, currently it’s only indirectly inferred.

      Indeed, this is the mechanism underlying the emergence of negative long-range correlations among TU activity values. As the Reviewer correctly pointed out, the competition for a finite number of TFs was only indirectly inferred in the original manuscript. To address this, we have now included a new figure explicitly illustrating this effect. In Fig. S12, we show the kymograph of active TUs (left panel), as in Fig. 2E(i) of the main text, alongside a new kymograph depicting the number of green TFs within a sphere of radius 10σ centered on each green TU (right panel). For simplicity, we focus here only on green TUs and TFs. It can be observed that, during the initial part of the simulation, green TFs are localized near genomic position ∼ 2000 (right panel), where green TUs are transcriptionally active (left panel). Toward the end of the simulation, TUs near genomic position ∼ 500 become active, coinciding with the relocation of TFs to this region and the depletion of the previous one.
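
      A minimal sketch of how one row of such a TF-count kymograph could be computed from a single simulation frame, assuming bead coordinates are available in units of σ and ignoring periodic boundaries. The function name and the input layout are hypothetical and are not taken from the simulation code.

```python
import math


def count_tfs_near_tus(tu_positions, tf_positions, radius=10.0):
    """For each TU, count the TFs lying within `radius` (in units of sigma).

    tu_positions, tf_positions: lists of (x, y, z) tuples for one frame.
    Returns one count per TU, i.e. one time slice of the kymograph.
    """
    counts = []
    for tu in tu_positions:
        nearby = sum(1 for tf in tf_positions if math.dist(tu, tf) <= radius)
        counts.append(nearby)
    return counts


# Invented coordinates for two green TUs and four green TFs
tus = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
tfs = [(1.0, 0.0, 0.0), (0.0, 9.0, 0.0), (48.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
print(count_tfs_near_tus(tus, tfs))
```

      Repeating this for every saved frame and stacking the rows would give a TF-count map over time and TU position, which is the kind of quantity the right panel is described as showing.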

      In the definition for the demixing coefficient (equation 1), what does the index i stand for?

      Here i is an index denoting each of the colors present in the model. We have now specified the meaning of i after Eq. 1.

      Reviewer 3 (Public Review):

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90kb). These colored beads are considered a transcription unit (TU). Same-colored TUs attract with each other mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlation) “gene expression” patterns. Beyond the toy model, when authors introduce TUs in a specific pattern, it leads to emergence of specialized and mixed cluster of different TFs. Human chromatin models with realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths.

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses.

      Weakness of the work: The model has many assumptions. Some of the assumptions are a bit too simplistic. Concerns about the work are detailed below:

      We thank the Referee for this overall positive evaluation.

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      We do not need to assume promoter-enhancer contact: this emerges naturally through the bridging-induced phase separation and indeed is a key strength of our model. Even though looping is not assumed as key to transcriptional initiation, in practice the vast majority of events in which a TF is near a TU are associated with the presence of a cluster where regulatory elements are looped. So transcription in our case is associated with the bridging-induced phase separation, and there is no lack of looping: looping is naturally associated with transcription, and this is an emergent property of the model (not an assumption), which is an important feature of our model. Accordingly, both contact maps and transcriptional activity are well predicted by our model, both in the version described here and in the more sophisticated single-colour HiP-HoP model [10] (an important ingredient of which is the bridging-induced phase separation).

      Authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      We thank the Reviewer for pointing out the absence of protein-protein interactions in our simulations. While we acknowledge this limitation, we would like to emphasize that experimental studies have not observed nuclear proteins forming condensates at physiological concentrations in the absence of DNA or chromatin. For example, studies such as Ryu et al. [15] and Shakya et al. [16] show that protein-protein interactions alone are insufficient to drive condensate formation in vivo. Instead, the presence of a substrate, such as DNA or chromatin, is essential to favor and stabilize the formation of protein clusters.

      In our simulations, we propose that protein liquid-liquid phase separation (LLPS) is driven by the presence of both strong and weak attractions between multivalent protein complexes and the chromatin filament. As stated in our manuscript, the mechanism leading to protein cluster formation is the bridging-induced attraction. This mechanism involves a positive feedback loop, where protein binding to chromatin induces a local increase in chromatin density, which then attracts more proteins, further promoting cluster formation.

      While we acknowledge that protein-protein interactions could be incorporated into our simulations, we believe this would need to be a weak interaction to remain consistent with experimental data. Additionally, incorporating such interactions would not alter the conclusions of our study.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn’t they? What is specific about transcription factor or transcription here in this model? What is the logic of considering 3kb chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, DNA paint experimental measurement of 5kb chromatin is greater than 100 nm (see work by Boettiger et al.).

      We thank the Reviewer for this important observation, which we now address. To begin, we consider the toy model introduced in the first part of the manuscript, where TUs are randomly positioned rather than derived from epigenetic data. As the Reviewer points out, in this simplified context, our results reflect a generic phenomenon: the composition of clusters depends primarily on their size, independent of the specific types of proteins involved. However, the main goal of our work is to gain insights into apparently contradictory experimental findings, which show that some transcription factories consist of a single type of transcription factor, while others contain multiple types. This led us to focus on TF clusters and their role in transcriptional regulation and co-regulation of distant genes. Therefore, in the second part of the manuscript, we use DNase I hypersensitive site (DHS) data to position TUs based on predicted TF binding sites, providing a more biological framework. In both the toy model and the more realistic HiP-HoP model, we observe a size-dependent transition in cluster composition. However, we refrain from generalizing these results to clusters composed of other protein complexes, such as HP1 and PRC, as their binding is governed by distinct epigenetic marks (e.g. H3K9me3 and H3K27me3), which exhibit different genomic distributions compared to DHS marks.

      Finally, the mapping of 3 kb to 30 nm is an estimate which does not significantly impact our conclusions. The relationship between genomic distance (in kbp) and spatial distance (in nm) is highly dependent on the degree of chromatin compaction, which can vary across cell types and genomic context. As such, providing an exact conversion is challenging [17]. For example, in a previous work based on the HiP-HoP model [12] we compared simulated and experimental FISH measurements and found that 1 kbp typically corresponds to 15–20 nm, implying that 3 kbp could span up to 60 nm. Nevertheless, we emphasize that varying this conversion factor does not affect the core results or conclusions of our study. We have now included a clarification in the revised SI to highlight this point.

      Recommendations for the authors:

      Other points.

      Figure 1(D) caption says 2.25σ = 1.6 nanometer. Is this a typo? Sigma is 30nm.

      Yes, it was. As 1σ ∼ 30 nm, we have 2.25σ = 2.25 · 30 nm = 67.5 nm ≈ 6.75 × 10<sup>−8</sup> m. We have now corrected the caption.
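
      The conversion can be checked in a few lines, assuming the bead diameter σ = 30 nm stated in the Reviewer's comment. The constant and function names are illustrative only.

```python
SIGMA_NM = 30.0  # assumed bead diameter in nanometres (1 sigma ~ 30 nm)


def sigma_to_nm(length_in_sigma, sigma_nm=SIGMA_NM):
    """Convert a length from simulation units (sigma) to nanometres."""
    return length_in_sigma * sigma_nm


threshold_nm = sigma_to_nm(2.25)   # distance threshold quoted in the caption
threshold_m = threshold_nm * 1e-9  # same length in metres
print(threshold_nm, threshold_m)
```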

      Page 6, column 2nd, 3rd para, it is written that θ<sub>dem</sub> (“defined in Fig. 1”). There is no θ<sub>dem</sub> defined in Fig. 1, is there? I can see it defined in Methods but not in Fig. 1.

      Correct, we replaced (defined in Fig.1) with (see Methods for definition).

      Page 6, column 2, 4th para: what does “correlations overlap and correlations diverge” mean?

      With reference to the plots in Fig. 5B, “overlap” and “diverge” simply refer to whether the same-colour (red curves) and different-colour (blue curves) correlation trends coincide or not. We have now clarified this point.

      What is the precise definition of correlation in Fig 5B (Y-axis)?

      In Fig. 5B, correlation means Pearson correlation. We have now specified this point in the revised text and in the caption of Fig. 5.
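
      As a reminder of the quantity plotted, here is a minimal sketch of the Pearson correlation between the activity time series of two TUs. The binary activity traces are invented for illustration (1 = TU active in a frame, 0 = inactive) and are not simulation output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical TUs that are never active in the same frame
tu_a = [1, 1, 0, 0, 1, 0]
tu_b = [0, 0, 1, 1, 0, 1]
print(pearson(tu_a, tu_b))
```

      Averaging such pairwise values over all TU pairs at a given genomic separation, separately for same-colour and different-colour pairs, would give curves of the kind described for Fig. 5B.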

      References

      (1) S. A. Quinodoz, J. W. Jachowicz, P. Bhat, N. Ollikainen, A. K. Banerjee, I. N. Goronzy, M. R. Blanco, P. Chovanec, A. Chow, Y. Markaki et al., “RNA promotes the formation of spatial compartments in the nucleus,” Cell, vol. 184, no. 23, pp. 5775–5790, 2021.

      (2) R. A. Beagrie, A. Scialdone, M. Schueler, D. C. Kraemer, M. Chotalia, S. Q. Xie, M. Barbieri, I. de Santiago, L.-M. Lavitas, M. R. Branco et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, vol. 543, no. 7646, pp. 519–524, 2017.

      (3) R. A. Beagrie, C. J. Thieme, C. Annunziatella, C. Baugher, Y. Zhang, M. Schueler, A. Kukalev, R. Kempfer, A. M. Chiariello, S. Bianco et al., “Multiplex-GAM: genome-wide identification of chromatin contacts yields insights overlooked by Hi-C,” Nature Methods, vol. 20, no. 7, pp. 1037–1047, 2023.

      (4) L. Liu, B. Zhang, and C. Hyeon, “Extracting multi-way chromatin contacts from Hi-C data,” PLOS Computational Biology, vol. 17, no. 12, p. e1009669, 2021.

      (5) R.-S. Nozawa, L. Boteva, D. C. Soares, C. Naughton, A. R. Dun, A. Buckle, B. Ramsahoye, P. C. Bruton, R. S. Saleeb, M. Arnedo et al., “SAF-A regulates interphase chromosome structure through oligomerization with chromatin-associated RNAs,” Cell, vol. 169, no. 7, pp. 1214–1227, 2017.

      (6) E. A. Boyle, Y. I. Li, and J. K. Pritchard, “An expanded view of complex traits: from polygenic to omnigenic,” Cell, vol. 169, no. 7, pp. 1177–1186, 2017.

      (7) C. Brackley, N. Gilbert, D. Michieletto, A. Papantonis, M. Pereira, P. Cook, and D. Marenduzzo, “Complex small-world regulatory networks emerge from the 3D organisation of the human genome,” Nat. Commun., vol. 12, no. 1, pp. 1–14, 2021.

      (8) R. B. Brem and L. Kruglyak, “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences, vol. 102, no. 5, pp. 1572–1577, 2005.

      (9) M. Chiang, C. A. Brackley, D. Marenduzzo, and N. Gilbert, “Predicting genome organisation and function with mechanistic modelling,” Trends in Genetics, vol. 38, no. 4, pp. 364–378, 2022.

      (10) M. Chiang, C. A. Brackley, C. Naughton, R.-S. Nozawa, C. Battaglia, D. Marenduzzo, and N. Gilbert, “Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure,” Cell Genomics, vol. 4, no. 12, 2024.

      (11) A. Buckle, C. A. Brackley, S. Boyle, D. Marenduzzo, and N. Gilbert, “Polymer simulations of heteromorphic chromatin predict the 3D folding of complex genomic loci,” Mol. Cell, vol. 72, no. 4, pp. 786–797, 2018.

      (12) G. Forte, A. Buckle, S. Boyle, D. Marenduzzo, N. Gilbert, and C. A. Brackley, “Transcription modulates chromatin dynamics and locus configuration sampling,” Nature Structural & Molecular Biology, vol. 30, no. 9, pp. 1275–1285, 2023.

      (13) P. R. Cook and D. Marenduzzo, “Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations,” Nucleic acids research, vol. 46, no. 19, pp. 9895–9906, 2018.

      (14) M. Marenda, D. Michieletto, R. Czapiewski, J. Stocks, S. M. Winterbourne, J. Miles, O. C. Flemming, E. Lazarova, M. Chiang, S. Aitken et al., “Nuclear RNA forms an interconnected network of transcription-dependent and tunable microgels,” bioRxiv, pp. 2024–06, 2024.

      (15) J.-K. Ryu, C. Bouchoux, H. W. Liu, E. Kim, M. Minamino, R. de Groot, A. J. Katan, A. Bonato, D. Marenduzzo, D. Michieletto et al., “Bridging-induced phase separation induced by cohesin SMC protein complexes,” Science advances, vol. 7, no. 7, p. eabe5905, 2021.

      (16) A. Shakya, S. Park, N. Rana, and J. T. King, “Liquid-liquid phase separation of histone proteins in cells: role in chromatin organization,” Biophysical journal, vol. 118, no. 3, pp. 753–764, 2020.

      (17) A.-M. Florescu, P. Therizols, and A. Rosa, “Large scale chromosome folding is stable against local changes in chromatin structure,” PLoS computational biology, vol. 12, no. 6, p. e1004987, 2016.

    1. eLife Assessment

      This important study identifies a metal transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter mediates iron and zinc uptake and regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.

    2. Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knock-down mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage.

      The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated by exogenous addition of iron or zinc. Finally, the authors used heterologous expression of ZFT in Xenopus oocytes and yeast mutants, highlighting the dual substrate specificity of the transporter. The ability of ZFT to transport both iron and zinc is thus supported by two experimental approaches in heterologous systems. First by demonstrating ZFT ability to transport zinc, as the expression of Toxoplasma ZFT can compensate for a lack of zinc transport in yeast. Then, by showing the ability of ZFT to transport iron, as assessed in the Xenopus oocytes model. Furthermore, phenotypic analyses suggest defects in iron availability upon ZFT depletion, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function.

      Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. The converging evidence, including changes in metal concentrations upon ZFT depletion, data on metal transport obtained in heterologous systems, and phenotypic changes linked to iron deficiency, presents a convincing case. Given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful.

      Comments on revisions:

      The revised manuscript has successfully addressed all of the key points raised in the initial review. Notably, the metal transport experiments in Xenopus oocytes now provide compelling evidence supporting the role of ZFT function. I congratulate the authors on their efforts and have no further concerns to raise.

    3. Reviewer #2 (Public review):

      Summary:

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite.

      Strengths:

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. The heterologous expression of ZFT in a Xenopus oocyte system where ZFT was shown to transport iron and zinc is an important addition to the study. The authors also build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion.

      Weaknesses:

      The inclusion of the data showing iron and zinc transport when ZFT is expressed in a Xenopus oocyte system alleviated one of the main weaknesses of the original paper: the lack of direct biochemical evidence that ZFT acted as an iron transporter.

    4. Reviewer #3 (Public review):

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. In the manuscript's revision, the authors performed additional transport assays in Xenopus oocytes, providing further evidence for the transporter trafficking iron. Overall, the data by Aghabi et al. convincingly support that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes.

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration) as well as performing a yeast mutant complementation and transport assays in Xenopus oocytes expressing the T. gondii protein. This work is very thorough and clearly presented, leaving little doubt about this protein's function.

      Weaknesses:

      None. The authors have addressed all my previous queries/concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc. 

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful. 

      We thank the reviewer for their assessment and would like to highlight that we now add direct biochemical characterisation in the new Figure 8, supporting our hypothesis and confirming iron transport by this protein.

      Reviewer #2 (Public review): 

      Summary: 

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite. 

      Strengths: 

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion. 

      Weaknesses: 

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      We thank the reviewers for this comment. We agree that TPEN is a fairly nonspecific cation chelator, so to determine whether its effects are due to removal of zinc or of other cations, we treated with TPEN and either zinc or iron. Co-incubation of TPEN and zinc prevented ZFT depletion, while TPEN+FAC had no effect compared to TPEN alone (new Figure 6h and i), strongly suggesting that the effects on ZFT abundance are linked to zinc and not just iron.

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding. 

      We show here that ZFT expression is highly dynamic, depending on both the iron status of the host cell and the number of parasites per vacuole. However, validating this finding by western blot would be complex due to the highly unsynchronised nature of parasite replication and the large number (5x10<sup>6</sup> - 1x10<sup>7</sup> cells) of parasites required to visualise ZFT. Further, we show that ZFT is apparently internalised prior to degradation. For this reason, we have not attempted to validate this finding by western blotting at this time.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

      The reviewer brings up a good point: the concentration of iron chelator that we used here does not enable parasite replication, making an assessment of changes in localisation challenging. To address this, we have added new data using a much lower concentration of chelator (20 mM), which is still expected to impact the parasites (Hanna et al., 2025) but allows for replication. In this low iron environment, ZFT localisation remained significantly more peripheral (Fig. S1d,e), supporting our hypothesis that ZFT localisation is iron dependent, independent of vacuolar stage.

      Reviewer #3 (Public review): 

      Summary:

      Aghabi et al set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes. 

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function. 

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIP domain. Particularly, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying if and how these transporters coordinate metal ion transport would have been worthwhile.

      We agree that the work presented here validates the previous datasets, and if that was all we had done, we agree that the biological insights would be limited. However, we have gone significantly beyond the predictions, demonstrating dynamic localisation changes, iron-mediated regulation, the lack of substrate-based complementation and validating transport activity of both zinc and iron. Although in silico predictions and screens can be informative, it remains important to validate biological functions experimentally. While we agree that characterisation of TGME49_225530 (as well as the other two annotated ZIP proteins) would be interesting, and will certainly form part of our future plans, it is significantly beyond the scope of the presented manuscript.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

      To address this point, we agree that given the importance of iron and zinc to the parasite that we cannot differentiate the death of the parasite due to apicoplast defects from death from other causes and we have modified the discussion to reflect this, as below.

      “However, given the delayed phenotype typically seen upon apicoplast disruption, we cannot determine if this is a direct effect of ZFT, or a downstream consequence of metal depletion”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments: 

      (1) The background on the typical sequence features that would identify Toxoplasma ZIP homologues should be expanded and clarified. While these proteins are likely quite divergent and may lack many conserved features, the manuscript currently does not provide enough detail to assess how similar (or different) TgZIPs are from well-characterized family members. Additionally, the justification for focusing on TGGT1_261720 (ZFT) over TGGT1_225530, as stated in the first paragraph of the results section, seems unclear. There is no predictive data supporting a potential plasma membrane localization for TGGT1_225530 (yet this cannot be excluded), and TGGT1_225530 appears to have more canonical metal-binding motifs. I believe that the fact that only TGGT1_261720 is iron-regulated should be sufficient justification for its selection, and this point could be emphasized more clearly. Furthermore, the discussion mentions a leucine residue that may be associated with broad substrate specificity, but this is not addressed in the initial comparative sequence analysis. These residues and the HK motif are not actually addressed in the Gyimesi et al. reference currently mentioned; thus this could be clarified and updated with references (such as PMID: 31914589) that provide more recent insights into key residues involved in metal selectivity in ZIP transporters.

      We thank the reviewer for this comment. To address these points:

      We agree that the iron-mediated regulation is sufficient for our focus on ZFT and have clarified the text to reflect this, as described above.

      We have also updated the references as suggested, our apologies for this oversight.

      We have further expanded the discussion, especially with reference to our new results using heterologous expression in oocytes (please see above).

      (2) Figure 1D, Figure 2A, C, H, Figure 3D, Figure 6F, H, corresponding text and paragraph 2 of the Discussion: It seems that most of the "non-specific bands" annotated in Figure 1D, which are lower molecular weight products, are not present in the parental cell line, suggesting they may not be non-specific after all. These bands also vary depending on the cell line (e.g., promoter used, see Figures 2H and 3D) or experimental conditions (e.g., iron excess or depletion). Given the dynamic localization of ZFT during intracellular development, it may be worth exploring whether these lower molecular weight bands represent degraded forms of TgZFT, possibly corresponding to the basally-clustered signal observed by immunofluorescence, with only the full-length protein associating with the plasma membrane. This possibility should be investigated or at least discussed further.

      While the lower bands are not present in the parental line, we do see them in other HA-tagged lines, especially when the expression of the tagged protein is low (see Author response image 1). We don’t currently have an explanation for these, but we can confirm that they do not change in abundance in parallel with the full-length protein, supporting our hypothesis that these bands are an artefact of the anti-HA antibody in our system. Although ZFT is clearly degraded (e.g. Fig. 1g), we currently do not believe these bands are ZFT C-terminal degradation products.

      Author response image 1.

      Western blot of ZFT-3HA<sub>zft</sub> and another HA-tagged unrelated cytosolic protein, demonstrating that the lower bands are most likely nonspecific.

      (3) It is unfortunate that ZFT could not complement a yeast iron transporter mutant cell line, as this would have provided a strong argument for ZFT's role in iron transport. The manuscript does not provide much detail about the Δfet2/3 yeast mutant line. Fet3 is the ferroxidase subunit, while Ftr1 is the permease subunit of the high-affinity iron transport complex in yeast. Fet2, however, appears to be Saccharomyces cerevisiae's VPS41 homolog. Therefore, is Δfet2/3 the most appropriate mutant to use, or would another mutant line (e.g., ΔFtr1) be a better choice? Additionally, while Figure 7 suggests a decrease in metal uptake upon ZFT depletion, it would be useful to test whether overexpression of ZFT leads to enhanced metal incorporation, perhaps using a FerroOrange assay. 

      We thank the reviewer for their comments, which we have answered below:

      The Δfet2/3 yeast mutant was a typo and has been corrected, our apologies. We used the Δfet3/4 mutant line, based on previous successful experiments involving plant metal transporters (e.g., DiDonato et al., 2004).

      Unfortunately, we were unable to perform the FerroOrange assay in the overexpression line as this line is endogenously fluorescent in the same channel as FerroOrange.

      However, as detailed above we have now added significant new data, confirming our hypothesis that ZFT is an iron/zinc transporter through heterologous expression in Xenopus oocytes in the new figure 8. This provides direct evidence of transport of iron, and evidence that zinc can inhibit this transport, consistent with our hypothesis.  

      (4) The annotation of the blot in Figure 2H suggests that overexpressed ZFT-TY can only be detected in the absence of heat denaturation. However, this is not addressed in the text. Does heat denaturation also affect the detection of ZFT-3HA or the lower molecular weight products? This should be clarified in the manuscript. 

      Interestingly, ZFT is detectable after boiling at 95 °C for 5 minutes when expressed at endogenous (or near endogenous) levels in the ZFT-3HA<sub>sag1</sub> and ZFT-3HA<sub>zft</sub> tagged parasite lines. However, overexpression of ZFT leads to a loss of detection via western blot when boiled, although the protein is detectable without heat denaturation.

      A possible explanation for this is that overexpression may cause ZFT to misfold, making the protein more prone to aggregation following boiling, rendering it insoluble and unable to enter the gel. Moreover, heat aggregation can sometimes mask the epitope tags on the protein that are required for antibody recognition, possibly explaining why ZFT is undetectable when overexpressed and exposed to boiling conditions, as has previously been observed for other transmembrane proteins (e.g., Tsuji, 2020).

      We have clarified this in the results section. Although we do not have a full explanation for this, we consider it important to share for others who may be looking at expression of these proteins.

      (5) Figure 3G: It might be helpful to include an uncropped gel profile to allow readers to visualize that the main product does indeed correspond to a potential dimeric form in the native PAGE. 

      This has now been added in Figure S3e, thank you for this suggestion.

      (6) The investigation of the impact of ZFT depletion on the apicoplast could be improved. The authors suggest that ZFT knockdown inhibits apicoplast replication based on a modest increase in elongated organelles, but the term "delayed death" is not appropriate in that case, as it is typically linked to a loss of the organelle. This is not observed here and is also illustrated by the unchanged CPN60 processing profile. So, clearly, there seems to be no strong morphological effect on the apicoplast early on after ZFT depletion. On the other hand, the authors dismiss any impact on TgPDH-E2 lipoylation (which is iron-dependent) based on the fact that the lipoylated form of the protein is still detected by Western blot. However, closer inspection of the blot in Figure 4B suggests that the intensity of the annotated TgPDH-E2 signal is reduced compared to the -ATc condition (although there might be differences in protein loading, as indicated by the control) or even with the mitochondrial 2-oxoglutarate dehydrogenase-E2, whose lipoylation is presumably iron-independent (see PMID: 16778769). This experiment should be repeated, and the results quantified properly in case something was missed, and the duration of depletion conditions perhaps extended further. Of note, it would also be worthwhile to revisit size estimations, as the displayed profiles seem inconsistent with the typical sizes of lipoylated proteins detected with the anti-lipoyl antibody (e.g., ~100 kDa for PDH-E2, ~60 kDa for branched-chain 2-oxo acid dehydrogenase, and ~40 kDa 2-oxoglutarate dehydrogenase).

      We thank the reviewer for this comment. We agree that there is no strong defect on the apicoplast in the first lytic cycle and we have modified the language to remove reference to delayed death, as given the magnitude of changes associated with loss of iron and zinc, we cannot be certain about the role of the apicoplast.

      Based on this suggestion, we have now quantified the levels of lipoylation of PDH-E2, BDCK-E2 and OGDH-E2 and now include this in Figure S4b, c, d. Supporting our other results, we do not see a significant change in PDH-E2 lipoylation upon ZFT knockdown. However, although OGDH-E2 lipoylation is unchanged (Figure S4c), interestingly we do see a significant increase in BDCK-E2 lipoylation (Figure S4d). This process is not expected to be directly iron related, as mitochondrial lipoylation is through scavenging rather than synthesis; however, it speaks to the larger mitochondrial disruption that we see. We now consider this further in the discussion.

      For the sizes, we thank the reviewer for bringing this up. Our apologies, this was due to an error in the annotation, and we have now corrected this in the figure.

      (7) In the third paragraph of the discussion, the authors mention the inability to complement ZFT loss by adding exogenous metals. One argument is the potential lack of metal access to the parasitophorous vacuole (PV). Although largely unexplored, this point could be expanded further in the discussion, as the issue of metal transport to the parasite involves not only the parasite plasma membrane but also the PV membrane. Additionally, the authors mention the absence of functional redundancy in transporters, but it would be helpful to discuss potential stage-specific or differential expression of other ZIP candidates. Transcriptomic data available on Toxodb.org could provide useful insights into this, and experimental approaches, such as RT-PCR, could be used to assess the expression of these candidates in the absence of ZFT. 

      On the issue of metals crossing the PV membrane, we agree that while we do not currently know the mechanisms of metal transport within the infected host cell, we do have experimental confirmation that the concentration and form of the metals that we are using can impact the parasites. We show that metal treatment inhibits parasite growth (e.g. Figure 3k-n, Figure 6a-d) and we can detect the increased metals through our experiments using FerroOrange and FluroZine (Figure 7a, c). In these experiments, parasites were treated intracellularly and so we can confirm that, regardless of the mechanism, iron and zinc can reach the parasite. While entry of metals across the PV is an intriguing question, it is beyond the scope of the present work, which focuses on the role of the selected transporter.

      We agree that a more detailed discussion of the other ZIP transporters is warranted. We have extended this section of the discussion although for now, we cannot determine the role of the other ZIP transporters in Toxoplasma.

      (8) In the discussion, the authors mention that « Inhibition of respiration has previously been linked to bradyzoite conversion ». To strengthen their point, the authors could mention that mitochondrial Fe-S mutants, as well as mutants affecting mitochondrial translation or the mitochondrial electron transport chain, also initiate bradyzoite conversion (PMID: 34793583). This would reinforce the connection between mitochondrial dysfunction and stage conversion. 

      This is an excellent point and we have added this to the discussion as follows:

      “Inhibition of mitochondrial Fe-S biogenesis or mitochondrial respiration have both previously been linked to bradyzoite conversion (Pamukcu et al., 2021; Tomavo and Boothroyd, 1995), however we do not yet know the signalling factors linking iron, zinc or mitochondrial function to bradyzoite differentiation”.

      (9) As a general comment on manuscript formatting, providing page and line numbers would significantly improve the manuscript's readability and allow reviewers to more easily reference specific sections. This would help address the minor issues of typos (e.g., multiple occurrences of "promotor"). I suggest a careful read-through to correct these issues. 

      We thank the reviewer for this comment and in the resubmitted version we have corrected these issues. 

      Reviewer #2 (Recommendations for the authors): 

      (1) In the alignment (Figure 1a), the BPZIP sequence is from which organism (genus, species)? It would be helpful to include this information in the figure legend.

      Apologies for this oversight, this figure and section have been reworked and the species name (Bordetella bronchiseptica) added.

      (2) In reference to Figure 1a, the authors state, "Interestingly, all parasite ZIP-domain proteins examined have a HK motif at the M2 metal binding". I was wondering if by "all" the authors mean Toxoplasma and Plasmodium falciparum (shown in Figure 1a) or did the authors also look at other apicomplexan parasites such as Cryptosporidium or Neospora? Is this a general feature of apicomplexan parasites? 

      We looked at this, and the HK motif in the M2 binding site is conserved in Neospora, Cryptosporidium, and even the digenic gregarine Porospora cf. gigantea. However, in the more distantly related Chromera we find an HH motif at the same position. This suggests that the HK motif is present in the Apicomplexa, but not conserved in the free-living Alveolata. Although we cannot speculate on the role of this motif currently, its role in metal import in Apicomplexa does deserve future scrutiny. To reflect this finding we have modified Figure 1a and the text.

      (3) In Figure 1e, to better visualize the ZFT-3HA staining at the basal pole, it would be better to omit the DAPI staining from the merged image. It is difficult to see the ZFT staining in the image of the large vacuole.

      We have removed the DAPI from this image to improve clarity.

      (4) Based on the "delayed-death" phenotype of the apicoplast, it is not surprising that no defects were observed in CPN60 processing or protein lipoylation. Have the authors considered measuring these phenotypes after a further round of growth (as was done for visualizing apicoplast morphology)? 

      We agree that changes in apicoplast function are often only seen in the second round of replication. However, here we wanted to check if ZFT depletion led to immediate changes in function of the organelle, which was not the case. It is highly likely that after the second round, we would see significant defects in the apicoplast function, however given the immediate importance of iron and zinc to many processes within the parasite, we believe that these experiments would be complicated to interpret.

      (5) Depleting ZFT led to a reduction in expression levels for the mitochondrial Fe-S protein SDHB but not for a cytosolic Fe-S protein. Is it expected that less intracellular iron (via depleted ZFT) would differentially affect mitochondrial versus cytosolic Fe-S proteins? 

      Previous studies (e.g., Maclean et al., 2024; Renaud et al., 2025) have shown that upon direct inhibition of the cytosolic Fe-S pathway, ABCE1 is fairly stable and levels can persist for 2-3 days post treatment. However, our recent work has shown that rapid and acute depletion of iron directly (through treatment with a chelator) can lead to ABCE1 levels decreasing within 24h (Hanna et al., 2025). In the case of ZFT knockdown, due to the more gradual reduction in iron levels seen (e.g. Figure 7j), we believe the parasites are prioritising key Fe-S pathways (e.g. essential proteostasis through ABCE1), probably while remodelling metabolism (as seen in our Seahorse assays). However, many proteins are expected to be directly impacted by the iron and zinc restriction that these parasites experience, and different protein classes are expected to behave differently in these conditions.

      Reviewer #3 (Recommendations for the authors): 

      (1) Is the effect on the plaque size between T7S4-ZFT (-aTc) in regular and 'high iron' conditions significant? The authors show convincingly that the plaque size is smaller due to the swapped promoter and the resulting overexpression of ZFT. But is the effect aggravated in high iron? This would be expected if excess iron were the problem.

      The plaque sizes are significantly smaller in the T7S4-ZFT line under high iron compared to the untreated condition, and compared to the parental untreated line. However, if we normalise plaque size to untreated conditions for both lines, there is not a significant change in plaque size in high iron between the parental and T7S4-ZFT. This is possibly due to the concentration of iron used (200 mM), which may not be optimal to see this effect, or the time taken for plaque assays (6-7 days), which may allow the excess iron to be stored by the host cells, changing the effective concentration of parasite exposure.

      (2) I struggle to understand the intracellular growth assay in Figure 5b. Here, T7S4-ZFT parasites show 25 % of vacuoles with more than 8 parasites (labelled 8+). But such large vacuoles are not observed in the parental strain. It appears as if the inducible strain grows faster even though it was earlier shown to have a fitness defect (see Figure 3j). Can you please clarify?

      This is a result of the rapid growth of the parental line: some vacuoles in this line lysed and initiated a new round of replication at this time point, while we saw no evidence at any timepoint that ZFT-depleted parasites were able to lyse the host cell. However, the initial (24-48h post ATc addition) replication rate of the ZFT KD remains similar to the parental. In this panel, we wanted to emphasize that the major phenotype we see upon ZFT depletion is vacuole disorganisation, which we believe is linked to the start of differentiation into bradyzoites.

      (3) Did the authors perform an IFA in addition to the Western blot to localize the 2nd Ty-tagged ZFT copy? It seems important to validate that the protein correctly localizes to the plasma membrane. 

      We have done so and now include these data in Figure S2b. Overexpression of ZFT-Ty localises to internal structures (probably vesicles) with some signal at the periphery, however, this limited expression at the periphery is sufficient to mediate the phenotypes that we see.

      (4) First sentence of the abstract and introduction: The authors speak of metabolism and cellular respiration as though they are two different processes. Is respiration not part of metabolism? 

      This is an excellent point. We wanted to distinguish mitochondrial respiration from general cellular metabolism, but this was not clear. We have now changed this in the introduction to the below:

      “Iron, and other transition metals such as zinc, manganese and copper, are essential nutrients for almost all life, playing vital roles in biological processes such as DNA replication, translation, and metabolic processes including mitochondrial respiration (Teh et al., 2024)”

      (5) 2nd paragraph of the introduction: toxoplasmosis is written capitalized but should be lower case.

      This has been corrected.

      (6) Figure 4j legend: change 'shits parasites to a more quiescent stage' to 'shifts parasites'.

      This has been corrected, our apologies.

      (7) Please correct the following sentence: 'These data demonstrate ZFT depletion leads to the expression of the bradyzoite-specific markers BAG1 and DBL.' DBL is not expressed by the parasite. It is a lectin that binds to the sugars in the cyst wall.

      We have now modified this in the text. The sentence now reads: “These data show that ZFT depletion leads to the expression of the bradyzoite marker BAG1 and the production of the cyst wall, as detected by DBL”.

      (8) In the section on yeast complementation with TgZFT, the authors write: 'Based on this success, we also attempted to complement...'. Please consider changing 'Success' to something more neutral.

      We have modified the text to now read: “Based on these results, we also attempted to complement”…

      (9) In the discussion, the authors write: 'We see a delayed phenotype on the apicoplast, suggesting that metal import is also required in this organelle, although no apicoplast metal transporters have yet been identified.' Please consider the study Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites (PMID: 38163252).

      We thank the reviewer for the note and have modified the text to include this and the reference. Please see below:

      “Iron is known to be required in the apicoplast (Renaud et al., 2022), zinc also may be required, as the fitness-conferring Plasmodium zinc transporter ZIP1 is transiently localised to the apicoplast (Shrivastava et al., 2024), although the functional relevance of this localisation has not yet been established”.

      (10) The authors write: 'Iron is known to be required in the apicoplast (Renaud et al., 2022), although a potential role for zinc in this organelle has not yet been established.' The role for zinc in the apicoplast may not have been shown formally, but surely among its hundreds of proteins, and those involved in replication and transcription, there are some that depend on zinc...?

      Yes, we agree it would make sense; however, multiple searches using ToxoDB and the datasets from Chen et al. (2025) were unable to find any apicoplast-localised proteins with zinc-binding domains. We cannot exclude that zinc is in the apicoplast, and the results from Plasmodium (Shrivastava et al., 2024) may suggest that it is; however, we currently do not have any evidence for its role within this organelle.

      References

      DiDonato, R.J., Roberts, L.A., Sanderson, T., Eisley, R.B., Walker, E.L., 2004. Arabidopsis Yellow Stripe-Like2 (YSL2): a metal-regulated gene encoding a plasma membrane transporter of nicotianamine-metal complexes. Plant J 39, 403–414. https://doi.org/10.1111/j.1365-313X.2004.02128.x

      Hanna, J.C., Shikha, S., Sloan, M.A., Harding, C.R., 2025. Global translational and metabolic remodelling during iron deprivation in Toxoplasma gondii. https://doi.org/10.1101/2025.08.11.669662

      Maclean, A.E., Sloan, M.A., Renaud, E.A., Argyle, B.E., Lewis, W.H., Ovciarikova, J., Demolombe, V., Waller, R.F., Besteiro, S., Sheiner, L., 2024. The Toxoplasma gondii mitochondrial transporter ABCB7L is essential for the biogenesis of cytosolic and nuclear iron-sulfur cluster proteins and cytosolic translation. mBio 15, e00872-24. https://doi.org/10.1128/mbio.00872-24

      Pamukcu, S., Cerutti, A., Bordat, Y., Hem, S., Rofidal, V., Besteiro, S., 2021. Differential contribution of two organelles of endosymbiotic origin to iron-sulfur cluster synthesis and overall fitness in Toxoplasma. PLoS Pathog 17, e1010096. https://doi.org/10.1371/journal.ppat.1010096

      Renaud, E.A., Maupin, A.J.M., Berry, L., Bals, J., Bordat, Y., Demolombe, V., Rofidal, V., Vignols, F., Besteiro, S., 2025. The HCF101 protein is an important component of the cytosolic iron–sulfur synthesis pathway in Toxoplasma gondii. PLoS Biol 23, e3003028. https://doi.org/10.1371/journal.pbio.3003028

      Shrivastava, D., Jha, A., Kabrambam, R., Vishwakarma, J., Mitra, K., Ramachandran, R., Habib, S., 2024. Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites. ACS Infect. Dis. 10, 155–169. https://doi.org/10.1021/acsinfecdis.3c00426

      Teh, M.R., Armitage, A.E., Drakesmith, H., 2024. Why cells need iron: a compendium of iron utilisation. Trends in Endocrinology & Metabolism 35, 1026–1049. https://doi.org/10.1016/j.tem.2024.04.015

      Tomavo, S., Boothroyd, J.C., 1995. Interconnection between organellar functions, development and drug resistance in the protozoan parasite, Toxoplasma gondii. International Journal for Parasitology 25, 1293–1299. https://doi.org/10.1016/0020-7519(95)00066-B

    1. eLife Assessment

      This important study provides new insights into how Staphylococcus aureus adapts to disulfide stress through the redox-sensitive regulator Spx, which coordinates nutrient uptake, cysteine import, redox homeostasis, and bacterial growth. While the authors present compelling evidence supporting the central role of Spx in managing disulfide stress, several aspects require further clarification. In particular, the precise mechanisms regulating cysteine uptake and the proposed link between disulfide stress responses and iron limitation would benefit from additional explanation and experimental or conceptual justification.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      This manuscript presents a thoughtful and well-executed analysis of how S. aureus adapts to disulfide stress using a redox-sensitive regulator, Spx, as a lynchpin to coordinate nutrient uptake, redox balance, and growth. The work is strengthened by a systematic and complementary experimental approach that combines genetic, biochemical, and physiological measurements. The authors carefully test alternative explanations and build a coherent model linking stress sensing to downstream metabolic consequences. Several results, particularly those connecting cysteine uptake to growth defects, provide convincing support for the proposed trade-off. Overall, the authors largely achieve their aims, and the evidence generally supports the central conclusions. The conceptual framework and experimental approaches should be of broad interest to researchers studying S. aureus physiology and pathogenesis and to those studying bacterial stress responses and metabolic trade-offs.

      Weaknesses:

      Clarifying several interpretive points would further strengthen confidence in the proposed model. Some conclusions rely on data presentations or experimental designs that are not immediately clear to the reader. In particular, aspects of the protein stability analysis, global regulatory comparisons, and assays linking cysteine uptake to iron limitation would benefit from clearer justification and more precise interpretation. In addition, certain conclusions could be more carefully framed to reflect partial rather than complete rescue effects.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "Activation of the Spx redox sensor counters cysteine-driven Fe(II) depletion under disulfide stress" by Hall and colleagues describes that an active redox switch is required for surviving under the diamide-induced disulfide stress. Furthermore, the SpxC10A mutant exhibits transcriptional dysregulation of genes involved in thiol maintenance and disulfide repair. The authors further demonstrate a role for Spx in regulating the uptake of L-cysteine, which otherwise leads to the chelation of intracellular iron and thus the repression of growth.

      Strengths:

      The authors demonstrate that the SpxC10A mutant accumulates high levels of thiols, leading to the chelation of intracellular iron and subsequent repression of the SpxC10A mutant's growth.

      Weaknesses:

      The authors did not show a direct regulation of L-cysteine uptake through CymR.

    4. Reviewer #3 (Public review):

      Summary:

      The paper from Hall et al. reports the effects of an altered function spx allele on the physiology of S. aureus. Since Spx is essential in this organism, the authors compare WT with a spx C10A allele that retains Spx functions that are independent of the formation of a C10-C13 disulfide. However, the major role of Spx in maintaining disulfide homeostasis in this organism appears to be reduced by this mutation, including a reduction (relative to WT) in the DIA-induction of thioredoxin, thioredoxin reductase, and BSH biosynthesis and reduction enzymes.

      Strengths:

      Based on a wide range of studies, the authors develop a model in which Spx is required for adaptation to disulfide stress, and this adaptation involves (in part) induction of both cystine/Cys uptake and the Fur regulon. Overall, the results are compelling, but further efforts to clarify the presentation will aid readers in being able to follow this very complicated story.

      Weaknesses:

      (1) More details are needed on how relative growth is defined and calculated (e.g., line 145 and Figure 1C). The raw data (growth curves) should be included when reporting relative growth so that readers can see what changed (lag, growth rate, final OD?). Later in the paper, the authors refer to "the diamide-induced growth delay of the spxC10A mutant" (line 379), but this is not apparent from the presented data.

      (2) Are the spx C10A, spx C13A, and spx C10A,C13A all really equivalent? In all cases, the Spx protein is presumably made (as confirmed for C10A in panel 1D). However, the only evidence to suggest that they are equivalent is the similar growth effects in panel 1C, and (as noted above), this data presentation can mask differences in how the mutations affect protein activity.

      (3) Figure 1D and Figure 1 Supplement 2 report results related to the effect of diamide treatment on protein half-life (t1/2). Only single results are shown for both panels, and the conclusions do not seem to be statistically robust. For example, Figure 1 Supplement 2 concludes that the t1/2 of Spx C10A is 3.38 min (this should be labeled correctly in the Figure legend as the red line) and that of WT Spx is 8.69 min. However, Figure 1D suggests that the protein levels at time 0 may not be equivalent, and this is lost in the data processing. Indeed, there are significant differences in Spx levels between the time 0 samples - and + DIA, which is curious. Further, the authors' conclusion relies very heavily on line-fitting that includes a final point that has very low signal intensity (as judged from Figure 1D) and therefore is likely the least reliable of all the data. It might be worth showing curve fitting for multiple gels. Regardless of the overfitting of the data, the general conclusion that Spx is partially stabilized against proteolysis by ClpXP, and that the C10A mutant is reduced in stabilization, is probably correct.

      (4) Figure 2 concludes that despite differences in the mRNA profiles between WT and spx C10A after 15 min. of DIA treatment, the overall level of responsiveness of the bacillithiol pool is unchanged. The authors find it "surprising" that the BSH pool responds normally despite some differences in gene expression. This is not surprising. The major events visualized in panel 2D are the chemical oxidation of BSH to BSSB and, presumably, the re-reduction by Bdr(YpdA). While it is seen that BSH synthesis (bshC) and ypdA expression may be less induced by DIA in the C10A mutant (2C), there is no evidence that the basal levels are different prior to stress. Therefore, the chemical oxidation and enzymatic re-reduction might be expected to occur at similar rates, as observed.

      (5) Line 215. For the reason stated above, there is no reason to invoke Cys uptake as needed for the reduction of BSSB. Further, since CySS (presumably an abbreviation for cystine) is imported, this itself can contribute to disulfide stress.

      (6) Line 235. Following on the above point, "diamide-induced disulfide stress increased L-CySS uptake in the spxC10A mutant to re-establish the BSH redox equilibrium." This is counterintuitive since L-CySS is itself a disulfide and is thought to be reduced to 2 L-Cys in cells by BSH (leading to an increase in BSSB, not a reduction). Is there a known cystine reductase? Could cystine or L-Cys be affecting gene regulation (e.g., through CymR, Spx, or another regulator)? Cystine can also lead to mixed disulfide formation (e.g., could it modify Spx on C13?).

      (7) Line 247. "a functional Spx redox switch allows S. aureus to avoid this trade-off and maintain thiol homeostasis without excessive L-CySS uptake." Can the authors expand on how this is thought to work? Does Spx normally affect cystine uptake? I thought this was CymR? I am not following the logic here.

      (8) Line 258. "The fur mutant, which is known to accumulate iron...". My understanding is that fur mutant strains typically have higher bioavailable (free) Fe pools. This is seen in E. coli, for example, using EPR methods. However, they also often have lower total Fe due to the iron-sparing response, which represses the expression of abundant, Fe-rich proteins. Please provide a reference that supports this statement that in S. aureus fur mutants have higher total iron per cell.

      (9) Figure 4. For the reasons stated above (point 1), it is hard to interpret data presented only as "Rel. Growth". Perhaps growth curve data could be included in a supplement.

      (10) The interpretation of Figure 4 is complicated. It is not clear that there is necessarily a change in bioavailable Fe pools, although it does seem clear that Fe homeostasis is perturbed. It has been shown that one strong effect of DIA on B. subtilis physiology is to oxidize the BSH pool to BSSB (as shown also here), and this leads to a mobilization of Zn (buffered by BSH). Elevated Zn pools can inactivate some Fe(II)-dependent enzymes, which could account for the rescue by Fe(II) supplementation. Zn(II) can also dysregulate PerR and likely Fur regulons.
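
      Point (3) above concerns how strongly a fitted half-life can depend on a single low-intensity time point. A minimal Python sketch of that concern follows, assuming simple first-order decay. The function name `half_life`, the time points, and all intensities are made up for illustration; they are not the authors' data or their fitting procedure.

```python
import math

def half_life(times, intensities):
    """Least-squares fit of ln(intensity) = ln(I0) - k*t; returns ln(2)/k."""
    logs = [math.log(v) for v in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    k = -sxy / sxx  # first-order decay rate constant (per minute)
    return math.log(2) / k

# Hypothetical time course (minutes) following exact decay with t1/2 = 8 min.
times = [0, 5, 10, 20, 40]
clean = [100 * 0.5 ** (t / 8) for t in times]

# Same series, but the faintest (final) band is read at half its true intensity.
noisy = clean[:-1] + [clean[-1] * 0.5]

print(round(half_life(times, clean), 2))
print(round(half_life(times, noisy), 2))
```

      In this toy series, misreading only the final band shifts the fitted half-life from 8 min to roughly 6.7 min, which is why replicate gels, or a fit that down-weights the weakest points, would make such estimates easier to trust.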

    1. eLife Assessment

      Optical tweezers have been instrumental to the determination of mechanical parameters of molecular motors. This study by Takamatsu et al. reports key mechanical parameters of kinesin KIF1A using fluorescence microscopy, wherein the motor is tethered to a DNA nanospring, without the use of an optical trapping apparatus, which represents an exciting development. The approach and the findings reported change current thinking about KIF1A‑mediated transport, with potential implications for understanding human disease. The findings are important and the strength of the evidence is compelling.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses a novel DNA origami nanospring to measure the stall force and other mechanical parameters of the kinesin-3 family member, KIF1A, using light microscopy. The key is to use SNAP tags to tether a defined nanospring between a motor-dead mutant of KIF5B and the KIF1A to be integrated. The mutant KIF5B binds tightly to a subunit of the microtubule without stepping, thus creating resistance to the processive advancement of the active KIF1A. The nanospring is conjugated with 124 Cy3 dyes, which allows it to be imaged by fluorescence microscopy. Acoustic force spectroscopy was used to measure the relationship between the extension of the NS and force as a calibration. Two different fitting methods are described to measure the length of the extension of the NS from its initial diffraction-limited spot. By measuring the extension of the NS during an experiment, the authors can determine the stall force. The attachment duration of the active motor is measured from the suppression of lateral movement that occurs when the KIF1A is attached and moving. There are numerous advantages of this technology for the study of single molecules of kinesin over previous studies using optical tweezers. First, it can be done using simple fluorescence microscopy and does not require the level of sophistication and expense needed to construct an optical tweezer apparatus. Second, the force that is experienced by the moving KIF1A is parallel to the plane of the microtubule. This regime can be achieved using a dual beam optical tweezer set-up, but in the more commonly used single-beam set-up, much of the force experienced by the kinesin is perpendicular to the microtubule. Recent studies have shown markedly different mechanical behaviors of kinesin when interrogated by the two different optical tweezer configurations. The data in the current manuscript are consistent with those obtained using the dual-beam optical tweezer set-up. 
      In addition, the authors study the mechanical behavior of several mutants of KIF1A that are associated with KIF1A-associated neurological disorder (KAND).

      Strengths:

      The technique should be cheaper and less technically challenging than optical tweezer microscopy to measure the mechanical parameters of molecular motors. The method is described in sufficient detail to allow its use in other labs. It should have a higher throughput than other methods.

      Weaknesses:

      The experimenter does not get a "real-time" view of the data as it is collected, which you get from the screen of an optical tweezer set-up. Rather, you have to put the data through the fitting routines to determine the length of the nanospring in order to generate the graphs of extension (force) vs time. No attempts were made to analyze the periods where the motor is actually moving to determine step-size or force-velocity relationships.

      Comments on revisions:

      I am satisfied with the revision made by the authors in response to my first round of criticisms.

    3. Reviewer #2 (Public review):

      Summary:

      This work is important in my view because it complements other single-molecule mechanics approaches, in particular optical trapping, which inevitably exerts off-axis loads. The nanospring method has its own weaknesses (individual steps cannot be seen), but it brings new clarity to our picture of KIF1A and will influence future thinking on the kinesins-3 and on kinesins in general.

      Strengths:

      By tethering single copies of the kinesin-3 dimer under test via a DNA nanospring to a strong binding mutant dimer of kinesin-1, the forces developed and experienced by the motor are constrained into a single axis, parallel to the microtubule axis. The method is imaging-based which should improve accessibility. In principle, at least, several single-motor molecules can be simultaneously tested. The arrangement ensures that only single molecules can contribute. Controls establish that the DNA nanospring is not itself interacting appreciably with the microtubule. Forces are convincingly calibrated and reading the length of the nanospring by fitting to the oblate fluorescent spot is carefully validated. The excursions of the wild type KIF1A leucine zipper-stabilised dimer are compared with those of neuropathic KIF1A mutants. These mutants can walk to a stall plateau, but the force is much reduced. The forces from mutant/WT heterodimers are also reduced.

      Weaknesses:

      The tethered nanospring method has some weaknesses: it only allows the stall force to be measured in the case that a stall plateau is achieved, and the thermal noise means that individual steps are not apparent. The nanospring does not behave like a Hookean spring; instead, linearly increasing force is reported by exponentially smaller extensions of the nanospring under tension. The estimated stall force for KIF1A (3.8 pN) is in line with measurements made using three-bead optical trapping, but those earlier measurements were not of a stall plateau, but rather of limiting termination (detachment) force.
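
      The practical consequence of the non-Hookean behaviour can be written as a first-order error propagation. This is a general sketch rather than the authors' calibration: F(L) stands for whatever force-extension calibration curve is used, L_stall for the measured nanospring length at stall, and δL for the uncertainty in that length.

```latex
\delta F \;\approx\; \left.\frac{dF}{dL}\right|_{L = L_{\mathrm{stall}}} \delta L
```

      For a Hookean spring the slope dF/dL is constant, so a given δL maps to the same δF at every force. For a spring that stiffens under tension, the slope grows with extension, so the same δL produces a larger δF near the stall plateau than at low load.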

      Comments on revisions:

      The authors have successfully addressed my previous criticisms.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for the careful reading of our manuscript and for the constructive comments. We have provided responses to each of the comments below.

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) have already investigated dynamic properties of motor proteins under load, such as step size and force–velocity relationship of myosin VI using NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 Public Review

      We appreciate the constructive comments of Reviewer #2, which have strengthened both the presentation and interpretation of our results.

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows. First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors detect the attachments that occur during a processive run by KIF1A by monitoring the suppression of the angular fluctuations of the fluorescent signal and plot this, for example, in Figure 3a as the Length of the NS (which presumably is a readout of force) vs time. This interval includes the time when the KIF1A is actively moving along the MT and when it is stalled. It would be interesting to know the actual stall time of the motor in order to be able to calculate a detachment rate constant. For attachment periods such as the first example highlighted in pink in Figure 3a, the stall time is pretty much equal to the attachment time since the motor is moving so fast and the stall period is so long. However, for short attachment times such as the fifth pink interval shown in this same figure or the traces with the mutant KIF1As in Figure 4 this is not so. Can the authors institute a program to identify the periods where the motor has stretched the NS spring to the point where it stalls, and then calculate this time in order to do an exponential fit to the "dwell time distribution"?

      By introducing another criterion (see Methods, “Rate of relative increase in NS’s length”), the attachment duration was separated into the two time regions noted by the reviewer. After reanalyzing all the data, we evaluated only the stall duration this time. As a result, the estimated stall-force values became more reliable and accurate. A dwell-time analysis was performed for WT KIF1A, for which sufficient data were available, and is included in the supplementary material.

      (2) The histogram of stall events in Figure 3b is quite broad. Please discuss.

      The newly added distributions from individual molecules (Fig. 3b) show that the breadth of the stall force distribution is not due to multiple molecules but is primarily an intrinsic property of single KIF1A molecules, reflecting the complex kinetics of KIF1A under load, including occasional backward steps and reattachments. In addition, because the nanospring is a non-linear spring, a disadvantage is that even small fluctuations in extension can result in a substantial deviation in the measured stall force. These points have been added to the Discussion section.

      (3) Figure 3c, it is clear that for attachment times greater than 5s the attachment duration is independent of the Lstall, but this is not so clear for the short attachment durations. Some of this may relate to the fact that you're measuring attachment durations and not stall or dwell times as described in my first comment. Do you feel this is due to less precision in measuring the "attachment duration" during the short attachments, or just simply that more data is needed here? I assume that you do not want to imply that there is a load-dependence of the attachment durations here? Perhaps an expanded view of the data set from 0-10 seconds would clarify. 

      As described in our response to comment (1), the stall durations were separated from the attachment durations. This improved the measurement accuracy and revealed that the two quantities plotted in Fig. 3c are uncorrelated. We appreciate this constructive comment.
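
      The exponential fit to the dwell-time distribution requested in comment (1) can be sketched as follows. The sketch assumes that detachment from the stalled state is a single-step process, so that stall durations are exponentially distributed, and it uses simulated durations. The function name `detachment_rate`, the cutoff `t_min`, and the rate value are illustrative assumptions, not the authors' analysis.

```python
import random

def detachment_rate(dwell_times, t_min=0.0):
    """Maximum-likelihood rate constant for exponentially distributed dwells.

    Dwells shorter than t_min (e.g. below the detection limit) are discarded,
    and the remaining dwells are shifted by t_min, which is the standard
    estimator for a left-truncated exponential distribution.
    """
    shifted = [t - t_min for t in dwell_times if t >= t_min]
    return len(shifted) / sum(shifted)

random.seed(0)
true_k = 0.2  # hypothetical detachment rate, per second
dwells = [random.expovariate(true_k) for _ in range(5000)]

k_all = detachment_rate(dwells)               # every stall event is detected
k_trunc = detachment_rate(dwells, t_min=1.0)  # stalls shorter than 1 s are missed

print(round(k_all, 3), round(k_trunc, 3))
```

      Both estimates should land close to the simulated rate. Shifting by `t_min` matters in practice because, if short stalls are simply missed, taking the reciprocal of the mean of the remaining dwells underestimates the rate.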

      Reviewer #2 (Recommendations for the authors):

      (1) Off-axis forces are described as 'upward', 'perpendicular', and 'horizontal'. Consider referring to off-axis force, and if necessary, defining the direction of the force(s) relative to the axis of the immobilised MT. If necessary, a cartoon of XYZ axes might be added to F1c? 

      An XZ axis was added to the schematic in Fig. 1c.

      (2) If I understand correctly, stall forces are calculated by averaging the entire region in which the angular fluctuation is reduced below a threshold. In cases like the 3rd and 7th events on the trace in F1a, this will reduce the average. Perhaps consider separately averaging the later time points in each stall event? Perhaps also consider correlating the angular fluctuation signals and the spring length signal? Some fluctuations during stall plateaus might indicate slip back and re-engage events? 

      Instead of separately averaging the later time points in each stall event, we separated the stall force duration from the overall attachment duration (Fig. 3). This allowed us to obtain more accurate stall force values. The relationship between the NS length and the angular fluctuation during KIF1A slip-back events differed among individual stall events, and no clear trend was observed. Two representative examples are shown in the Author response image 1.

      Author response image 1.

      (3) Please describe all relevant methods fully instead of referencing previous work. For example, nanospring preparation refers readers to reference 21 (which in turn references an earlier paper).

      We revised the Methods section to include the procedures described in the previous reference, and we added the sequence information of the DNA origami to the supplementary information.

      (4) Were any experiments tried at reduced ATP concentration?

      (5) Were any data obtained from WT KIF5B? For kinesin-1, stall plateau forces of >7 pN are obtained.

      This study focused on comparing the stall forces of wild-type and KAND-related mutant KIF1A molecules under physiological ATP conditions, as our main goal was to characterize the disease-relevant phenotypes. Experiments at reduced ATP concentrations and with WT KIF5B are indeed important future directions but are beyond the scope of the present study. These follow-up experiments are currently in progress.

      (6) In Figure 1b, consider showing the attachment to the mutant KIF5B, and reversing the orientation so it corresponds to Figure 1c.

      KIF1A and KIF5B share the same binding method, so to indicate that the schematic in Fig. 1b represents both, we replaced ‘KIF1A’ with ‘Kinesin’.

      (7) In Figure 3d, add force axis. In general, please re-check all force axes. In Supplement S3, the stall plateau labels appear well above their corresponding axis ticks. In Figure 4, several mutants appear to be stalling at well over 5 pN, yet Table 1 gives a much lower value. Presumably, this reflects averaging effects?

      We added the force axis to Fig. 3d. In addition, we corrected Fig. S3 and Fig. 4, which contained errors in the conversion from length to force. As the reviewer pointed out, the apparent discrepancy between the force values in Fig. 4 and Table 1 arises mainly from averaging effects.

    1. eLife Assessment

      This study presents a valuable human stem cell-derived organoid model that captures key morphological and cellular features of spinal cord development and provides evidence for a YAP-dependent mechanism of lumen formation relevant to secondary neurulation. Overall, the evidence is convincing, using strong and validated approaches consistent with the current state of the art, including systematic protocol optimisation across multiple cell lines and quantitative analysis of tissue architecture. However, some claims regarding precise anterior-posterior and dorsoventral spinal cord identity, as well as several novelty claims, are at times overstated and would benefit from more direct validation and more careful positioning. The work will be of interest to developmental biologists and researchers studying neural tube defects.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras et al. present an organoid-based model of the caudal neural tube that builds upon established principles from embryonic development and prior organoid work. By systematically testing and refining signaling conditions, the authors generate caudal progenitor populations that self-organize into neuroepithelia with molecular features consistent with secondary neurulation. Bulk-RNA sequencing supports the emergence of caudal neural identities, and the authors further examine cellular features such as apico-basal polarity and interkinetic nuclear migration. Finally, they provide evidence for a conserved, YAP-dependent mechanism of tube formation specific to secondary neurulation. The manuscript provides valuable methodological resources, including troubleshooting guidance that will be especially useful for the field. While this work represents a significant advance toward modeling human spinal cord development - particularly the process of secondary neurulation - the claims of complete caudalization and full AP-axis representation require additional experimental support and clarification.

      Strengths:

      (1) Methodological clarity and transparency: The first figure and accompanying text provide an exemplary explanation of protocol optimization and troubleshooting. This transparency - showing approaches that failed as well as those that succeeded - sets a high standard for reproducibility and will be highly beneficial to laboratories aiming to adopt or build upon this model.

      (2) Testing across multiple cell lines: Multiple hPSC and hiPSC lines were evaluated, strengthening the robustness and generalizability of the reported protocol.

      (3) Biological relevance: The focus on secondary neurulation fills a notable gap in current human organoid models of spinal cord development. The identification of YAP-dependent mechanisms in tube formation is a valuable insight with potential translational relevance.

      (4) Resource creation: The detailed parameters and signaling regimes will serve as a resource for the spinal cord and organoid communities.

      Weaknesses:

      (1) The manuscript over-interprets bulk RNA-seq data to make strong claims on the organoid AP patterning and caudalization. Bulk sequencing provides population-level averages and cannot confirm that individual organoids represent discrete AP levels. To support claims of generating every AP identity, the authors must perform staining or in situ hybridization for HOX genes on individual organoids. Further, the current interpretation of CDX2 as marking "very distal" identity is inaccurate in vitro; CDX2 marks caudal progenitors across the spinal cord axis. The language should be revised accordingly.

      (2) The claim of being the first organoid system to model secondary neurulation overlooks prior work showing HOXC9 in human organoids (Xue et al., Nature 2024; Libby et al., Development 2021), which would reflect the beginning of secondary neurulation. While this system may indeed be the first isolated secondary neurulation organoid model that expresses HOXD9/10 - a meaningful advance - bulk RNA-seq alone is insufficient to support the exclusivity of this claim. Additional single-organoid-level spatial analyses (via immunofluorescence or in situ hybridisation) and frequency quantification of regional identities are required to fully characterize the system.

      (3) Similarly, as written, there are overstatements taken from the bulk RNA sequencing to determine dorsal-ventral identity. Although dorsal markers are present, the dataset also contains ventral-associated genes (PAX6, SP8, NKX6-1, NKX6-2, PRDM12). To claim a "dorsal-only" identity, the authors should perform PAX7 immunostaining to demonstrate dorsalization of the entire organoid tissue.

      (4) The studies identifying YAP as a key driver of lumen fusion in Figure 6 are important and should be extended to the apical organoid system to demonstrate that this is truly a feature of secondary neurulation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Blanco-Ameijeiras and colleagues present the use of stem cells to create human spinal cord organoids that recapitulate anterior-posterior identity, with a large focus on posterior fates. In particular, the authors show robust transcriptional landscape specification that reflects certain anterior-posterior spinal cord development.

      Recapitulation of spinal cord development is essential to understand the fundamentals of developmental defects in a systematic manner. This work provides a broad approach to test certain aspects of neural tube morphogenesis, particularly posterior and dorsal identities. Perhaps the shorter protocol is an interesting upgrade for current standards, and the mechanical interpretation provides good proof of concept work that aligns with the need to better understand neural tube mechanobiology.

      Strengths:

      The manuscript addresses a major gap by focusing on posterior spinal cord identity and secondary neurulation, a phase that is less well captured by existing neural tube organoid models (although some do recapitulate that). The manuscript situates the approach within vertebrate development and human embryology.

      Morphometric quantifications are well described and provide a dynamic, cell-level interpretation, which is a true strength of the work. This is important for developing metrics that can later be used to compare modulations and pathway disruption.

      The protocols are well described and documented.

      Weaknesses:

      Some key data lack proper quantification to robustly support the claims. For example, it is not clear how many organoids in total are counted in Figure 1D to derive the % of organoids expressing certain markers (e.g. SOX2 or BRA).

      Some claims are overstated. In the manuscript, the organoids show primarily dorsal and posterior identities under the current conditions, yet the discussion sometimes reads as if a more complete dorsoventral recapitulation is achieved. Therefore, one can either demonstrate ventral patterning (e.g., SHH / FOXA2) or reduce the claims about spinal cord identity, which, given the results, are more specific to a particular region.

      The mention of anterior organoids seems to distract the reader from the important work, which primarily focuses on the posterior identity. Further, it is not understood why SOX2 identity is reduced by Day 7 in Figure 1D. Since SOX2 in the manuscript is considered a neural marker (although it is also a pluripotency marker, along with NANOG, etc.), a further explanation should be provided. The authors should also test the presence of PAX6, which is one of the earliest neuroectoderm markers in humans (Zhang X. et al., Cell Stem Cell 2010).

      The authors position the work as a substantial addition to the field. The work is very much welcomed; however, some claims align with an interpretation that leads the readers to understand a novelty that is beyond the work presented here. For example, in certain instances in the intro, the manuscript conveys that this work consists of the first recapitulation of spinal cord fates anterior or posterior, while other works (Rifes P. Nature Cell Biology 2020, Xue X. Nature 2024) recapitulate dorsoventral and anterior-posterior patterning and identity (albeit not of secondary neurulation) through controlled gradients of WNT and RA activity. To clearly position the importance of this work, the intro should focus on secondary neurulation and posterior identities.

      In a similar fashion, the claim that "Importantly though, to our knowledge these are the first neural organoids exhibiting a robust spinal cord transcriptome identity" is not very well understood when other neural tube organoid systems (including spinal cord identities) have been exhaustively profiled at the single cell level (Rifes P. Xue X. Abdel Fattah A.). Further explanation is therefore needed.

      The mechanical angle is important and adds to the large body of research that traces NT morphogenesis to mechanics. However, the YAP localization images can be much improved. Lower magnification images are needed to show the entire organoid to robustly convince the reader of the correct and varying localization of the YAP protein. The authors should also check for YAP-associated genes in their bulk RNA sequencing.

      The quantification of the YAP analysis in a total of 23 and 18 cells in the two conditions and in 7 organoids is by no means enough to draw a conclusion about YAP localization, and an increase in the number of cells is needed. Moreover, the use of dasatinib as an inhibitor for YAP is great, but there is no evidence shown that in this culture system, the inhibitor actually inhibits YAP. As such, IF images are required to confirm cytosolic YAP. Additionally, the authors can try other inhibitors (such as verteporfin) since most inhibitors are broadband.

      Given the mechanically oriented conclusions, other relevant works have shown posteriorized and ventralized neural tube organoids using RA and SHH activation, which were also mechanically stimulated via actuation, such as work from the Ranga lab (Nature Comm. 2021/2023). Although not strictly related to YAP, the molecular profiling, mechanical stimulation, lumen measurements, and NTD-like phenotype using PCP-mutated genes reported therein make these works important to mention, since the current work adds important aspects with its YAP analysis.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras and collaborators describe the 3D differentiation of human pluripotent stem cells into the posterior spinal cord. The authors first test the exposure of different combinations of extrinsic signals to generate human neural organoids with distinct antero-posterior identities, as shown by bulk transcriptome analysis. They show that neural organoids, whether anterior or posterior, display tissue architecture, organisation and dynamics resembling the in vivo situation. Increasing the size of initial cell aggregates leads to the formation of a single lumen through a multi-lumen stage and a process of cell intercalation, mimicking the situation that they recently described for chick secondary neurulation (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300). The authors go on to show that, as in chick, YAP is involved in the resolution of multiple lumens into a single lumen. They conclude that their human organoid approach faithfully models human secondary neurulation, which may be instrumental in unravelling the mechanisms of human neural tube defects.

      Strengths:

      Overall, this is an important study demonstrating that lumen formation in human spinal organoids recapitulates key aspects of secondary neurulation observed in animal models. This organoid approach may be instrumental in unravelling the mechanisms of human neural tube defects.

      Weaknesses:

      The significance of the findings is tempered by several limitations. While the authors show convincing evidence that organoids undergo lumen formation with similar morphological, cellular and molecular features as seen in chick in their previous work (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300), whether this is linked to their caudal spinal cord identity is unclear.

    1. eLife Assessment

      In this valuable study, the authors performed cell-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior (INT1) vs middle & posterior (INT2-9) cells of the C. elegans intestine, under fed, starved, or refeeding conditions. The data generated will be very helpful to the C. elegans community, and the evidence supporting the conclusions of the study is assessed to be solid. Some methodological caveats remain and are discussed.

    2. Reviewer #1 (Public review):

      Summary

      In this study, the authors have performed tissue-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior vs posterior cells of the C. elegans intestine. They have performed this analysis in fed and fasted states of the animal. The data generated will be very useful to the C. elegans community, and the role of pyruvate shown in this study will result in interesting follow-up investigations.

      However, several strong claims made in the study are solely based on in silico predictions and are not supported by experimental evidence.

      Strengths:

      Several studies in the past have predicted different functions of the anterior (INT1) vs posterior (INT2-9) epithelial cells of the C. elegans intestine based on their anatomy and ultrastructure, but detailed characterization of differences in gene expression between these cell types (and whether indeed these are different 'cell types') was lacking prior to this study. The genes and drivers identified to be exclusively expressed in the anterior vs posterior segments of the intestine will be very helpful to selectively modulate different parts of the C. elegans intestine in future studies.

      Another strength of this study is the careful experimental design to test how the anterior vs posterior cell types of the intestine respond differently to food deprivation and recovery after return to food. These comparisons between 'states' of a cell in different physiological conditions are difficult to pick up in single-cell analyses due to low sequencing depth, which can fail to identify subtle modulation of gene expression.

      The TRAP-associated bulk RNA-seq approach used in this study is more suitable for such comparisons and provides additional information on post-transcriptional regulation during metabolic stress.

      A key finding of this study is that pyruvate levels modulate the translation state of anterior intestinal cells during fasting. Characterization of pyruvate metabolism genes, especially of the enzymes involved in its mitochondrial breakdown, provides novel insights into how gut epithelial cells respond to the acute absence of food.

      Weaknesses:

      Unlike previous TRAP-seq studies (PMID: 30580965, 36044259, 36977417) that reported sequencing data for both input and IP samples, this study only reports the sequencing data for IP samples. Since biochemical pulldowns are variable across replicates, it is difficult to know if the observed differences between different conditions are due to biological factors or differences in IP efficiency. More importantly, since two different TRAP lines were utilized in this study and a large proportion of the results focus on the differences between the translational profiles of INT1 vs INT2-9 cells, it is essential to know if the IP worked with similar efficiency for both TRAP strains that likely have different expression levels of the HA-tagged ribosomal protein. One way to estimate this would be to perform qRT-PCR of genes that are known to be enriched in all intestinal cells and determine whether their fold-enrichment over housekeeping genes (normalized to input) is similar in INT1 vs INT2-9 TRAP strains and across the fed vs fasted conditions. The authors, in fact, mention variability across biological replicates, due to which certain replicates were excluded from their WGCNA analysis.
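
      The normalization suggested above (fold-enrichment of a pan-intestinal gene over a housekeeping gene, relative to input) can be illustrated with a minimal sketch. It assumes the standard delta-delta-Ct calculation with roughly 100% amplification efficiency (a doubling per cycle); the function name and all Ct values are hypothetical and are not taken from the study.

```python
def fold_enrichment(ct_target_ip, ct_ref_ip, ct_target_input, ct_ref_input):
    """Delta-delta-Ct fold enrichment of a target gene in IP relative to input.

    Assumes ~100% primer efficiency, i.e. one cycle = a two-fold difference.
    """
    # Target normalized to the housekeeping (reference) gene in each sample
    delta_ip = ct_target_ip - ct_ref_ip
    delta_input = ct_target_input - ct_ref_input
    # Lower Ct means more template, hence the negative exponent
    return 2.0 ** -(delta_ip - delta_input)


# Hypothetical Ct values for one pan-intestinal gene in each TRAP strain
int1_strain = fold_enrichment(22.0, 24.0, 25.0, 24.0)
int2_9_strain = fold_enrichment(21.5, 24.0, 25.0, 23.5)

# A ratio far from 1 would hint at unequal IP efficiency between strains
print(int1_strain, int2_9_strain, int1_strain / int2_9_strain)
```

      In this made-up example the two strains differ two-fold in enrichment of the same gene, which is the kind of discrepancy the suggested control is meant to reveal.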

      It appears that GFP expression is also detectable in INT2 (in addition to strong expression in INT1 in Fig.1A). Compared to INT3-9, which looks red, INT2 cells appear yellow, suggesting that the expression patterns of the two TRAP drivers are not mutually exclusive, which changes the interpretation of many of the results described in the study.

      Some parts of the study overemphasize the differences between the INT1 vs INT2-9 cell types, which is a biased representation of the results. For example, the authors specifically point out that 270 genes are differentially expressed in opposite directions in INT1 vs INT2-9 cell types during acute (30 min) fasting without mentioning the 1,268 genes that are differentially expressed in the same direction. They also do not mention here that 96% of the genes are differentially expressed in the same direction in INT1 and INT2-9 cell types after prolonged (180 min) fasting, suggesting that the divergent translational responses of these cell types are only observed in the first 30 minutes of food deprivation. Similar results have also been reported for the effect of fasting on locomotory and feeding behaviors, where 30 min of fasting produces more variable effects, which become more consistent after longer periods of fasting (PMID: 36083280). Hence, the effects of brief food deprivation should be interpreted with caution.
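
      The proportions quoted above can be put on the same scale with a short calculation. This is only a sketch: it assumes that the 270 and 1,268 genes together make up all genes differentially expressed in both cell types after 30 min of fasting, which the review does not state explicitly, and the function name is hypothetical.

```python
def same_direction_fraction(same, opposite):
    """Fraction of shared differentially expressed genes that change
    in the same direction in both cell types."""
    return same / (same + opposite)


# Counts quoted in the review for acute (30 min) fasting
frac_30min = same_direction_fraction(same=1268, opposite=270)
print(f"{frac_30min:.1%} of shared genes change in the same direction at 30 min")
```

      Under that assumption, roughly 82% of the shared genes already move in the same direction at 30 min, compared with the 96% quoted for 180 min.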

      Many of the interpretations of this study primarily rely on pathway enrichment analyses, which are based on the known function of genes. The function of uncharacterized genes that were found to be differentially expressed in INT1 vs INT2-9 cell types, e.g., the ShKT proteins, was not explored in this study. In addition, overreliance on pathway enrichment tools (instead of functional validation) has resulted in several conflicting findings. For example, one of the main messages of this study is that INT1 cells specialize in immune and stress response in response to fasting, which relies on pathway analysis in Figs 5E and 5F. However, pathway analysis at a different time point (shown in Figure S5A) indicates that INT2-9 cells show a much stronger increase in translation of stress and pathogen-responsive genes compared to INT1 cells. Hence, some of the results should be interpreted as different translational effects in INT1 vs INT2-9 cells after different lengths of food deprivation, without making broad claims about selective pathways being affected only in specific cell types.

      The authors have compared their TRAP-seq results with genes enriched in the anterior and posterior intestine clusters from a previously published whole-animal adult scRNA dataset (PMID: 37352352). They claim that their TRAP-seq results are in agreement with the findings of the scRNA study. However, among the 10 genes from the 'posterior intestine' scRNA cluster in Fig.S1E, six are downregulated in the INT1 vs INT2-9 comparison, while four are upregulated. Hence, there is no clear agreement between the two studies in terms of the top enriched genes in the anterior vs posterior intestine, which should be considered for cross-study comparisons in the future.

      The authors describe in the manuscript that they have performed INT1-specific RNAi for two C-type lectin genes that are upregulated during fasting. Due to a recent expansion of C-type lectin genes in C. elegans, there is a high chance of off-target effects of RNAi that is designed for members of this gene family. More trustworthy results could have been obtained using CRISPR-based loss-of-function alleles for these genes, one of which is publicly available. Also, the authors do not provide any explanation for why knockdown of these stress-response genes, which are activated in INT1 cells in response to food deprivation, results in improved resistance to pathogens. This, in fact, suggests a role of INT1 cells in increasing pathogen susceptibility, and not pathogen resistance, during food deprivation.

      Many of the studies in this field (e.g., references 2-4 in this article) have investigated the effects of food deprivation ranging from 4 hr to 24 hr, which results in activation of starvation responses in C. elegans. In contrast, the authors have used shorter time periods of fasting (30 min and 180 min), and most of their follow-up experiments have used 30 min of food deprivation. Previous work has shown that the effects of food deprivation can either accumulate over time (i.e., the effect gets stronger with longer food deprivation) or can be transient (i.e., only observed briefly after removal of food and not observed during long-term food deprivation). Starvation-induced transcription factors such as DAF-16/FoxO and HLH-30 show strong translocation to the nucleus only after 30 min of fasting. Though gene expression changes in all stages of food deprivation are of biological relevance, the authors have missed the opportunity to explore whether increased INS-7 secretion from the anterior intestine is dependent on these starvation-induced transcription factors (which can be easily tested using loss-of-function alleles) or is due to other fast-acting regulatory mechanisms induced due to the absence of food contents in the gut lumen. A previous study (PMID: 40991693) has shown that DAF-16 activation during prolonged starvation shuts down insulin peptide secretion from the intestinal epithelial cells. Hence, it is not clear if increased INS-7 secretion is only a feature of short-term food deprivation or is also a signature of long-term starvation (e.g., at 8 hr or 16 hr timepoints). Since most of the INS-7 secretion data in this study are for 30 min of fasting, it remains unknown whether the discovered regulators of INS-7 secretion can be generalized for extended food deprivation that triggers major metabolic changes, such as fat loss (e.g., conditions shown in Figure 1D).

      Two previous studies (PMID: 18025456, 40991693) have shown a strong reduction in the expression of ins-7 in the anterior intestine using GFP-based reporters (both promoter fusions and endogenous CRISPR-generated) and in whole-animal RNA-seq data from starved animals. These results are in contrast to the increased INS-7 secretion from INT1 cells during fasting that is reported in this study. The authors here have reported that INS-7 translation is higher in INT1 compared to INT2-9 during fed, acute fasted, and chronic fasted conditions, but they have not shown whether INS-7 translation is upregulated during acute and chronic fasting in INT1 cells in their TRAP-seq analysis. Knowing whether increased INS-7 secretion during acute fasting is due to increased transcription, translation, or secretion of INS-7 is crucial to resolve the discrepancy between these studies.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors set out to understand whether the discrete segments of the C. elegans intestine were specialized to carry out distinct functions during an animal's exposure and adaptation to a fast-changing nutrient environment. To achieve this, the authors used a method called Translating ribosome affinity purification (TRAP), which provides a snapshot of what genes are being translated into proteins (and therefore functionally prioritized by the animal) under different fasting and re-feeding conditions. By expressing the TRAP constructs in two distinct segments of the intestine (INT1) and (INT2-9), the authors were able to identify how these segments responded to changing nutrient availability.

      Already under steady state nutrient conditions, the authors found that INT1 and INT2-9 appeared to have different 'tasks', with INT1 expressing more immune- and stress-response related genes. Exposing animals to different regimens of starvation and refeeding also showed marked differences between the intestinal segments, and the gene expression patterns in INT1 were consistent with INT1 cells playing an integrative role in linking nutrient cues to the secretion of insulin molecules that regulate fat metabolism with food intake. In summary, the data presented catalogue, for the first time, gene expression differences between two areas of the intestine, suspected to play different roles, and through clever experiments, link these gene expression changes to responses to nutrient availability.

      Strengths:

      The data presented catalogue - for the first time and in a careful manner - gene expression differences between two areas of the intestine. They strongly support the presence of intriguing differences between two areas of the intestine in immune, metabolic, and stress-response regulation, and link these gene expression changes to the responses of these regions to nutrient availability.

      Weaknesses:

      The conclusions of this paper are mostly well-supported by data, but the relevance of the changing gene expression patterns could be better clarified and extended in the discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Liu and colleagues utilize TRAP-seq to profile the repertoire of actively translated mRNAs in different intestinal cell types (anterior INT1 vs. posterior INT2-9 cells) in C. elegans. A key goal of this study was to identify transcripts differentially expressed/translated between these intestinal cell subtypes in the context of animals being well fed or subjected to acute (30 minutes) or chronic (3 hours) starvation, followed by refeeding.

      The authors identify a number of differentially expressed genes across all of the conditions tested. They then provide an initial survey of the landscape of translatome changes through Weighted Gene Co-expression Network Analysis (WGCNA), and some high-level functional surveys via Gene Ontology (GO) term analysis and protein domain analysis. The authors validate the enriched expression patterns of some of their identified candidate genes using fluorescent promoter fusion reporters, confirming INT1-specific expression. The authors further implicate the role of several other candidate genes in pathogen avoidance and in response to nutritional cues by knocking them down specifically in INT1 cells by RNAi. Finally, the authors identify pyruvate as a major nutrient signal coming from the bacterial diet that suppresses the release of a key insulin peptide (INS-7), and identify some of the genes expressed in INT1 that are required for this response.

      Strengths:

      (1) Good use of and justification for TRAP-seq, because scRNA-seq would be difficult under the varied conditions used (starvation, refeeding).

      (2) The manuscript is generally clear to read, and the data are generally well-presented with good supporting data that includes replicates, sample sizes, error measurements, and associated statistics.

      (3) The dataset will be an interesting resource to mine for future studies focusing on mechanisms of how particular intestinal cell types respond to different environmental signals.

      Weaknesses:

      (1) A limitation of TRAP-seq, although powerful, is that only relative comparisons can be made between genotypes/conditions to identify differentially-expressed genes, rather than assessing whether a given gene is expressed at a certain level in a cell type under a certain condition. This limitation is due to the non-specific association of sticky RNA species with the beads during the immunoprecipitation step. This is a minor point, however, and the authors do a nice job of focusing their analysis on differentially expressed transcripts in the current study.

      (2) Another limitation of the current study is that the experiments testing the role of candidate genes identified by their profiling experiments do not delve deeply enough to provide a mechanistic understanding of the phenotypes being studied. At present, the results are thus viewed more as a genomics-based screen with some limited follow-up on interesting hits. However, this reviewer appreciates that when placed in the context of the work presented, a presentation of the profiling data along with some validation is an excellent starting point for future mechanistic studies elaborating on these interesting candidates.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The main goal of the study was to survey the dynamic responses at the level of actively translated mRNAs of the INT1 vs INT2-9 cells in response to metabolic challenge.

      Overall, the authors use established methods to perform their genome-wide analysis, and the set of differentially regulated genes is enriched for expected molecular functions and forms coherent networks in anticipated pathways.

      The validation experiments (promoter::GFP fusion reporters, INT1-specific knockdowns of highly regulated genes) further corroborate the quality of the TRAP-seq datasets generated.

      I have a few points for the authors that would further strengthen this work:

      (1) The authors rightfully focus on the top differentially-regulated candidates, but it's unclear at present how far down their fold change list would lead to expression pattern validations. It would be useful to test a few more promoter::GFP fusion reporters at different enrichment/fold-change/statistical cutoffs.

      (2) Although the INT1-specific RNAi provides a convenient strategy for rapidly perturbing and testing genes of interest for phenotypes, independently validating the knockdowns with genetic mutants or, alternatively (if the genes are essential), degron alleles would strengthen the conclusions.

      Impact:

      The TRAP-seq data and list of differentially-expressed candidate genes will form an interesting set of high-priority candidates to study for their role in the reception and transduction of nutritional cues in response to food status and pathogens. This data will thus benefit the C. elegans community of researchers studying the mechanisms governing these phenomena.

    1. eLife Assessment

      In this useful paper, the authors present a comprehensive method for the purification of recombinant Snake Venom Metalloproteinases (SVMPs) using the MultiBac expression system, explain the self-activation of the enzymes by Zn2+ incubation, and establish high-throughput screening (HTS) techniques. The authors addressed a key problem: producing a substantial amount of pure and enzymatically active SVMPs required for structural and functional studies. Altogether, this work builds a solid foundation for the large-scale production of active SVMPs for future biochemical and structural characterization as well as for drug discovery, albeit leaving certain caveats about the universal applicability of the described methodology for the production of any recombinant SVMPs.

    2. Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well-presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE, and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant vs. native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method of recombinant protein can open up possibilities in HTS. Since their purification method can be applied to three structurally variable SVMP classes, this demonstrates the robust nature of the approach.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether removal of residual metal ions was attempted during the purification, which could influence premature activation. Additionally, it has not been discussed whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

    3. Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      Weaknesses:

      The manuscript suffers from a lack of bottoming out and stringent scientific procedures in the methodology and the characterization of the generated enzymes.

      As an example, a further characterization of the generated protein fragments in Figure 3 by intact mass spectrometry would have aided in accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC. Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      The activity assays in Figure 4 are not performed consistently with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h. Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies with the aim of either utilizing SVMPs as therapeutic agents or enabling the generation of specific anti-venom reagents, such as antibodies or small molecule inhibitors.

    4. Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. Because their functional analysis has in the past relied on challenging isolation, heterologous expression of the toxins offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated is outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery since a first reliable platform for expression of an understudied, yet potent toxin class is established. The discovered strategy to pursue expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places. The reporting of experimental details, especially sample sizes and replicates, could be optimised. At the time of writing, it remains unclear whether the glycosylations detected on a PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this. Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      (2) Line 173: Here, AlphaFold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was AlphaFold 2. I suggest using one release across the manuscript.

      (3) Lines 252-254: I fully agree, the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described its importance in the field. This raises the question of which glycosylations have been introduced into the SVMP here, and whether they correspond to those found in snakes. This is important as insects use thousands of N- and O-glycosylations to modulate the activity of their proteome, many of which are specific to insects. If some of these were integrated into the SVMP, this could have an impact on downstream bioassays and also on antigenicity (the surface would be somewhat different from natural toxins, causing different selection).

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      Discussion:

      I think the data generated in the study is very valuable and will be instrumental for pushing the frontiers in SVMP research, but still I would like to see a bit of modesty in their discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems, I think reporting such is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the PIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative. Lastly, the reality is also that the expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research. Again, I believe the paper is highly important and excellently crafted, but I think especially the discussion should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

    1. eLife Assessment

      The authors used genetic mutations in VANGL2 to study cell morphological changes during differentiation of hPSCs and understand the mechanisms underlying neural tube closure defects. The findings are important as they establish a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. The convincing evidence provided, combined with the relative simplicity of the model and its tractability as a patient-specific and reverse genetic platform, make it attractive.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ampartzidis et al. report the establishment of an iPSC-derived neuroepithelial model to examine how mutations from spina bifida patients disrupt fundamental cellular properties that underlie neural tube closure. The authors utilize an adherent neural induction protocol that relies on dual SMAD inhibition to differentiate three previously established iPSC lines with different origins and reprogramming methods. The analysis is comprehensive and outstanding, demonstrating reproducible differentiation, apical-basal elongation, and apical constriction over an 8-day period among the 3 lines. In inhibitor studies, it is shown that apical constriction is dependent on ROCK and generates tension, which can be measured using an annular laser ablation assay. Since this pathway is dependent on PCP signaling, which is also implicated in neural tube defects, the authors investigated whether VANGL2 is required by generating 2 lines with a pathogenic patient-derived sequence variant. Both lines showed reduced apical constriction and reduced tension in the laser ablation assays. The authors then established lines obtained from amniocentesis, including 2 control and 2 spina bifida patient-derived lines. These remarkably exhibited different defects. One line showed defects in apical-basal elongation, while the other showed defects in neural differentiation. Both lines were sequenced to identify candidate variants in genes implicated in NTDs. While no smoking gun was found in the line that disrupts neural differentiation (as is often the case with NTDs), compound heterozygous MED24 variants were found in the patient whose cells were defective in apical-basal elongation. Since MED24 has been linked to this phenotype, this finding is especially significant.

      Some details are missing regarding the method to evaluate the rigor and reproducibility of the study.

      Major Comments:

      It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      For the patient-specific lines - how many lines were derived per patient?

      Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      Significance:

      This paper is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants. This will not only demonstrate that sequence variants result in loss of function but also determine which cellular behaviors are disrupted.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' work focuses on studying cell morphological changes during differentiation of hPSCs into neural progenitors in a 2D monolayer setting. The authors use genetic mutations in VANGL2 and patient-derived iPSCs to show that (1) human phenotypes can be captured in the 2D differentiation assay, and (2) VANGL2 in humans is required for neural contraction, which is consistent with previous studies in animal models. The results are solid and convincing, the data are quantitative, and the manuscript is well written. The 2D model they present successfully addresses the questions posed in the manuscript. However, the broad impact of the model may be limited, as it does not contain NNE cells and does not exhibit tissue folding or tube closure, as seen in neural tube formation. Patient-derived lines are derived from amniotic fluid cells, and the experiments are performed before birth, which I find to be a remarkable achievement, showing the future of precision medicine.

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      (2) Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      (4) Figure 2d. Do the cells become thicker after recoil?

      (5) Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this is also the case in the present human model by using mosaic cultures.

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      Significance:

      This study establishes a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. Using VANGL2 mutations and patient-derived iPSCs, the work shows that (1) human phenotypes can be captured in a 2D differentiation assay and (2) VANGL2 is required for neural contraction (apical constriction), consistent with animal studies. The results are solid, the data are quantitative, and the manuscript is well written. Although the planar system lacks non-neural ectoderm and does not exhibit tissue folding or tube closure, it provides a tractable baseline for mechanistic dissection and genotype-phenotype mapping. The derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models. However, overall, I did not learn anything substantively new from this manuscript; the conclusions largely corroborate prior observations rather than extend them. In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Ampartzidis et al., significantly extends the human induced pluripotent stem cell system originally characterized by the same group as a tool for examining cellular remodeling during differentiation stages consistent with those of human neural tube closure (Ampartzidis et al., 2023). Given that there are no direct ways to analyze cellular activity in human neural tube closure in vivo, this model represents an important platform for investigating neural tube defects which are a common and deleterious human developmental disease. Here, the authors carefully test whether this system is robust and reproducible when using hiPSC cells from different donors and pluripotency induction methods and find that despite all these variables the cellular remodeling programs that occur during early neural differentiation are statistically equivalent, suggesting that this system is a useful experimental substrate. Additionally, the carefully selected donor populations suggest these aspects of human neural tube closure are likely to be robust to sexual dimorphism and to reasonable levels of human genetic background variation, though more fully testing that proposition would require significant effort and be beyond the scope of the current work. Subsequent to this careful characterization, the authors next tested whether this system could be used to derive specific insights into cell remodeling during early neural differentiation. First, they used a reverse genetics approach to knock in a human point mutation in the critical regulator of planar cell polarity and apical constriction, Vangl2. Despite being identified in a patient, this R353C variant has not been directly functionally tested in a human system. The authors find that this variant, despite showing normal expression and phospho-regulation, leads to defects consistent with a failure in apical constriction, a key cell behavior required to drive curvature change during cranial closure. 
Finally, the authors test the utility of their hiPSC platform to understand human patient-specific defects by differentiating cells derived from two clinical spina bifida patients. The authors identify that one of these patients is likely to have a significant defect in fully establishing early proneural identity as well as defects in apicobasal thickening. While early remodeling occurs normally in the other patient, the authors observe significant defects in later neuronal induction and maturation. In addition, using whole exome sequencing the authors identify candidate variant loci that could underlie these defects.

      Major comments:

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously with area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer, even missing local buckles. I understand that these cells are grown on flat adherent surfaces, limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than the earlier experiments, and given that it appears significance is reached mainly by impact of the lower values, can the authors explain if this variability is expected to be due to heterogeneity in the tissue, i.e., some areas have higher local tension? If so, would that correspond with more local apical constriction?
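
      The cell-wise comparison suggested in comment (1) could be sketched as follows. This is only an illustration under stated assumptions: the field names `apical_um2` and `basal_um2`, the condition labels, and all numbers are hypothetical placeholders, not data from the manuscript. A ratio below 1 is read here as an apical surface smaller than the basal surface of the same cell.

```python
from statistics import median

def apical_basal_ratios(cells):
    """Per-cell apical/basal area ratio; cells with no measurable basal area are skipped."""
    return [c["apical_um2"] / c["basal_um2"] for c in cells if c["basal_um2"] > 0]

def summarise(cells_by_condition):
    """Median ratio and fraction of cells with ratio < 1 for each condition."""
    summary = {}
    for condition, cells in cells_by_condition.items():
        ratios = apical_basal_ratios(cells)
        summary[condition] = {
            "n": len(ratios),
            "median_ratio": median(ratios),
            "fraction_below_one": sum(r < 1 for r in ratios) / len(ratios),
        }
    return summary

# Hypothetical measurements (placeholders only); in practice 30-50 cells per condition.
example = {
    "control": [
        {"apical_um2": 20.0, "basal_um2": 40.0},
        {"apical_um2": 30.0, "basal_um2": 60.0},
        {"apical_um2": 25.0, "basal_um2": 50.0},
    ],
    "mutant": [
        {"apical_um2": 50.0, "basal_um2": 50.0},
        {"apical_um2": 60.0, "basal_um2": 40.0},
        {"apical_um2": 45.0, "basal_um2": 45.0},
    ],
}
result = summarise(example)
```

      Reporting the ratio per cell, rather than mean apical and mean basal areas separately, keeps the comparison paired within each cell, which seems closest to what the comment asks for.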

      Significance:

      Overall, I am enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects. This work systematizes an important and novel tool to examine the cellular basis of neural tube defects. While other hiPSC models of neural tube closure capture some tissue level dynamics, which this model does not, they require complex microfluidic approaches and have limited accessibility to direct imaging of cell remodeling. Comparatively, the relative simplicity of the reported model and the work demonstrating its tractability as a patient-specific and reverse genetic platform make it unique and attractive. This work will be of interest to a broad cross section of basic scientists interested in the cellular basis of tissue remodeling and/or the early events of nervous system development as well as clinical scientists interested in modeling the consequences of patient specific human genetic deficits identified in neural tube defect pregnancies.

    5. Author response:

      General Statements

      In this manuscript we characterize an exquisitely reproducible model of iPSC differentiation into neuroepithelial cells, use it to mechanistically study cell shape changes and planar cell polarity signaling activation during this transition, then apply it to identify patient-specific cell deficiencies in both forward and reverse genetic screens as a powerful tool for patient-stratification in personalized medicine. To our knowledge, we provide the first evidence of a human pathogenic mutation directly impairing apical constriction: an evolutionarily conserved behavior of epithelial cells which is the subject of intense research.

      We are very pleased with the balanced and rigorous reviews generated through Review Commons, which we have already used to improve our manuscript. Reviewer 1 highlights that our study “is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants.” Reviewer 2 agrees that “results are solid and convincing, the data are quantitative, and the manuscript is well written”, and that our “derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models.” Reviewer 3 is “enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects.” 

      Below, we have replied to each of the reviewers’ comments.

      Description of the planned revisions

      R2.2. Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      We are happy to do this analysis fully in revision. Our initial analysis performing cross-correlation between apical area and CDH2 protein in one line shows the highest cross-correlation at Δt = -1, suggesting neuroepithelial CDH2 increases before apical area decreases. In contrast, the same analysis comparing apical area versus PAX6 shows Δt = 0, suggesting concurrence. This analysis will be expanded to include the other markers we quantified and the manuscript text amended accordingly. We are keen to undertake additional experiments to test whether these cells swap their key cadherins, CDH1 and CDH2, before they begin to undergo morphological changes (see the response to Reviewer 3’s minor comment 1 immediately below).
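
      A lagged cross-correlation of this kind can be sketched as below. This is a minimal illustration, not the analysis actually performed: the function names, the choice of Pearson correlation on the overlapping samples, the selection of the lag by largest absolute correlation, and the two time series are all assumptions made for the example. The sign convention follows the reviewer's suggestion, so a best lag Δt < 0 means the marker changes before the morphological measure.

```python
import math
from statistics import mean, pstdev

def lagged_correlation(marker, morphology, lag):
    """Pearson correlation between marker[t + lag] and morphology[t].

    With lag = -1, marker values are paired with the morphology one
    time point later, i.e. the marker is tested as leading.
    """
    n = len(marker)
    if lag >= 0:
        a, b = marker[lag:], morphology[:n - lag]
    else:
        a, b = marker[:n + lag], morphology[-lag:]
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return float("nan")  # correlation undefined for a constant series
    ma, mb = mean(a), mean(b)
    cov = mean([(p - ma) * (q - mb) for p, q in zip(a, b)])
    return cov / (sa * sb)

def best_lag(marker, morphology, max_lag):
    """Lag with the largest absolute correlation, plus all scores."""
    scores = {lag: lagged_correlation(marker, morphology, lag)
              for lag in range(-max_lag, max_lag + 1)}
    valid = {lag: r for lag, r in scores.items() if not math.isnan(r)}
    return max(valid, key=lambda lag: abs(valid[lag])), scores

# Hypothetical daily measurements (placeholders): the marker rises one
# day before the apical area falls.
marker_level = [0, 0, 1, 2, 4, 7, 9, 10, 10]
apical_area = [10, 10, 10, 9, 8, 6, 3, 1, 0]
lag, scores = best_lag(marker_level, apical_area, max_lag=2)
```

      Because apical area falls as a neuroepithelial marker rises, the correlation at the best lag is expected to be negative; ranking lags by absolute value is one way to handle this, and ranking by the signed value after inverting one series would be an equivalent alternative.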

      R3.1(Minor) There seems to be a critical window at day 5 of the differentiation protocol, both in terms of cell morphology and the marker panel presented in Figure 1i. Do the authors have any data spanning the hours from day 5 to 6? If not, I don't think they need to generate any, but I do think this is a very interesting window worthy of further discussion for a couple of reasons. First, several studies of mouse neural tube closure have shown that various aspects of cell remodeling are temporally separable. For example, between Grego-Bessa et al 2016 and Brooks et al 2020 we can infer that apicobasal elongation rapidly increases starting at E8.5, whereas apical surface area reduction and constriction are apparent somewhat earlier at E8.0. I think it would be interesting to see if this separability is conserved in humans. Second, is there a sense of how the temporal correlation between the pluripotent and early neural fate marker data presented here corroborates or contradicts the emerging set of temporally resolved RNA seq data sets of mouse development at equivalent early neural stages?

      Cell shape analysis between days 5 and 6 has now been added (see the response to point 2.1 below). As the reviewer predicted, this is a transition point when apical area begins to decrease and apicobasal elongation begins to increase.

      We also thank the reviewer for this prompt to more closely compare our data to the previous mouse publications, which we have added to the discussion. The Grego-Bessa 2016 paper appears to show an increase in thickness between E7.75 and E8.5, but these are not statistically compared. Previous studies showed rapid apicobasal elongation during the period of neural fold elevation, when neuroepithelial cells apically constrict. This has now been added to the discussion: 

      Discussion: “In mice, neuroepithelial apicobasal thickness is spatially-patterned, with shorter cells at the midline under the influence of SHH signalling[14,77,78]. Apicobasal thickness of the cranial neural folds increases from ~25 µm at E7.75 to ~50 µm at E8.5[79]: closely paralleling the elongation between days 2 and 8 of differentiation in our protocol. The rate of thickening is non-uniform, with the greatest increase occurring during elevation of the neural folds[80], paralleled in our model by the rapid increase in thickness between days 4-6 as apical areas decrease. Elevation requires neuroepithelial apical constriction and these cells’ apical area also decreases between E7.75 and E8.5 in mice[79], but we and others have recently shown that this reduction is both region and sex-specific[14,81]. Specifically, apical constriction occurs in the lateral (future dorsal) neuroepithelium: this corresponds with the identity of the cells generated by the dual SMAD inhibition model we use[56]. More recently, Brooks et al[82] showed that the rapid reduction in apical area from E8-E8.5 is associated with cadherin switching from CDH1 (E-cadherin) to CDH2 (N-cadherin). This is also directly paralleled in our human system, which shows low-level co-expression of CDH1 and CDH2 at day 4 of differentiation, immediately before apical area shrinks and apicobasal thickness increases.”

      Prompted by the in vivo data in Brooks et al (2025)[82], we are keen to further explore the timing of CDH1/CDH2 switching versus apical constriction with new experimental data in revisions.

      R3.2(Minor) 2) Can the authors elaborate a bit more on what is known regarding apicobasal thickening and pseudo-stratification and how their work fits into the current understanding in the discussion? This is a very interesting and less well studied mechanism critical to closure, which their model is well suited to directly address. I am thinking mainly of the Grego-Bessa et al., 2016 work on PTEN, though interestingly the work of Ohmura et al., 2012 on the NUAK kinases also shows reduced tissue thickening (and apical constriction) and I am sure I have missed others. Given that the authors identify MED24 as a likely candidate for the lack of apicobasal thickening in one of their patient-derived lines, is there any evidence that it interacts with any of the known players?

      We have now added further discussion on the mechanisms by which the neuroepithelium undergoes apicobasal elongation. Nuclear compaction is likely to be necessary to allow pseudostratification and apicobasal elongation. The reviewer’s comment has led us to realise that diminished chromatin compaction is a potential outcome of MED24 down-regulation in our GOSB2 patient-derived line. Figure 4D suggests the nuclei of our MED24-deficient patient-derived line are less compacted than control equivalents and we propose to quantify nuclear volume in more detail to explore this possibility.

      Additionally, we have already expanded our discussion as suggested by the reviewer:

      Discussion: “Mechanistic separability of apical constriction and apicobasal elongation is consistent with biomechanical modelling of Xenopus neural tube closure showing that both are independently required for tissue bending[61]. Nonetheless, neuroepithelial apical constriction and apicobasal elongation are co-regulated in mouse models: for example, deletion of Nuak1/2[83], Cfl1[84], and Pten[79] all produce shorter neuroepithelium with larger apical areas. Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68]. Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85]. As general regulators of polymerase activity, MED proteins have the potential to alter the timing or level of expression of many other genes, including those already known to influence pseudostratification or apicobasal elongation. MED depletion also causes redistribution of cohesion complexes[86] which may impact chromatin compaction, reducing nuclear volume during differentiation.”

      R3.3(Minor) 3) Is there any indication that Vangl2 is weakly or locally planar polarized in this system? Figure 2F seems to suggest not, but Supplementary Figure 5 does show at least more supracellular cable like structures that may have some polarity. I ask because polarization seems to be one of the properties that differs along the anteroposterior axis of the neural plate, and I wonder if this offers some insight into the position along the axis that this system most closely models?

      VANGL2 does not appear to be planar polarised in this system. This is similar to the mouse spinal neuroepithelium, in which apical VANGL2 is homogenous but F-actin is planar polarised (Galea et al Disease Models and Mechanisms 2018). We do observe local supracellular cable-like enrichments of F-actin in the apical surface of iPSC-derived neuroepithelial cells:

      Author response image 1.

      Preliminary identification of apical supracellular cables suggestive of local polarity. Top: F-actin staining shown in inverted grey LUT highlighting enrichment along directionally-polarised cell borders (blue arrows). Bottom: Staining orientation (blue ~ X axis, red ~ Y axis) based on OrientationJ analysis illustrating localised organisation of F-actin enrichment.

      We propose to compare the length of F-actin cables and coherency of their orientation at the start and end of neuroepithelial differentiation, and in wild-type versus VANGL2-mutant epithelia.
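
      For readers unfamiliar with the coherency measure, the sketch below computes it from a structure tensor as (λmax − λmin)/(λmax + λmin), which ranges from 0 (no preferred orientation) to 1 (one dominant orientation). It is a simplified illustration only: it assumes a single global tensor built from central-difference gradients with no Gaussian weighting, so it is not a reimplementation of OrientationJ, and the two test images are synthetic.

```python
import math

def coherency(image):
    """Structure-tensor coherency of a 2D intensity grid (list of rows)."""
    jxx = jyy = jxy = 0.0
    rows, cols = len(image), len(image[0])
    # Central-difference gradients on interior pixels only.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    trace = jxx + jyy  # equals lambda_max + lambda_min
    if trace == 0:
        return 0.0  # flat image: no orientation information
    # sqrt term equals lambda_max - lambda_min for a symmetric 2x2 tensor.
    return math.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / trace

# Synthetic images: intensity varying along one axis only, and a radially
# symmetric pattern with no preferred direction.
ramp = [[float(x) for x in range(9)] for _ in range(9)]
radial = [[float((x - 4) ** 2 + (y - 4) ** 2) for x in range(9)] for y in range(9)]
ramp_coherency = coherency(ramp)
radial_coherency = coherency(radial)
```

      In this simplified form the measure is computed once per image; a per-region version would evaluate the same expression within local windows, which is closer to how cable-like enrichments would be mapped across a field of cells.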

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major points

      (1) It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      These experimental details have now been clarified. Unless otherwise stated, all findings were confirmed in three independently differentiated plates from the same line or at least one differentiation from each of three lines. 

      Methods: Unless otherwise stated, for each iPSC line three independently differentiated plates were generated and analysed, with each plate representing a separate differentiation experiment performed on different days.

      (2) For the patient-specific lines - how many lines were derived per patient?

      This has now been clarified in the methods. Microfluidic reprogramming of a small number of amniocytes produces one line per patient representing a pool of clones. Subcloning from individual cells would not be possible within the timeframe of a pregnancy. 

      Methods: For patient-specific iPSC lines, one independent iPSC line was obtained per patient following microfluidic mmRNA reprogramming.

      (3) Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      We have now expanded these details:

      Methods: “VANGL2 knock-in lines were generated using CRISPR-Cas9 homology directed repair editing by Synthego (SO-9291367-1). The guide sequence was AUGAGCGAAGGGUGCGCAAG and the donor sequence was CAATGAGTACTACTATGAGGAGGCTGAGCATGAGCGAAGGGTGTGCAAGAGGAGGGCCAGGTGGGTCCCTGGGGGAGAAGAGGAGAG.

      Sequence modification was confirmed by Sanger sequencing before delivery of the modified clones, and Sanger sequencing was repeated after expansion of the lines (Supplementary Figure 5) as well as SNP arrays (Illumina iScan, not shown) confirming genomic stability.”

      Author response image 2.

      Snapshot of Illumina iScan SNP array showing absence of chromosomal duplications or deletions in the CRISPR-modified VANGL2-knockin lines or their congenic control.

      (4) Suggested text changes.

      Some additional suggestions for improvement.

      The abstract could be more clearly written to effectively convey the study's importance. Here are some suggestions:

      Line 26: Insert "apicobasal" before "elongation" - the way it is written, I initially interpreted it as anterior-posterior elongation.

      Line 29: Please specify that the lines refer to 3 different established parent iPSC lines with distinct origins and established using different reprogramming methods, plus 2 control patient-derived lines. - The reproducibility of the cell behaviors is impressive, but this is not captured in the abstract.

      Line 32: add that this mutation was introduced by CRISPR-Cas9 base/prime editing.

      The last sentence of the abstract states that the study only links apical constriction to human NTDs, but also reveals that neural differentiation and apical-basal elongation were found. The introduction could also use some editing.

      Line 71: insert "that pulls actin filaments together" after "power strokes".

      Line 73: "apically localized," do you mean "mediolaterally" or "radially"?

      Line 75: Can you specify that PCP components promote "mediolaterally orientated" apical constriction?

      Line 127: Specify that NE functions, including apical-basal elongation and neurodifferentiation, are disrupted in patient-derived models.

      All have now been corrected.

      Reviewer #2:

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      We used ZO-1 to quantify apical areas of the VANGL2-knockin lines in Figure 3. Segmentation of neuroepithelial apical areas based on F-actin staining is commonplace in the field (e.g. in the Brooks et al 2022 paper cited by another reviewer), and is generally robust because the cell junctions are much brighter than any apical fibres not associated with the apical cortex. However, we accept that at earlier stages of differentiation there may be more apical fibres when cells are cuboidal. We have therefore repeated our analysis of apical area using ZO-1 staining as suggested, analysing a more temporally-detailed time course in one iPSC line. This new analysis confirms our finding of lack of apical area change between days 2-4 of differentiation, then progressive reduction of apical area between days 4-8, further validating our system. Including nuclear images is not helpful because of the high nuclear index of pseudostratified epithelia (e.g. see Supplementary Figure 7) which means that nuclei overlap along the apicobasal axis. Individual nuclei cannot be related to their apical surface in projected images.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      The outlines on these images are not intended to show cell boundaries, but rather link landmarks visible at both timepoints to calculate cluster (not cell) change in area. This is as previously shown in Galea et al Nat Commun 2021 and Butler et al J Cell Sci 2019. We have now amended the visualisation of retraction to make representation of differences between conditions more intuitive. 

      (4) Figure 2d. Do the cells become thicker after recoil?

      This is unlikely because the ablated surface remains in the focal plane. Unfortunately, we are unable to image perpendicularly to the direction of ablation to test whether their apical surface moves in Z even by a very small amount. This has now been clarified in the results:

      Results: “The ablated surface remained within the focal plane after ablation, indicating minimal movement along the apical-basal axis.”

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      The GOSB2 iPSC line we describe does represent the in vivo situation in Med24 knockout mouse embryos, but is clearly less severe because we are still able to detect MED24 protein expressed in this line. We do not have detailed clinical data of the patient from which this line was obtained to determine whether their neurological development is normal. However, it is well established that some individuals who have spina bifida also have abnormalities in supratentorial brain development. It is therefore likely that abnormalities in neuron differentiation/maturation are concomitant with spina bifida. Our findings in the GOSB2 line complement earlier studies which also identified deficiencies in the ability of patient-derived lines to form neurons, but were unable to functionally assess neuroepithelial cell behaviours we studied. This has now been clarified in the discussion:

      Discussion: “Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium. 

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68].

      Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85].”

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      We appreciate the reviewer’s praise of our personalised medicine approach and fully agree that neural tube defects are rarely monogenic. The patient lines we studied were not intended to provide mechanistic insight, but rather to demonstrate the future applicability of our approach to patient care. Our vision is that every patient referred for fetal surgery of spina bifida will have amniocytes (collected as part of routine cystocentesis required before surgery) reprogrammed and differentiated into neuroepithelial cells, then neural progenitors, to help stratify their postnatal care. One could also picture these cells becoming an autologous source for future cell-based therapies if they pass our reproducible analysis pipeline as functional quality control. This has now been clarified in the discussion:

      Discussion: “The multi-genic nature of neural tube defect susceptibility, compounded by uncontrolled environmental risk factors (including maternal age and parity[102]), means that patient-derived iPSC models are unlikely to provide mechanistic insight. They do provide personalised disease models which we anticipate will enable functional validation of genetic diagnoses for patients and their parents’ recurrence risk in future pregnancies, and may eventually stratify patients’ postnatal care. We also envision this model will enable quality control of patient-derived cells intended for future autologous cell replacement therapies, as is being developed in post-natal spinal cord injury[103]. Thus, the highly reproducible modelling platform we evaluate – which is robust to differences in iPSC reprogramming method, sex and ethnicity – represents a valuable tool for future mechanistic insights and personalised disease modelling applications.”

      Significance:

      In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

      We disagree with the reviewer that “the model was unsuccessful in one of the two patient-derived lines”. The GOSB1 line demonstrated deficiency of neuron differentiation independently of neuroepithelial biomechanical function, whereas the GOSB2 line showed earlier failure of neuroepithelial function. We also do not, at this stage, make patient-specific predictive claims: this will require longer-term matching of cell model findings with patient phenotypes over the next 5-10 years.

      Reviewer #3:

      Major comments

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously with area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer, even missing local buckles. I understand that these cells are grown on flat adherent surfaces limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      As the reviewer observes, our cultures cannot bend because they are adhered on a rigid surface. The apical and basal lengths of the cultures will therefore necessarily be roughly equal in length. Some inwards bending of the epithelium is expected at the edges of the dish, but these cannot be imaged. The live imaging we show in Figure 2 illustrates that, just as happens in vivo, apical constriction is asynchronous. This means not all cells will have ‘bottle’ shapes in the same culture. We now illustrate the evolution of these shapes in more detail in Supplementary Figure 1.

      Additionally, the reviewer’s comment motivated us to investigate local buckles in the apical surface of our cultures when their apical surfaces are dilated by ROCK inhibition. We hypothesised that the very straight apical surface in normal cultures is achieved by a balance of apical cell size and tension with pressure differences at the cell-liquid interface. Consistent with our expectation, the apical surface of ROCK-inhibited cultures becomes wrinkled (Supplementary figure 4). The VANGL2-KI lines do not develop this tortuous apical surface (as shown in Figure 3), which is to be expected given their modification is present throughout differentiation unlike the acute dilation caused by ROCK inhibition.

      This new data complements our visualisation of apical constriction in live imaging, apical accumulation of phospho-myosin, and quantification of ROCK-dependent apical tension as independent lines of evidence that our cultures undergo apical constriction. 

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than the earlier experiments, and significance appears to be reached mainly through the impact of the lower values. Can the authors explain if this variability is expected to be due to heterogeneity in the tissue, i.e. some areas have higher local tension? If so, would that correspond with more local apical constriction?

      There is no significant difference in recoil between the control lines in Figures 2 and 3, albeit the data in Figure 3 is more variable (necessitating more replicates: none were excluded). We also showed laser ablation recoil data in Supplementary Figure 10, in which we did identify a graphing error (now corrected, also no significant difference in recoil from the other control groups as shown in Author response image 3).

      Author response image 3.

      Recoil following laser ablation is not significantly different between different experiments. X axis labels indicate the figure panel each set of ablation data is shown in. Each point represents an independent differentiation dish.

      (4)(Minor) I think some of the commentary on the strengths and limitations of the model found in the Results section should be collated and moved to the discussion in a single paragraph. For example, this could also briefly touch on/compare to some of the other models utilizing hiPSCs (These are mentioned briefly in the intro, but this comparison could be elaborated on a bit after seeing all the great data in this work).

      These changes have now been made:

      Discussion: “Some of these limitations, potentially including inclusion of environmental risk factors, can be addressed by using alternative iPSC-derived models[93,94]. For example, if patients have suspected causative mutations in genes specific to the surface (non-neural) ectoderm, such as GRHL2/3, 3D models described by Karzbrun et al[49] or Huang et al[95] may be informative. Characterisation of surface ectoderm behaviours in those models is currently lacking. These models are particularly useful for high-throughput screens of induced mutations[95], but their reproducibility between cell lines, necessary to compare patient samples to non-congenic controls, remains to be validated. Spinal cell identities can be generated in human spinal cord organoids, although these have highly variable morphologies[96,97]. As such, each iPSC model presents limitations and opportunities, to which this study contributes a reductionist and highly reproducible system in which to quantitatively compare multiple neuroepithelial functions.”

      (5) While the authors are generally good about labeling figures by the day post smad inhibition, in some figures it is not clear either from the images or the legend text. I believe this includes supplemental figures 2,5,6,8, and 10 (apologies if I simply missed it in one or more of them)

      These have now been added.

      (6) The legend for Figure 2 refers to a panel that is not present and the remaining panel descriptions are off by a letter. I'm guessing this is a versioning error as the text itself seems largely correct, but it may be good to check for any other similar errors that snuck in

      This has now been corrected.

      (7) The cell outlines in Figure 3d are a bit hard to see both in print and on the screen, perhaps increase the displayed intensity?

      This has now been corrected.

      Description of analyses that authors prefer not to carry out

      R2.5. Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this is also the case in the present human model by using mosaic cultures.

      The reviewer is correct that this is one of the exciting potential future applications of our model, which will first require us to generate stable fluorescently-tagged lines (to identify those cells which lack VANGL2). We will also need to extensively analyze controls to validate that mixing fluo-tagged and untagged lines does not alter the homogeneity of differentiation, or apical constriction, independently of VANGL2 deletion. As such, the reviewer is suggesting an altogether new project which carries considerable risk and will require us to secure dedicated funding to undertake.

      R3.8(Minor) The authors show a fascinating piece of data in Supplementary Figure 1, demonstrating that nuclear volume is halved by day 8. Do they have any indication if the DNA content remains constant (e.g., integrated DAPI density)? I suppose it must, and this is a minor point in the grand scheme, but this represents a significant nuclear remodeling and may impact the overall DNA accessibility.

      We agree with the reviewer that the reduction in nuclear volume is important data both because it informs understanding of the reduction in total cell volume, and because it suggests active chromatin compaction during differentiation. Unfortunately, the thicker epithelium and superimposition of nuclei in the differentiated condition means the laser light path is substantially different, making direct comparisons of intensity uninterpretable. Additionally, the apical-most nuclei will mostly be in G2/M phase due to interkinetic nuclear migration. As such, the comparison of DAPI integrated density between epithelial morphologies would not be informative (Author response image 4).

      Author response image 4.

      Lateral views of DAPI-stained nuclei on Days 2 and 8 of differentiation. Note the rapid loss of staining intensity below the apical pseudo-row of nuclei on Day 8. This intensity change is likely due to the apical nuclei being in G2/M phase and therefore having more DNA, and rapid loss of 405nm wavelength signal at depth.

    1. eLife Assessment

      The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The fundamental findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is compelling, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.

    2. Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate but not all. The authors have justified this bimodal model statistically, and although adding a third component gives a better fit, I agree with the authors that it cannot be justified statistically, given the data. Further work beyond the scope of this study would be needed to try to define further states.

    3. Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function. The manuscript describes a comprehensive study on the analysis of membrane protein function in the context of different lipid environments.

      Weaknesses:

      As the implemented strategy is relatively new, some uncertainties in the interpretation of the data consequently remain. However, using state-of-the-art techniques, the authors support their results by appropriate data and sufficient controls in the revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

      We thank the reviewer for highlighting the strengths of the study, including the use of nanodiscs, single-molecule FRET, and MD simulations to probe full-length EGFR in controlled membrane environments.

      We agree that statistical justification is important for interpreting the distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two Gaussians (µ = 2.64 and 3.43 ns) are not separable, implying they represent one broad distribution rather than two states.

      Author response table 1.

      Both the two- and three-Gaussian models include a low-value component (µ = ~1.3 ns), but the apparent improvement of the three-Gaussian model arises only from splitting the central population into two overlapping Gaussians. Thus, while the BIC favors the three-Gaussian model statistically, Ashman’s D demonstrates that the central peak should not be interpreted as bimodal. Therefore, when all the distributions are fit globally, the data are best explained as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework.
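
      The two criteria used above can be sketched as follows. This is a hedged illustration rather than the analysis code used for the manuscript: the function names `bic` and `ashman_d`, the log-likelihoods, the parameter counts and the peak widths are hypothetical placeholders, and only the two means (2.64 and 3.43 ns) are taken from the text.

```python
import math


def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussian components.

    D > 2 is the usual threshold for two cleanly separated peaks.
    """
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2)


# Hypothetical fit summaries (placeholders, not values from the manuscript).
n_points = 1000
# A 1D mixture of k Gaussians has 3k - 1 free parameters.
bic_two = bic(log_likelihood=-1250.0, n_params=5, n_points=n_points)
bic_three = bic(log_likelihood=-1200.0, n_params=8, n_points=n_points)

# Means from the text; the widths (0.6 ns each) are assumed for illustration.
d_central = ashman_d(2.64, 0.6, 3.43, 0.6)

print("BIC prefers three Gaussians:", bic_three < bic_two)
print("Central peaks separable (D > 2):", d_central > 2.0)
```

      With these placeholder numbers the BIC prefers the three-Gaussian fit while Ashman's D for the two central components stays below 2, which is the same qualitative situation described above: a better fit does not by itself imply a distinct, resolvable state.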

      We have clarified this in the revised manuscript by adding a section in the Methods (page 26) titled Model Selection and Statistical Analysis, which describes the results of the global two- versus three-Gaussian fits evaluated using BIC and Ashman’s D. Additional details of these analyses are also provided in response to Reviewer #1, Question 8 (Recommendations for the authors).

      Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      We thank the reviewer for noting the strengths of our approach, particularly the use of complementary techniques and the development of a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      We monitored insertion of anionic lipids into nanodiscs by performing zeta potential measurements, which report on surface charge, and cholesterol insertion by Laurdan fluorescence, which reports on membrane order. Both assays provide information at the ensemble level, not single-nanodisc resolution. We clarified this in the Methods section (see below).

      Cholesterol clustering is well documented in ternary systems with saturated lipids and sphingolipids [Veatch, Biophys J., 2003; Risselada, PNAS, 2008]. However, in unsaturated POPC-cholesterol mixtures such as those used here, cholesterol primarily alters bilayer order and large-scale segregation is not typically observed. The addition of POPS to the POPC-cholesterol mixture perturbs cholesterol-induced ordering, lowering the likelihood of cholesterol-rich domains [Kumar, J. Mol. Graphics Modell., 2021].

      Lipid heterogeneity between nanodiscs would be expected to give rise to heterogeneity in hydrodynamic properties, including potentially broadening the dynamic light scattering (DLS) distributions. However, the full width at half maximum (FWHM) values from the DLS measurements (see Author response table 2) do not indicate a broadening with cholesterol. Statistical testing (Mann-Whitney U test for non-normal data) showed no significant difference between samples with and without cholesterol (p = 0.486; n = 4 per group). While the small sample size makes firm conclusions challenging, these results suggest that large-scale heterogeneity is unlikely.

      Author response table 2.
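
      For readers unfamiliar with the test, the sketch below shows how an exact two-sided Mann-Whitney U test works for two groups of four measurements. It is an illustration under stated assumptions, not our analysis script: the function name `mann_whitney_exact` is introduced here, the FWHM values are hypothetical placeholders rather than the measured data, and the enumeration assumes there are no tied values.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test for small samples without ties."""
    # U statistic for x: number of (x_i, y_j) pairs with x_i > y_j.
    u_obs = sum(1 for a in x for b in y if a > b)
    n, m = len(x), len(y)
    # Under the null hypothesis every choice of n ranks for x is equally likely.
    counts = {}
    for ranks in combinations(range(n + m), n):
        # With 0-based ranks, U = sum(ranks) - n(n - 1)/2.
        u = sum(ranks) - n * (n - 1) // 2
        counts[u] = counts.get(u, 0) + 1
    total = sum(counts.values())
    lower = sum(c for u, c in counts.items() if u <= u_obs) / total
    upper = sum(c for u, c in counts.items() if u >= u_obs) / total
    return u_obs, min(1.0, 2.0 * min(lower, upper))


# Hypothetical FWHM values in nm (placeholders, not the measured data).
without_chol = [4.1, 4.6, 5.0, 5.3]
with_chol = [4.3, 4.9, 5.5, 6.2]

u_stat, p_value = mann_whitney_exact(without_chol, with_chol)
print(u_stat, round(p_value, 3))
```

      With four values per group there are only 70 possible rank assignments, so the exact test can return only a coarse set of p-values, the smallest two-sided value being 2/70 ≈ 0.029. This is one reason why firm conclusions are difficult at this sample size.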

      In the case of POPS lipids, clustering of POPS in EGFR embedded nanodiscs is a recognized property of receptor-lipid interactions. Molecular dynamics simulations have shown that POPS, although constituting only 30% of the inner leaflet, accounts for ~50% of the lipids directly contacting EGFR [Arkhipov, Cell, 2013], underscoring that anionic lipids are preferentially recruited to the receptor’s immediate environment.

      For nanodiscs containing cholesterol and anionic lipids, our smFRET experiments were designed to isolate the effect of EGF binding. The nanodisc population is the same in the ± EGF conditions, as EGF was introduced just prior to performing smFRET experiments and not during nanodisc assembly. Thus, for a given lipid composition, any observed differences between ligand-free and ligand-bound states reflect conformational changes of EGFR.

      Methods, page 23, “Zeta potential measurements to quantify surface charge of nanodiscs: Data were processed using the instrument’s Malvern DTS software to obtain the mean zeta-potential value. This ensemble measurement reports the average surface charge of the nanodisc population, verifying incorporation of anionic POPS lipids.”

      Methods, page 23, “Fluorescence measurements with Laurdan to confirm cholesterol insertion into nanodiscs: The excitation spectrum was recorded by collecting the emission at 440 nm and the emission spectrum was recorded by exciting the sample at 385 nm. Laurdan fluorescence provides an ensemble readout of membrane order and confirms cholesterol incorporation into the nanodisc population. While Laurdan does not resolve the composition of individual nanodiscs, prior work has shown that POPC–cholesterol mixtures are miscible without forming cholesterol-rich domains[91,92], thus the observed ordering changes likely reflect the intended input cholesterol content at the ensemble level.”

      (91) Veatch, S. L. & Keller, S. L. Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophysical journal, 85(5), 3074-3083 (2003).

      (92) Risselada, H. J. & Marrink, S. J. The molecular face of lipid rafts in model membranes. Proceedings of the National Academy of Sciences 105(45), 17367–17372 (2008).

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? How is the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. How is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      We thank the reviewer for these insightful questions. Yes, the EGFR:ApoA1∆49 template ratio of 100:1 was empirically determined through optimization experiments now shown in the revised Supplementary Fig. 3. Cell-free reactions were performed across a range of EGFR:ApoA1∆49 template ratios (1:2 to 1:200) and sampled at different time points (2-19 hours). As shown in the gels, EGFR expression increased with higher template ratios and longer reaction times up to ~9 hours, while ApoA1 expression became clearly detectable only after 6 hours. Based on these results, we selected an EGFR:ApoA1∆49 ratio of 100:1 and 8-hour reaction time as the optimal condition, which yielded sufficient full-length EGFR incorporated into nanodiscs for ensemble and single-molecule experiments.

      In cell-free systems, protein yield does not scale directly with DNA template concentration, as translation efficiency is limited by factors such as ribosome availability and co-translational membrane insertion [Hunt, Chem. Rev., 2024; Blackholly, Front. Mol. Biosci., 2022]. Consistent with this, we observed that ApoA1∆49 is produced at higher levels than EGFR despite the lower DNA input (Supplementary Fig. 2b). Providing an excess EGFR template prevents the reaction from becoming limited by scaffold availability and helps compensate for the fact that, as a large multi-domain receptor, EGFR expression can yield truncated as well as full-length products. This strategy ensures that sufficient full-length receptors are available for nanodisc incorporation. We will clarify this in the Methods section (see below).

      We observed little to no visible precipitation under the reported cell-free conditions, likely for the following reasons: (i) EGFR and ApoA1∆49 are co-expressed in the cell-free reaction, and ApoA1∆49 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink; and (ii) ApoA1∆49 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      The sample contains donor-labeled EGFR (snap surface 594) together with acceptor-labeled lipids (Cy5-labeled PE doped in the nanodisc). We assess the oligomerization state of EGFR in nanodiscs using single-molecule photobleaching of the donor channel. Snap surface 594 is a benzyl guanine derivative of Atto 594 that reacts with the SNAP tag with near-stoichiometric efficiency [Sun, Chembiochem, 2011]. Most molecules (~75%) exhibited a single photobleaching step, consistent with incorporation of a single EGFR per nanodisc [Srinivasan, Nat. Commun., 2022]. A minority of traces (~15%) showed two photobleaching steps and ~10% of traces showed three or more photobleaching steps, consistent with occasional multiple insertions. For all smFRET analyses, we restricted the dataset to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      Methods, page 20, “Production of labeled, full-length EGFR nanodiscs: Briefly, the E. coli slyD lysate, in vitro protein synthesis E. coli reaction buffer, amino acids (-Methionine), Methionine, T7 Enzyme, protease inhibitor cocktail (Thermo Fisher Scientific), RNase inhibitor (Roche) and DNA plasmids (20 µg of EGFR and 0.2 µg of ApoA1∆49) were mixed with different lipid mixtures. The DNA template ratio of EGFR:ApoA1∆49 = 100:1 was empirically chosen by testing different ratios on SDS-PAGE gels and selecting the condition that maximized full-length EGFR expression in DMPC lipids (Supplementary Fig. 3).”

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      Negative-stain TEM was performed to confirm nanodisc formation and morphology, but this method does not resolve whether a given disc contains EGFR. To directly assess receptor stoichiometry, we instead relied on single-molecule photobleaching of snap surface 594-labeled EGFR (see response to Point 2). These experiments showed that the majority of nanodiscs contain a single receptor, with a minority containing two or more receptors. For all smFRET analyses, we restricted data to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      We did not normalize EGFR fluorescence to total protein concentration because the bulk protein fraction after IMAC purification includes both receptor-loaded and empty nanodiscs. The latter contribute to ApoA1∆49 mass but do not contain receptors, so including them would lead to an underestimate of receptor occupancy. Importantly, the presence of empty nanodiscs does not affect our measurements, as photobleaching and single-molecule FRET analyses selectively report only on receptor-containing nanodiscs. This clarification has been added to the Methods.

      Methods, page 26, “Fluorescence Spectroscopy: Traces with a single photobleaching step for the donor and acceptor were considered for further analysis. Regions of constant intensity in the traces were identified by a change-point algorithm[95]. Donor traces were assigned as FRET levels until acceptor photobleaching. The presence of empty nanodiscs does not influence these measurements, as photobleaching and single-molecule FRET analyses selectively report on receptor-containing nanodiscs.”

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

      We agree that not all nanodisc-embedded EGFR molecules may be fully functional and that the fraction of folded protein could vary with lipid composition. In our ATP-binding assay, EGFR detection relies on the C-terminal SNAP-tag fused to an intrinsically disordered region. Successful labeling requires that this segment be translated, accessible, and folded sufficiently to accommodate the SNAP reaction, which imposes an additional requirement compared to the rigid, structured kinase domain where ATP binds. Misfolded or truncated EGFR molecules would therefore likely fail to label at the C-terminus. These factors strongly imply that our assay predominantly reports on receptor molecules that are intact and well folded.

      Additionally, our molecular dynamics simulations at 0% and 30% POPS support the experimental ATP-binding measurements (Fig. 2c, d). This consistency between the experimental and simulated evidence, including at 0% POPS where reduced receptor folding might be expected, suggests that the observed lipid-dependent changes are more likely due to modulation of the functional receptor than to receptor misfolding. We have clarified these points by adding the following text:

      Results, page 7, “Role of anionic lipids in EGFR kinase activity: In the presence of EGF, increasing the anionic lipid content decreased the number of contacts from 71.8 ± 1.8 to 67.8 ± 2.4, indicating increased accessibility, again in line with the experimental findings. Because detection of EGFR relies on labeling at the C-terminus and ATP binding requires an intact kinase domain, the ATP-binding assay reports on receptors that are properly folded and competent for nucleotide binding. The consistency between experimental results and MD simulations suggests that the observed lipid-dependent changes are more likely due to modulation of functional EGFR than to artifacts from misfolding.”

      Reviewer #1 (Recommendations for the authors):

      The experimental program presented here is excellent, and the results are highly interesting. My enthusiasm is dampened by the presentation in places which is confusing, especially Figure 3, which contains so many of the results. I also have some reservations about the bimodal interpretation of the lifetime data in Figure 3.

      We thank the reviewer for their positive assessment of our experimental approach and results. In the revised version, we have improved figure organization and readability by adding explicit labels for lipid composition and EGF presence/absence in all lifetime distributions, moving key supplementary tables into the main text, and reorganizing the supplementary figures as Extended Data Figures following eLife’s format. Figures and tables now appear in the order in which they are referenced in the text to further improve readability.

      Regarding the bimodal interpretation of the lifetime distribution, we have performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC) and Ashman’s D analysis, which supported the bimodal interpretation. Details of this analysis are provided in our response to comment (8) below and included in the manuscript.

      Specific comments below:

      (1) Abstract -"Identifying and investigating this contribution have been challenging owing to the complex composition of the plasma membrane" should be "has".

      We have corrected this error in the revised manuscript.

      (2) Results - p4 - some explanation of what POPC/POPS are would be helpful.

      We have added the text below discussing POPC and POPS.

      Results, page 4, “POPC is a zwitterionic phospholipid forming neutral membranes, whereas POPS carries a net negative charge and provides anionic character to the bilayer[56]. Both PC and PS lipids are common constituents of mammalian plasma membranes, with PC enriched in the outer leaflet and PS in the inner leaflet[22].”

      (22) Lorent, J. H., Levental, K. R., Ganesan, L., Rivera-Longsworth, G., Sezgin, E., Doktorova, M., Lyman, E. & Levental, I. Plasma membranes are asymmetric in lipid unsaturation, packing and protein shape. Nature Chemical Biology 16, 644–652 (2020).

      (56) Her, C., Filoti, D. I., McLean, M. A., Sligar, S. G., Ross, J. A., Steele, H. & Laue, T. M. The charge properties of phospholipid nanodiscs. Biophysical journal 111(5), 989–998 (2016).

      (3) Figure 2b - it would be easier to compare if these were plotted on top of each other. Are we at saturating ATP binding concentration or below it? Also, please put a key to say purple - absent and orange +EGF on the figure. I am also confused as to why, with no EGF, ATP binding is high with 0% POPS, but low when EGF is present, but that then reverses with physiological lipid content.

      While we agree that a direct comparison would be easier, the ATP-binding experiments for the ± EGF conditions were actually performed independently on separate SDS-PAGE gels, which unfortunately precludes such a comparison. We have added a color key to clarify the -EGF and +EGF datasets.

      The experiments were carried out at 1 µM of the fluorescently labeled ATP analogue (atto647N-γ ATP). Reported kinetic measurements for the isolated EGFR kinase domain indicate a Km of 5.2 µM, suggesting that our experimental concentration is below, but close to, the saturating range, ensuring sensitivity to changes in accessibility of the binding site rather than saturating all available receptors.

      We have revised the manuscript to clarify these details by including the following text:

      Results, page 6, “To investigate how the membrane composition impacts accessibility, we measured ATP binding levels for EGFR in membranes with different anionic lipid content. 1 µM of fluorescently-labeled ATP analogue, atto647N-γ ATP, which binds irreversibly to the active site, was added to samples of EGFR nanodiscs with 0%, 15%, 30% or 60% anionic lipid content in the absence or presence of EGF.”

      Methods, page 24, “ATP binding experiments: Full-length EGFR in different lipid environments was prepared using cell-free expression as described above. 1 µM of snap surface 488 (New England Biolabs) and atto647N-labeled gamma ATP (Jena Bioscience) were added after cell-free expression and incubated at 30 °C, 300 rpm for 60 minutes. 1 µM of atto647N-γ ATP was used, corresponding to a concentration near the reported Km of 5.2 µM for ATP binding to the isolated EGFR kinase domain[93], ensuring sensitivity to lipid-dependent changes in ATP accessibility.”

      (ii) Nucleotide binding is suppressed under basal conditions, likely to ensure that the catalytic activity is promoted only upon EGF stimulation.

      The molecular dynamics simulations at 0% and 30% POPS further support this interpretation, showing that anionic lipids modulate the accessibility of the ATP-binding site in a manner consistent with experimental trends (Fig. 2c and 2d).

      We have clarified these points in the main text with the following additions:

      Results, page 6, “In the presence of EGF, ATP binding overall increased with anionic lipid content with the highest levels observed in 60% POPS bilayers. In the neutral bilayer, ligand seemed to suppress ATP binding, indicating anionic lipids are required for the regulated activation of EGFR.”

      Results, page 7, “In the absence of EGF, increasing the anionic lipid content from 0% POPS to 30% POPS increased the number of ATP-lipid contacts from 58.6 ± 0.7 to 74.4 ± 1.2, indicating reduced accessibility, consistent with the experimental results and suggesting anionic lipids are required for ligand-induced EGFR activity.”

      (93) Yun, C. H., Mengwasser, K. E., Toms, A. V., Woo, M. S., Greulich, H., Wong, K. K., Meyerson, M. & Eck, M. J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. PNAS, 105(6), 2070–2075 (2008).

      (4) Figure 2d - how was the 16A distance arrived at?

      We thank the reviewer for pointing this out. The 16 Å cutoff was chosen based on the physical dimensions of the ATP analogue used in the experiments. Specifically, the largest radius of the atto647N-γ ATP molecule is 16.96 Å, which defines the maximum distance at which lipid atoms could sterically obstruct access of ATP to the binding pocket. Accordingly, in the simulations, contacts were defined as pairs of coarse-grained atoms between lipid molecules and the residues forming the ATP-binding site (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831) separated by less than 16 Å.

      We have rewritten the rationale for selecting the 16 Å cutoff in the Methods section to improve clarity.

      Methods, page 28, “Coarse-grained, Explicit-solvent Simulations with the MARTINI Force Field: We analyzed our simulations using WHAM[108,109] to reweight the umbrella biases and compute the average values of various metrics introduced in this manuscript. Specifically, we calculated the distance between Residue 721 and Residue 1186 (EGFR C-terminus) of the protein. To quantify the accessibility of the ATP-binding site, we calculated the number of contacts between lipid molecules and the residues forming the ATP-binding pocket (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831)[110]. Close contact between the bilayer and these residues would sterically hinder ATP binding; thus, the contact number serves as a proxy for ATP-site accessibility. The cutoff distance for defining a contact was set to 16 Å, corresponding to the largest molecular radius of the fluorescent ATP analogue (atto647N-γ ATP, 16.96 Å[111]). Accordingly, we defined a contact as a pair of coarse-grained atoms, one from the lipid membrane and one from the ATP binding site, within a mutual distance of less than 16 Å.”
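
      The contact definition above can be sketched in a few lines of Python. This is an illustrative sketch only: the coordinates are made-up placeholders, the function name count_contacts is hypothetical, and it is not the analysis code used for the MARTINI trajectories.

```python
import math

CUTOFF_ANGSTROM = 16.0  # contact cutoff quoted in the Methods text


def count_contacts(lipid_beads, site_beads, cutoff=CUTOFF_ANGSTROM):
    """Count lipid/binding-site bead pairs closer than the cutoff (in Å)."""
    contacts = 0
    for lipid in lipid_beads:
        for site in site_beads:
            # A pair counts only if it is strictly closer than the cutoff.
            if math.dist(lipid, site) < cutoff:
                contacts += 1
    return contacts


# Placeholder coordinates in Å (hypothetical, not taken from the simulations).
lipid_beads = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
site_beads = [(0.0, 0.0, 12.0), (0.0, 0.0, 30.0)]

print(count_contacts(lipid_beads, site_beads))
```

      In an umbrella-sampling analysis, a per-frame count of this kind would presumably be averaged with the WHAM-derived weights rather than uniformly, in line with the reweighting described in the Methods text.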

      (5) Figure 2e-h - I think a bar chart/violin plot/jitter plot would make it easier to compare the peak values. The statistics in the table should just be quoted in the text as value +/- error from the 95% confidence interval. The way it is written currently is confusing, as it implies that there is no conformational change with the addition of EGF in neutral lipids, but there is ~0.4nm one from the table. I don't understand what you mean by "The larger conformational response of these important domains suggests that the intracellular conformation may play a role in downstream signaling steps, such as binding of adaptor proteins"?

      We thank the reviewer for these suggestions. For the smFRET lifetime distributions (Figure 2j, k; previously Figure 2e, f), we have now included jitter plots of the donor lifetimes in Supplementary Figure 11 to facilitate direct visual comparison of the median and distribution widths for each lipid composition and ±EGF condition. The distance distributions for the ATP to C-terminus in Figure 2e, f (previously Figure 2g, h) were obtained from umbrella-sampling simulations that calculate free-energy profiles rather than raw, unbiased distance values. Because the sampling is guided by biasing potentials, individual distance values cannot be used to construct violin or jitter plots. We therefore present the simulation data only as probability density distributions, which best reflect the equilibrium distributions derived from them.

      We have also revised the text to report the median ± 95% confidence interval, improving clarity and consistency with the statistical table.

      Results, page 9: “In the neutral bilayer (0% POPS), the distribution in the absence of EGF peaks at 8.1 nm (95% CI: 8.0–8.2 nm) and that in the presence of EGF peaks at 8.6 nm (95% CI: 8.5–8.7 nm) (Table 1, Supplementary Table 1). In the physiological regime of 30% POPS nanodiscs, the peak of the donor lifetime distribution shifts from 9.1 nm (95% CI: 8.9–9.2 nm) in the absence of EGF to 11.6 nm (95% CI: 11.1–12.6 nm) in the presence of EGF (Table 1, Supplementary Table 1), which is a larger EGF-induced conformational response than in neutral lipids.”

      Finally, we have rephrased the sentence in question for clarity. The revised text now reads:

      Results, page 9: “The larger conformational response observed in the presence of anionic lipids suggests that these lipids enhance the responsiveness of the intracellular domains to EGF, potentially ensuring interactions between C-terminal sites and adaptor proteins during downstream signaling.”

      (6) "r, highlighting that the charged lipids can enhance the conformational response even for protein regions far away from the plasma membrane" - is it not that the neutral membrane is just very weird and not physiological that EGFR and other proteins don't function properly?

      We agree with the reviewer that completely neutral (0% POPS) membranes are not physiological and likely do not support the native organization or activity of EGFR. We have revised the text to clarify that the 30% POPS condition represents a more native-like lipid environment that restores or stabilizes the expected conformational response, rather than "enhancing" it. The revised sentence now reads:

      Results, page 10: “Both experimental and computational results show a larger EGF-induced conformational change in the partially anionic bilayer, consistent with the notion that a partially anionic lipid bilayer provides a more native environment that supports proper receptor activation, compared to the non-physiological neutral membrane.”

      (7) "snap surface 594 on the C-terminal tail as the donor and the fluorescently-labeled lipid (Cy5) as the acceptor (Supplementary Fig. 2, 11)." Why not refer to Figure 3a here to make it easier to read?

      We have added the reference to Figure 3a, and we thank the Reviewer for the suggestion.

      (8) Figure 3 - the bimodality in many of these plots is dubious. It's very clear in some, i.e. 0% POPS +EGF, but not others. Can anything be done to justify bimodality better?

      We agree that statistical justification is important for interpreting lifetime distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two of the Gaussians are not separable, implying they represent one broad distribution rather than two discrete states. Therefore, when all the distributions are fit globally, the data are best described as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. We have strengthened the justification for our choice of model by incorporating the results of the global two- vs three-Gaussian fits with BIC and Ashman’s D analysis in the revised manuscript.

      Methods, page 27: “Model Selection and Statistical Analysis

      Global fitting of lifetime distributions was performed across all experimental conditions using maximum likelihood estimation. Both two-Gaussian and three-Gaussian distribution models were evaluated as described previously[62]. Model performance was compared using the Bayesian Information Criterion (BIC),[101] which balances model likelihood and complexity according to

      BIC = -2 ln L + k ln n

      where L is the likelihood, k is the number of free parameters, and n is the number of single-molecule photon bunches across all experimental conditions. A lower BIC value indicates a statistically better model[101]. The separation between Gaussian components was subsequently assessed using Ashman’s D, where a score above 2 indicates good separation[102]. For two Gaussian components i and j with means µi, µj and standard deviations σi, σj,

      Dij = √2 |µi − µj| / √(σi² + σj²)

      where Dij represents the distance metric between Gaussian components i and j. All fitted parameters, likelihood values, BIC scores, and Ashman’s D values are summarized in Supplementary Table 5.”

      (101) Schwarz, G. Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464 (1978).

      (102) Ashman, K. M., Bird, C. M. & Zepf, S. E. Detecting bimodality in astronomical datasets. The Astronomical Journal 108(6), 2348–2361 (1994).
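
      The two model-selection quantities described in the Methods text above can be written out in a short Python sketch. The function names and all numerical inputs are hypothetical placeholders (the fitted values are reported in Supplementary Table 5, not here), and Ashman’s D is written in its standard form from ref. 102, D = √2 |µ1 − µ2| / √(σ1² + σ2²).

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussians; D > 2 suggests well-separated peaks."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


# Placeholder inputs (hypothetical log-likelihoods, parameter counts and n).
n_obs = 5000
bic_two = bic(log_likelihood=-7100.0, n_params=5, n_obs=n_obs)
bic_three = bic(log_likelihood=-7050.0, n_params=8, n_obs=n_obs)
print(f"BIC (2 Gaussians) = {bic_two:.1f}, BIC (3 Gaussians) = {bic_three:.1f}")

# Means follow the ~2.6 ns and ~3.4 ns components mentioned in the response;
# the standard deviations are invented for illustration.
d = ashman_d(2.6, 0.6, 3.4, 0.7)
print(f"Ashman's D = {d:.2f} ({'separable' if d > 2 else 'not separable'})")
```

      The sketch illustrates how the two criteria can point in different directions: a more complex model can win on BIC while two of its components still fall below the D > 2 separability threshold.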

      (9) Figure 3c - can you better label the POPS/POPC on here?

      We thank the reviewer for this suggestion. In the revised manuscript, Figure 3b (previously Figure 3c) has been updated to label the lipid composition corresponding to each smFRET distribution to make the comparison across conditions easier to follow.

      (10) Figure 3g - it looks like cholesterol causes a shift in both the peaks, such that the previous open and closed states are not the same, but that there are 2 new states. This is key as the authors state: "Remarkably, high anionic lipids and cholesterol content produce the same EGFR conformations but with opposite effects on signaling-suppression or enhancement." But this is only true if there really are the same conformational states for all lipid/cholesterol conditions. Again, the bimodal models used for all conditions need to be justified.

      We appreciate the reviewer’s insightful comment. We agree that the interpretation of the lifetime distributions depends on whether cholesterol and anionic lipids modulate existing conformational states or create new ones. To test this, we performed global fits of all distributions using the two- and three-Gaussian models and compared them using the Bayesian Information Criterion (BIC) and Ashman’s D, the results of which are described in detail in response to (8) above.

      Both fitting models, two- and three-Gaussian, identified the same short lifetime component (µ = 1.3 ns), suggesting this reflects a well-separated conformation. While the three-Gaussian model gave a lower BIC, Ashman’s D analysis indicated that two of the three components (µ = 2.6 ns and 3.4 ns) are not statistically separable, suggesting they represent a single broad conformational population rather than distinct states. If instead these two components reflected distinct states present under different conditions, Ashman’s D analysis would have found the opposite result. This supports our interpretation that high cholesterol and high anionic lipid content produce similar conformational ensembles with opposite effects on signaling output.

      Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework. We have clarified this rationale in the revised manuscript and added the results of the BIC and Ashman’s D analysis to support this interpretation.

      (11) Why are we jumping about between figures in the text? Figure 1d is mentioned after Figure 2. Also, DMPC is shown in the figures way before it is described in the text. It is very confusing. Figure 3 is so compact. I think it should be spread out and only shown in the order presented in the text. Different parts of the figure are referred to seemingly at random in the text. Why is DMPC first in the figure, when it is referred to last in the text?

      Following the Reviewer’s comment, we have revised the figure order and layout to improve readability and ensure consistency with the text. The previous Figures 1d-f, which introduce the single-molecule fluorescence setup, are now Figure 2g-i, positioned immediately before the first single-molecule FRET experiments (Fig. 2j, k). The DMPC distribution in Figure 3 has been moved to the Supplementary Information (Supplementary Fig. 17), where it is shown alongside POPC, as these datasets are compared in the section “Mechanism of cholesterol inhibition of EGFR transmembrane conformational response”. The smFRET distributions in Figure 3 are now presented in the same sequence as they are discussed in the text, and the figure has been spread out for better clarity.

      (12) Throughout, I find the presentation of numerical results, their associated error, and whether they are statistically significantly different from each other confusing. A lot of this is in supplementary tables, but I think these need to go in the main text.

      To improve clarity and ensure that key quantitative results are easily accessible, we have moved the relevant supplementary tables to the main text. Specifically, the following tables have been incorporated into the main manuscript:

      (i) Median distance between the ATP binding site and the EGFR C-terminus, or between membrane and EGFR C-terminus from smFRET measurements (previously Supplementary Table 1; now Table 1)

      (ii) Median distance between the membrane and the EGFR C-terminus in different anionic lipid environments (previously Supplementary Table 4; now Table 2)

      (iii) Median distance between the membrane and the EGFR C-terminus in different cholesterol environments (previously Supplementary Tables 8 and 12; now combined as Table 3)

      (13) Supplementary figures - in general, there is a need to consider how to combine or simplify these for eLife, as they will have to become extended data figures.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have reorganized the supplementary figures into extended data figures in accordance with eLife’s format. Specifically:

      - Supplementary Figs. 1–7 are now grouped as Extended Data Figures for Figure 1 in the main text. They are now Figure 1 - figure supplements 1–7.

      - Supplementary Figs. 8–11 are now grouped as Extended Data Figures for Figure 2. They are now Figure 2 - figure supplements 1–4.

      - Supplementary Figs. 12–17 are now grouped as Extended Data Figures for Figure 3. They are now Figure 3 - figure supplements 1–6.

      (14) Supplementary Figure 2 - label what the two bands are in the EGFR and pEGFR sets at the bottom of panel c.

      We thank the reviewer for this comment. The two bands shown in the EGFR and pEGFR blots in Supplementary Fig. 2d (previously Supplementary Fig. 2c) correspond to replicate samples under identical conditions. We have now labeled the lanes as “Rep 1” and “Rep 2” in the revised figure and clarified this in the figure legend.

      Supplementary Figure 2, page 31: “(d) Western blots were performed on labelled EGFR in nanodiscs. Anti-EGFR Western blots (left) and anti-phosphotyrosine Western blots (right) tested the presence of EGFR and its ability to undergo tyrosine phosphorylation, respectively, consistent with previous experiments on similar preparations[18, 54, 55]. The two lanes in each blot correspond to replicate samples under identical conditions.”

      (15) Supplementary Figures 3+4 - a bar chart/boxplot or similar would be easier for comparison here.

      In the revised version, we have replaced the histograms with jitter plots showing the nanodisc size distributions for each condition in supplementary figures 4 and 5 (previously supplementary figures 3 and 4). The plots display individual measurements with a horizontal line indicating the mean size (mean ± standard deviation values provided in the caption).

      (16) Supplementary Figures 10, 12, 13, 15, 16 - I would jitter these.

      We have incorporated jitter plots for the relevant datasets in Supplementary Figures 11, 13, 15, 16 and 17 (previously Supplementary Figures 10, 12, 13, 15 and 16) to provide a clearer visualization of the data distributions and median values.

      Reviewer #2 (Recommendations for the authors):

      (1) Reactions were performed in 250 µL volumes. What is the average yield of solubilized EGFR in those reactions? Are there differences in the EGFR solubilization with the various lipid mixtures?

      The amount of solubilized EGFR produced in each 250 µL cell-free reaction was below the reliable detection limit for quantitative absorbance assays. At these protein levels, little to no EGFR precipitation was observed for all lipid compositions. Although exact yields could not be determined, fluorescence-based detection confirmed the presence of functional, nanodisc-incorporated EGFR suitable for smFRET and ensemble fluorescence experiments. We observed variability in total yield between independent reactions within the same lipid composition, which is common for cell-free systems, but no consistent trend attributable to lipid composition.

      (2) Figure S2: It would be better to have a larger overview of the particles on a grid to get a better impression of sample homogeneity.

      TEM images showing a larger field of view have been added for each lipid composition in Supplementary Figures 4 and 5.

      (3) Figure 2b: It appears that there is some variation in the stoichiometry of ApoA1 and EGFR within the samples. Have equal amounts of each sample been analyzed? Are there, in addition, some precipitates of EGFR? It would further be good to have a negative control without expression to get more information about the additional bands in Figure S2b. As they do not appear in the fluorescent gel, it is unlikely that they represent premature terminations of EGFR.

      The fluorescence intensity from the bound ATP analogue (Atto 647N-ATP) and from the snap surface 488 label, which binds stoichiometrically to the SNAP tag at the EGFR C-terminus, was measured for each sample. The relative amount of ATP binding was quantified for each sample by normalizing to the EGFR content (Figure 2b). This normalization accounts for the different amounts of EGFR produced in each condition.
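
      The normalization described above can be sketched as a simple ratio. This is a minimal illustration only: the function name, sample names, and intensity values are hypothetical and are not taken from the study; it assumes each sample yields one ATP-analogue intensity and one SNAP-label intensity.

```python
def normalized_atp_binding(atp_signal, egfr_signal):
    """Return ATP-analogue fluorescence per unit of EGFR (SNAP-label) fluorescence."""
    if egfr_signal <= 0:
        raise ValueError("EGFR signal must be positive to normalize")
    return atp_signal / egfr_signal


# Hypothetical intensities in arbitrary units (NOT data from the study).
samples = {
    "lipid_A": {"atto647n_atp": 1200.0, "snap488": 4000.0},
    "lipid_B": {"atto647n_atp": 900.0, "snap488": 2000.0},
}

# Dividing by the SNAP signal removes differences in EGFR yield between samples.
relative_binding = {
    name: normalized_atp_binding(v["atto647n_atp"], v["snap488"])
    for name, v in samples.items()
}
```

      In this made-up example, lipid_B has the lower raw ATP signal but the higher binding per receptor, which is the kind of difference the normalization is meant to expose.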

      We did not observe any visible precipitation under the reported cell-free conditions, likely due to the following reasons:

      (i) EGFR and ApoA1 are co-expressed in the cell-free reaction, and ApoA1 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink

      (ii) ApoA1 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      A control cell-free reaction containing only ApoA1∆49 (1 µg) and no EGFR template, analyzed after affinity purification, showed a single prominent band at ~ 25 kDa (gel image below), corresponding to ApoA1, along with faint background bands typical of Ni-NTA purification from cell lysates. These weak, non-specific bands likely arise from co-purification of endogenous E. coli proteins.

      The ApoA1∆49-only control gel has now been included as part of the supplementary figure 2.

      (4) Figure S2c: It would be better to show the whole lanes to document the specificity of the antibodies. Anti-Phosphor antibodies are frequently of poor selectivity. In that case, a negative control with corresponding tyrosine mutations would be helpful.

      We have updated Figure S2d (previously Figure S2c) to include the full gel lanes to better illustrate the specificity of both the total EGFR and phospho-EGFR (Y1068) antibodies. The results show a single clear band at the expected molecular weight for EGFR, confirming antibody specificity.

      (5) The Results section already contains quite some discussion. I would thus recommend combining both sections.

      We thank the reviewer for the suggestion. We have now created a results and discussion section to better reflect the content of these paragraphs, with the previous discussion section now a subsection focused on implications of these results.

    1. eLife Assessment

      This valuable paper advances understanding of the role of the HGF receptor, MET, in cancer cell invasion by demonstrating HGF-induced coordinated trafficking of MET and metalloprotease MT1-MMP into invadopodia. The results are generally solid, but there are concerns about the cell biology and whether the trafficking mechanism is clinically relevant. It's also unclear whether this is a general mechanism or specific to triple-negative breast cancer cells. The paper will be of interest to cancer cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      Weaknesses:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP - a key transmembrane metalloproteinase required for invadopodia function- and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia -highlighting a previously unrecognized mechanism, regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      Weaknesses:

      (1) Inappropriate stimulation times for endocytosis and recycling assays.

      The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      (2) Low efficiency of MET silencing in Figure S1I.

      The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      (3) Missing information on incubation times and inconsistencies in MET protein levels.

      The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 prior to immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies.

      Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues.

      (4) Insufficient representation and randomization of microscopic data.

      For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      (5) Use of a single siRNA/shRNA per target.

      As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      (6) Insufficient controls for antibody specificity.

      The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      (7) Inadequate demonstration of MET recycling.

      MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      (8) Insufficient evidence for MET-MT1-MMP interaction.

      The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      (9) Inconsistent use of cell lines and lack of justification.

      The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      (10) Inconsistency in invadopodia numbers under identical conditions.

      The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      (11) Questionable colocalization in some images.

      In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      (12) Abstract, Introduction, and Discussion require substantial rewriting.

      (a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context.

      (b) The introduction should better describe the cellular processes and proteins investigated in this study.

      (c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      We sincerely thank the reviewer for his/her positive feedback and for considering our study to be well executed and rigorous. The valuable suggestions and comments will certainly improve the understanding of the role of the RAB14-RCP-KIF16B axis in MET trafficking and breast cancer invasion. Below we have addressed, point by point, each of the concerns and suggestions raised by the reviewer.

      Weakness:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

      We thank the reviewer for raising this point. We want to clarify that in TNBCs, the lack of the hormone receptors (progesterone receptor and estrogen receptor) and of HER2 makes the overexpression of EGFR and MET crucial in terms of prognosis and treatment (PMID: 27655711, 25368674). Hence, the study of MET signalling and trafficking is more relevant for TNBCs than for other cancer cells. We will add an explanation in the discussion section in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP - a key transmembrane metalloproteinase required for invadopodia function- and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia -highlighting a previously unrecognized mechanism, regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      We greatly appreciate the positive assessment we have from the reviewer, who also acknowledged the novelty and relevance of our study. Below, we have carefully addressed the comments/concerns raised regarding this study and will strengthen the reliability and robustness by revisiting the data, providing additional analyses where required, and clarifying methodological details.

      Weakness:

      (1) Inappropriate stimulation times for endocytosis and recycling assays. The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      We understand the reviewer’s concern regarding the HGF stimulation time point for endocytosis and recycling. We want to highlight that to study the recycling/surface delivery of MET in response to HGF, we performed TIRF microscopy-based imaging, where images were taken within 1h of HGF addition (Fig. 2I). Additionally, we will incorporate surface biotinylation to show the recycling of MET, as suggested in comment 7. Moreover, we have observed the effect of HGF on gelatin degradation and invadopodia formation after 3h of HGF stimulation. We were curious to know where MET resides with prolonged ligand stimulation. Hence, to study the localization of MET to invadopodia or the endocytic markers, the cells were stimulated with HGF for 2-3 hours.

      (2) Low efficiency of MET silencing in Figure S1I. The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      We understand the concern raised by the reviewer. We want to emphasize that we have employed three different approaches to investigate the effect of MET silencing/inhibition on invadopodia formation. (i) A MET kinase inhibitor, PHA665752, which shows reduced invadopodia formation (Fig. 1D, E). (ii) Silencing with shRNA: Since the level of silencing of MET with the shRNA was not sufficient, cells were stained with MET as a readout for MET silencing, and images of the cells with depleted MET expression were captured, and invadopodia numbers were quantified (Fig. 1F). (iii) Using the SMARTpool siRNA of MET, we have shown the MT1-MMP containing invadopodia in Fig S5E, which provides further evidence of the role of MET in invadopodia activity. An additional graph showing invadopodia formation derived from the siRNA-mediated MET silencing will be added to the revised figure.

      (3) Missing information on incubation times and inconsistencies in MET protein levels. The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 before immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies. Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues. 

      We apologise for the unintentional omission of experimental detailing about HGF or drug incubation time, which will be incorporated into the figure legend appropriately. The blot will be replaced with a more appropriate representative image.

      Regarding the decreased MET level in the drug-treated condition: literature suggests that the MET inhibitor PHA665752 also promotes MET degradation, corroborating our result shown in Fig. S1J (PMID: 15788682, 18327775). Further in Fig. S1J, the relative phosphorylation of MET when compared to the total MET level in the HGF-treated condition is higher. We will add the quantification in the revised manuscript to add more clarity.

      Next, in Fig. S1C, the rabbit anti-MET (CST, D1C2 XP) antibody has been used, which binds to a C-terminal motif of MET and identifies both the 170kDa and the 140kDa proteins, representing the uncleaved and cleaved forms of MET. In Fig. S1J, the mouse anti-MET (CST, L6E7) antibody has been used, which binds to an N-terminal motif of MET and recognizes only the 140kDa protein.

      (4) Insufficient representation and randomization of microscopic data. For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      We thank the reviewer for raising this point. The single-cell images are shown for clarity and to visualize the subcellular features; however, the conclusions are made based on the quantitative analysis of multiple cells collected from multiple frames (at least 30 frames per condition). Here, we would like to highlight that the image acquisition has been done over random fields in a coverslip. In the graphs shown in Fig. 1B, 1C, 4F, S1F, S1H, S5J’ it can be seen that there are frames where there is no degradation or invadopodia formed, which has also been taken into account. For a better representation of the population of cells forming invadopodia, a graph showing the percentage of cells forming invadopodia will be added to the figure.

      (5) Use of a single siRNA/shRNA per target. As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      We would like to clarify that we are using SMARTPool siRNA, which contains 4 individual siRNAs for the target gene. Literature suggests that using a pool of siRNA has reduced off-target effects compared to using single oligos for gene silencing (PMID: 14681580, 33584737, 24875475).

      While SMARTpool siRNA minimizes the off-target effect, it does not eliminate the possibility of it. To confirm that the observed phenotypes are specifically attributable to the genes investigated in this study, we will perform additional experiments using two independent siRNAs targeting RCP and RAB14. RAB4 is known to be associated with MET trafficking (PMID: 21664574, 30537020), and we have taken RAB4 as a positive control. Hence, we feel the suggested experiment is not required to support the conclusion made regarding RAB4.

      For MET, we have used shRNA and an inhibitor to show the effect of MET inhibition/perturbation in the invadopodia-associated activity, which validates the observations of siRNA-mediated gene silencing.

      We have shown the effect of MT1-MMP depletion on invadopodia formation using a CRISPR-based gene knock-out study, and another study from our group has shown the effect using siRNA (PMID: 31820782), which supports our MT1-MMP KO cell observation.    

      (6) Insufficient controls for antibody specificity. The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      MET immunofluorescence staining in the MET-depleted condition has been provided in Fig. 1F, and an immunoblot for the siRNA-mediated gene silencing has been provided in Fig. S5C. We will add the entire field of view to show the MET silencing in Fig. 1F.

      The inhibition of MET kinase activity using PHA665752 abolished the MET phosphorylation, as shown in Fig S1J. In line with Joffre et al., Fig 3C, S2I shows increased Tyr 1234/1235 phosphorylation of M1250T MET mutant (PMID: 21642981). Further, studies have shown the specificity of the antibody by immunoblotting and immunofluorescence using MET inhibitors (PMID: 21973114, 41009793).

      For MT1-MMP, an immunoblot showing significant depletion in MT1-MMP protein level by the SMARTpool siRNA has been provided in Fig. S5L. Further, MT1-MMP silencing has been validated by immunofluorescence in the following studies (PMID: 22291036, 21571860, 20505159).

      (7) Inadequate demonstration of MET recycling. MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      We appreciate the reviewer’s suggestion for an alternative approach to show MET trafficking. We aim to demonstrate MET trafficking using a biochemical approach, which will be included in the revised version. 

      (8) Insufficient evidence for MET-MT1-MMP interaction. The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      We thank the reviewer for pointing out the lack of MET-MT1-MMP interaction at the endogenous level. We have carried out the immunoprecipitation of endogenous MET to validate the interaction with MT1-MMP. However, we could not capture the interaction of these proteins at endogenous levels. We hypothesize that the interaction between MT1-MMP and MET may be weak in nature, with a high Kd value, and accordingly, it was difficult to precipitate the endogenous MT1-MMP by MET. The immunoblot will be added to the revised manuscript and discussed.

      (9) Inconsistent use of cell lines and lack of justification. The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      MDA-MB-231 and BT-549 are two well-characterized TNBC cell lines, which are being used extensively to study breast cancer cell invasion. These two cell lines also show overexpression of MET, making them suitable model cell lines for our study. 

      MDA-MB-231 has lower transfection efficiency than BT-549. Additionally, MET is a difficult gene to transfect, making it hard to perform experiments in MDA-MB-231 with MET overexpression. Though most of the experiments have been performed in both cell lines, a few of the studies have been performed only in the BT-549 cells. Further, we have focused on displaying the different approaches taken to validate an observation in the main figure, which led to showing the data in distinct cell lines.

      Also, showing observations in different cell lines is a practice that has been followed by multiple authors in the past (PMID: 39751400, 41079612, 25049275, 22366451).

      (10) Inconsistency in invadopodia numbers under identical conditions. The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      We sincerely thank the reviewer for pointing out the inconsistency in invadopodia numbers across the two experiments. Fig. 1C has two conditions: untreated (UT) and HGF-treated. The untreated condition has serum-free media without any stimulation, whereas in Fig. 1D, E we added vehicle (DMSO), since the drug is resuspended in DMSO. This difference in the treatment is likely to be responsible for the decreased numbers of invadopodia in Fig. 1E.

      (11) Questionable colocalization in some images. In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      We thank the reviewer for the valuable comment. The apparent lack of convincing colocalization is likely due to the relatively lower fluorescence intensity of MET at these structures. We will add the line intensity plots for the indicated puncta to show the intensity of both channels in the figure.

      To quantify the colocalization of two channels, we have used the automated image analysis software motiontracking (motiontracking.mpi-cbg.de), which has been detailed in the method section. Motiontracking considers objects to be colocalized only if there is an overlapping area of more than 35% between the two channels. Lastly, the apparent colocalization is corrected for random colocalization, which is estimated from random permutations of the object positions. This makes object-based colocalization more reliable than intensity-based colocalization.
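
      The object-based scheme described above (an overlap threshold followed by a correction for chance overlap) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the motiontracking implementation: objects are assumed to be sets of pixel coordinates, overlap is assumed to be measured relative to the area of the channel-A object, random colocalization is assumed to be estimated by relocating channel-B objects to random positions, and the correction is assumed to be a simple subtraction. All function names are hypothetical.

```python
import random

OVERLAP_THRESHOLD = 0.35  # from the text: more than 35% overlapping area


def square(row0, col0, size):
    """Hypothetical helper: a square object as a set of (row, col) pixels."""
    return frozenset((row0 + r, col0 + c) for r in range(size) for c in range(size))


def overlap_fraction(obj_a, obj_b):
    """Fraction of obj_a's area shared with obj_b (assumed definition)."""
    return len(obj_a & obj_b) / len(obj_a)


def colocalized_fraction(objects_a, objects_b, threshold=OVERLAP_THRESHOLD):
    """Share of channel-A objects that overlap some channel-B object above threshold."""
    if not objects_a:
        return 0.0
    hits = sum(
        1 for a in objects_a
        if any(overlap_fraction(a, b) > threshold for b in objects_b)
    )
    return hits / len(objects_a)


def shift_randomly(obj, shape, rng):
    """Move an object to a random position that keeps it inside the image."""
    rows = [r for r, _ in obj]
    cols = [c for _, c in obj]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    d_row = rng.randrange(shape[0] - height + 1) - min(rows)
    d_col = rng.randrange(shape[1] - width + 1) - min(cols)
    return frozenset((r + d_row, c + d_col) for r, c in obj)


def corrected_colocalization(objects_a, objects_b, shape, n_permutations=200, seed=0):
    """Return (observed, expected_random, observed - expected_random)."""
    rng = random.Random(seed)
    observed = colocalized_fraction(objects_a, objects_b)
    random_values = [
        colocalized_fraction(objects_a, [shift_randomly(b, shape, rng) for b in objects_b])
        for _ in range(n_permutations)
    ]
    expected_random = sum(random_values) / n_permutations
    return observed, expected_random, observed - expected_random


# Toy example: the first pair overlaps by 50% of the A object, the second by 25%.
channel_a = [square(0, 0, 4), square(10, 10, 4)]
channel_b = [square(0, 2, 4), square(10, 13, 4)]
observed, expected_random, corrected = corrected_colocalization(
    channel_a, channel_b, shape=(64, 64)
)
```

      In the toy example only the first pair exceeds the 35% threshold, so half of the channel-A objects count as colocalized before the chance-overlap estimate is subtracted.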

      (12) Abstract, Introduction, and Discussion require substantial rewriting. a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context. b) The introduction should better describe the cellular processes and proteins investigated in this study. c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

      We thank the reviewer for this suggestion. We will modify the abstract, introduction, and discussion as per the suggestion.

    1. eLife Assessment

      This study presents valuable findings by demonstrating that specific GPCR subtypes induce distinct extracellular vesicle miRNA signatures, highlighting a potential novel mechanism for intercellular communication with implications for receptor pharmacology within the field. The evidence is solid, however, more experiments are needed to determine whether the distinct extracellular vesicle miRNA signatures result from GPCR-dependent miRNA expression or GPCR-dependent incorporation of miRNAs into extracellular vesicles.

    2. Reviewer #1 (Public review):

      Summary:

      GPCRs affect the EV-miRNA cargoes

      Strengths:

      Novel idea of GPCRs-mediated control of EV loading of miRNAs

      Weaknesses:

      The findings are incomplete and fail to connect the observed changes to, or provide evidence of, any directly related physiological parameters. Mechanistic detail is completely lacking.

      Comments on revisions:

      The revised version of the manuscript falls short of the required standard by lacking additional experiments. Some of the conditions for acceptability could have been met only through clarifying uncertainties via further experiments, which, unfortunately, have not been conducted.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      No significant change in EV quantity or size following GPCR activation.

      Each GPCR triggered a distinct EV miRNA expression profile.

      miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Comments on revisions:

      All the comments have been taken into account. I wish the authors success in their future research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      The findings are incomplete and fail to connect the observed changes to, or provide evidence of, any directly related physiological parameters. Mechanistic detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNAs are sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complex, binds to miRNAs and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK signaling leads to phosphorylation of Ago2, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, location, and phosphorylation state could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section (pages 16-17).

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript (page 19-20). Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To compare miRNA expression levels accurately, profiling of EV miRNAs and cellular miRNAs should be conducted simultaneously. However, as this was an exploratory study, we were unable to measure the cellular miRNAs without conducting the entire experiment again.

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section (page 19, second paragraph).

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies (page 19-20).  

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenetic pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as by nanoparticle tracking analysis to verify the particle size distribution (Welsh et al., 2024). Following these guidelines helps to ensure the quality of our study and makes our findings easier to compare with those of other studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for Future Research:

      (1) Functionally validate top candidate miRNAs in recipient cells.

      We acknowledge that validating the target genes of the top candidate miRNAs is a crucial next step. In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (page 19-20).

      (2) Investigate other GPCR families and repeat in primary or disease-relevant cell lines.

      The inclusion of different GPCRs and cell lines is suggested as an area for further investigation in the discussion. (Page 19).

      (3) Apply similar approaches in in vivo models or patient samples to assess clinical relevance.

      In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (page 19-20).

      References

      Garcia-Martin, R., Wang, G., Brandão, B. B., Zanotto, T. M., Shah, S., Kumar Patel, S., Schilling, B., & Kahn, C. R. (2022). MicroRNA sequence codes for small extracellular vesicle release and cellular retention. Nature, 601(7893), 446-451. https://doi.org/10.1038/s41586-021-04234-3

      Jackson, L., Rennie, M., Poussaint, A., & Scarlata, S. (2022). Activation of Gαq sequesters specific transcripts into Ago2 particles. Sci Rep, 12(1), 8758. https://doi.org/10.1038/s41598-022-12737-w

      Liu, X.-M., & Halushka, M. K. (2025). Beyond the Bubble: A Debate on microRNA Sorting Into Extracellular Vesicles. Laboratory Investigation, 105(2), 102206. https://doi.org/10.1016/j.labinv.2024.102206  

      McKenzie, A. J., Hoshino, D., Hong, N. H., Cha, D. J., Franklin, J. L., Coffey, R. J., Patton, J. G., & Weaver, A. M. (2016). KRAS-MEK Signaling Controls Ago2 Sorting into Exosomes. Cell Rep, 15(5), 978-987. https://doi.org/10.1016/j.celrep.2016.03.085

      Pultar, M., Oesterreicher, J., Hartmann, J., Weigl, M., Diendorfer, A., Schimek, K., Schädl, B., Heuser, T., Brandstetter, M., Grillari, J., Sykacek, P., Hackl, M., & Holnthoner, W. (2024). Analysis of extracellular vesicle microRNA profiles reveals distinct blood and lymphatic endothelial cell origins. J Extracell Biol, 3(1), e134. https://doi.org/10.1002/jex2.134

      Teng, Y., Ren, Y., Hu, X., Mu, J., Samykutty, A., Zhuang, X., Deng, Z., Kumar, A., Zhang, L., Merchant, M. L., Yan, J., Miller, D. M., & Zhang, H.-G. (2017). MVP-mediated exosomal sorting of miR-193a promotes colon cancer progression. Nature Communications, 8(1), 14448. https://doi.org/10.1038/ncomms14448  

      Villarroya-Beltri, C., Gutiérrez-Vázquez, C., Sánchez-Cabo, F., Pérez-Hernández, D., Vázquez, J., Martin-Cofreces, N., Martinez-Herrera, D. J., Pascual-Montano, A., Mittelbrunn, M., & Sánchez-Madrid, F. (2013). Sumoylated hnRNPA2B1 controls the sorting of miRNAs into exosomes through binding to specific motifs. Nat Commun, 4, 2980. https://doi.org/10.1038/ncomms3980

      Welsh, J. A., Goberdhan, D. C. I., O'Driscoll, L., Buzas, E. I., Blenkiron, C., Bussolati, B., Cai, H., Di Vizio, D., Driedonks, T. A. P., Erdbrügger, U., Falcon-Perez, J. M., Fu, Q. L., Hill, A. F., Lenassi, M., Lim, S. K., Mahoney, M. G., Mohanty, S., Möller, A., Nieuwland, R., ... Witwer, K. W. (2024). Minimal information for studies of extracellular vesicles (MISEV2023): From basic to advanced approaches. J Extracell Vesicles, 13(2), e12404. https://doi.org/10.1002/jev2.12404

      Yoon, J. H., Jo, M. H., White, E. J., De, S., Hafner, M., Zucconi, B. E., Abdelmohsen, K., Martindale, J. L., Yang, X., Wood, W. H., 3rd, Shin, Y. M., Song, J. J., Tuschl, T., Becker, K. G., Wilson, G. M., Hohng, S., & Gorospe, M. (2015). AUF1 promotes let-7b loading on Argonaute 2. Genes Dev, 29(15), 1599-1604. https://doi.org/10.1101/gad.263749.115  

      Zubkova, E., Evtushenko, E., Beloglazova, I., Osmak, G., Koshkin, P., Moschenko, A., Menshikov, M., & Parfyonova, Y. (2021). Analysis of MicroRNA Profile Alterations in Extracellular Vesicles From Mesenchymal Stromal Cells Overexpressing Stem Cell Factor. Front Cell Dev Biol, 9, 754025. https://doi.org/10.3389/fcell.2021.754025

    1. eLife Assessment

      This valuable study presents a thorough analysis of protein abundance changes caused by amino acid substitutions, using structural context to improve predictive accuracy. By deriving substitution response matrices based on solvent accessibility, the authors demonstrate that simple structural features can predict abundance effects with accuracy comparable to complex methods such as free energy calculations. The strength of the evidence is convincing, supported by robust experimental design and comprehensive analyses.

    2. Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      Public Review:

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      Comments on revision:

      We have no further comments on this manuscript.

    3. Reviewer #3 (Public review):

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance through the lens of protein structural information (residue solvent accessibility, secondary structure type) to derive combinations of context-specific substitution matrices that predict variant impact on protein abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor, but to showcase the degree of prediction afforded simply by utilizing structural information.

      Both the derived matrices and the underlying 'training' data are comprehensively evaluated. The authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility and secondary structure. The capacity for the approach to produce generalizable matrices is explored through training data combinations, highlighting factors such as the variable quality of the experimental MAVE data and the biochemical differences between the protein targets themselves, which can lead to limitations. Despite this, the authors demonstrate their simple matrix approach is generally on par with dedicated protein stability predictors in abundance effect evaluation, and even outperforms them in a niche of solvent accessible surface mutations, revealing their matrices provide orthogonal abundance-specific signal. More importantly, the authors further develop this concept to creatively show their matrices can be used to identify surface residues that have buried-like substitution profiles, which are shown to correspond to protein interface residues, post-translational modification sites, functional residues or putative degrons.

      The paper makes a strong and well-supported main point, demonstrating the widespread utility of the authors' approach, empowered through protein structural information and cutting edge MAVE datasets. This work creatively utilizes a simple concept to produce a highly interpretable tool for protein abundance prediction (and beyond), which is inspiring in the age of impenetrable machine learning models.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      We thank the reviewer for their evaluation of our work and for their comments and feedback below.

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      We thank the reviewer for their summary of the main points of our work. Based on the suggestion by the reviewer, we have added a comparison to predictions with BLOSUM62 to our revised manuscript, noting that we have previously compared the BLOSUM62 matrix to a broader and more heterogeneous set of scores generated by MAVEs (Høie et al, 2022).

      Specific Feedback:

      Major points:

      The authors spend a good amount of space discussing how the six datasets have different distributions in abundance scores. After the development of their model is there more to say about why? Is there something that can be leveraged here to design maximally informative experiments?

      We believe that these effects arise from a combination of intrinsic differences between the systems and assay-specific effects. For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains.

      Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. 
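      The library dependence of a sort-seq score can be illustrated with a toy calculation. This is a sketch under stated assumptions rather than the actual VAMP-seq pipeline: the four equally populated gates, the bin weights (0.25, 0.5, 0.75, 1.0), the function names and all fluorescence values are hypothetical and chosen only to show that identical cells can receive different scores in different libraries.

```python
import statistics


def quartile_gates(library_fluorescence):
    """Three cut points splitting the whole library into four bins."""
    return statistics.quantiles(library_fluorescence, n=4)


def bin_index(value, gates):
    """Index (0 to 3) of the sorting bin that a cell falls into."""
    return sum(value > gate for gate in gates)


def variant_score(cell_values, gates, weights=(0.25, 0.5, 0.75, 1.0)):
    """Weighted average over bins of the variant's cells (assumed weights)."""
    return sum(weights[bin_index(v, gates)] for v in cell_values) / len(cell_values)


# The same variant (identical per-cell fluorescence) in two libraries.
variant_cells = [5.0, 5.0, 5.0]
dim_library = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bright_library = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

score_in_dim = variant_score(variant_cells, quartile_gates(dim_library))
score_in_bright = variant_score(variant_cells, quartile_gates(bright_library))
print(score_in_dim, score_in_bright)
```

      Because the gates are set from the library as a whole, the variant lands in a higher bin in the dimmer library than in the brighter one, even though its own fluorescence is unchanged.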

      From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences. We have briefly expanded the discussion of these points in the manuscript, and we have moreover elaborated on this in subsequent work (Schulze et al., 2025).

      They compare to one more "sophisticated model" - RosettaddG - which should be more correlated with thermodynamic stability than other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and ddG is underdeveloped. How can this be used to dissect cases where thermodynamics are not contributing to specific substitution patterns OR in specific residues/regions that are predicted by one method better than the other? This would naturally dovetail into whether there is orthogonal information between these two that could be leveraged to create better predictions.

      We thank the reviewer for this suggestion and indeed had spent substantial effort trying to gain additional biological insights from variants for which MAVE scores or MAVE predictions do not match predicted ∆∆G values. One major caveat in this analysis is that the experimental MAVE scores, MAVE predictions and the predicted ∆∆G values are rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      In our revised manuscript, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. 

      We find that many substitution profiles are predicted equally well by the two model types, but also that there are residues for which one method predicts substitution effects better than the other method. We have added an analysis of the characteristics of the residues and variants for which either the ∆∆G model or the substitution matrix model is most useful to rank variants. Since we only find relatively few residues for which this is the case, we do not expect a model that leverages predicted scores from both methods to perform better than ThermoMPNN across variants. 

      Perhaps beyond the scope of this baseline method, there is also ThermoMPNN and the work from Gabe Rocklin to consider as other approaches that should be more correlated only with thermodynamics.

      We acknowledge that there are other approaches to predict ∆∆G beyond Rosetta, including for example ThermoMPNN and our own method called RaSP (Blaabjerg et al, eLife, 2023), and we have added comparisons to ThermoMPNN and RaSP in the revised manuscript. We are unsure how one would use the data from Rocklin and colleagues directly, but we note that e.g. RaSP has been benchmarked on these data and other methods have been trained on them. We originally used Rosetta since the Rosetta model is known to be relatively robust and because it has never seen large databases during training (though we do not think that training of ThermoMPNN and RaSP would be biased towards the VAMP-seq data). We note also that we have previously compared both Rosetta calculations and RaSP with VAMP-seq data for TPMT, PTEN and NUDT15 (Blaabjerg et al, eLife, 2023).

      I find myself drawn to the hints of a larger idea that outliers to this model can be helpful in identifying specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can't help but feel there is more to be mined from Figure S9 or other analyses of outlier higher than predicted abundance along linear or tertiary motifs.

      We agree with these points and have previously spent substantial time trying to make sense of outliers in Figure S9 and Figure S18 (Figure S8 and Figure S18 of revised manuscript). The outlier analysis was challenging, in part due to the relatively high noise levels in both experimental data and predictions, and we did not find any clear signals. Some outliers in e.g. Figure S9 are very likely the result of dataset-specific abundance score distributions, which further complicates the outlier analysis. We now note this in the revised paper and hope others will use the data to gain additional insights on proteostasis-specific effects.  

      Reviewer #2 (Public review):

      Summary:

      This study analyzes protein abundance data from six VAMP-seq experiments, comprising over 31,000 single amino acid substitutions, to understand how different amino acids contribute to maintaining cellular protein levels. The authors develop substitution matrices that capture the average effect of amino acid changes on protein abundance in different structural contexts (buried vs. exposed residues). Their key finding is that these simple structure-based matrices can predict mutational effects on abundance with accuracy comparable to more complex physics-based stability calculations (ΔΔG).

      Major strengths:

      (1) The analysis focuses on a single molecular phenotype (abundance) measured using the same experimental approach (VAMP-seq), avoiding confounding factors present when combining data from different phenotypes (e.g., mixing stability, activity, and fitness data) or different experimental methods.

      (2) The demonstration that simple structural features (particularly solvent accessibility) can capture a significant portion of mutational effects on abundance.

      (3) The practical utility of the matrices for analyzing protein interfaces and identifying functionally important surface residues.

      We thank the reviewer for the comments above and the detailed assessment of our work.

      Major weaknesses:

      (1) The statistical rigor of the analysis could be improved. For example, when comparing exposed vs. buried classification of interface residues, or when assessing whether differences between prediction methods are significant.

      We agree with the reviewer that it is useful to determine if interface residues (or any of the residues in the six proteins) can confidently be classified as buried- or exposed-like in terms of their substitution profiles. Thus, we have expanded our approach to compare individual substitution profiles to the average profiles of buried and exposed residues to now account for the noise in the VAMP-seq data. In our updated approach, we resample the abundance score substitution profile for every residue several thousand times based on the experimental VAMP-seq scores and score standard deviations, and we then compare every resampled profile to the average profiles for buried and exposed residues, thereby obtaining residue-specific distributions of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. These RMSD distributions are typically narrow, since many variants in several datasets have small standard deviations. In the revised manuscript, we report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the resampled profiles. We do not recalculate average scores in substitution matrices for this analysis.

      Moreover, to illustrate potential overlap in predictive performance between prediction methods more clearly than in our preprint, we have added confidence intervals in Fig. 2 and Fig. 3 of the revised manuscript. We note that the analysis in Fig. 2 is performed using a leave-one-protein-out approach, which we believe provides the cleanest assessment of how well the different models perform.
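
      As a minimal sketch of what a leave-one-protein-out evaluation looks like in practice, the following Python example builds a structure-unaware averaging matrix from all proteins except one and uses it to predict the held-out protein. The function names, the data layout and the toy scores are assumptions made for illustration; this is not the code used for Fig. 2.

```python
from collections import defaultdict


def build_matrix(records):
    """Average score per (wild-type, mutant) pair over the given records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wt, mut, score in records:
        sums[(wt, mut)] += score
        counts[(wt, mut)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def leave_one_protein_out(data):
    """Predict each protein's scores from a matrix built on the other proteins.

    `data` maps a protein name to a list of (wt, mut, score) tuples.
    Returns {protein: [(observed, predicted), ...]}. Substitution types that
    never occur in the training proteins are skipped.
    """
    results = {}
    for held_out in data:
        training = [
            rec for name, recs in data.items() if name != held_out for rec in recs
        ]
        matrix = build_matrix(training)
        results[held_out] = [
            (score, matrix[(wt, mut)])
            for wt, mut, score in data[held_out]
            if (wt, mut) in matrix
        ]
    return results


# Toy data with made-up scores (not taken from any VAMP-seq dataset).
toy = {
    "protA": [("A", "G", 0.9), ("L", "P", 0.1)],
    "protB": [("A", "G", 0.7), ("L", "P", 0.3)],
    "protC": [("A", "G", 0.5), ("W", "D", 0.2)],
}
pairs = leave_one_protein_out(toy)
```

      Because the held-out protein never contributes to its own matrix, the observed and predicted values in `pairs` can be compared without information leaking from the test protein into the model.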

      (2) The mechanistic connection between stability and abundance is assumed rather than explained or investigated. For instance, destabilizing mutations might decrease abundance through protein quality control, but other mechanisms like degron exposure could also be at play.

      We agree that we have not provided much description of the relation between stability and abundance in our original preprint. In the revised manuscript, we provide some more detail as well as references to previous literature explaining the ways in which destabilising mutations can cause degradation. We have moreover performed and added additional analyses of the relationship between thermodynamic stability and abundance through comparisons of stability predictions and predictions performed with our substitution matrix models.

      (3) The similar performance of simple matrix-based and complex physics-based predictions calls for deeper analysis. A systematic comparison of where these approaches agree or differ could illuminate the relationship between stability and abundance. For instance, buried sites showing exposed-like behavior might indicate regions of structural plasticity, while the link between destabilization and degradation might involve partial unfolding exposing typically buried residues. The authors have all the necessary data for such analysis but don't fully exploit this opportunity.

      This is similar to a point made by reviewer 1, and our answer is similar. We had indeed hoped that our analyses would reveal clearer differences between effects on thermodynamic protein stability and cellular abundance, and we have tried to find clear signals. One major caveat in performing the suggested analysis is that the experimental MAVE scores, the ∆∆G predictions and our simple matrix-based predictions are all rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      To address this point, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. We find that many substitution profiles are predicted equally well by the two model types, but we also, in particular, find solvent-exposed residues for which the substitution matrix model is the better predictor. These residues are often aspartate, glutamate and proline, suggesting that surface-level substitutions of these amino acid types can often have effects that are not captured well by a thermodynamic model, either because this model does not describe thermodynamic effects perfectly, or because in-cell effects must be taken into account to provide an accurate description.

      (4) The pooling of data across proteins to construct the matrices needs better justification, given the observed differences in score distributions between proteins (for example, PTEN's distribution is shifted towards high abundance scores while ASPA and PRKN show more binary distributions).

      We agree with the reviewer that the differences between the score distributions are important to investigate further and keep in mind when analysing e.g. prediction outliers. However, our results show that the pooling of VAMP-seq scores across proteins does result in substitution matrices that make sense biochemically and can identify outlier residues with proteostatic functions. As we also respond to a related point by reviewer 1, the differences in score distributions likely have complex origins. In that sense, we also hope that our results can inspire experimentalists to design methods to generate data that are more comparable across proteins.

      For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains. Also, the sequence-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library differences in composition can contribute to the differences between VAMP-seq score distributions. From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences.

      Thus, even when experiments on different proteins are performed using the same technique (VAMP-seq), quantifying the same phenomenon (cellular abundance) and done in similar ways (saturation mutagenesis, sort-seq using four FACS bins), there can still be substantial differences in the results across different systems. An interesting side result of our work is to highlight this including how such variation makes it difficult to learn across experiments. We now elaborate on these points in the revised manuscript.

      (5) Some key methodological choices require better justification. For example, combining "to" and "from" mutation profiles for PCA despite their different behaviors, or using arbitrary thresholds (like 0.05) for residue classification.

      We hope we have explained our methodological choices more clearly in the revised paper.

      We removed the dependency on the threshold of 0.05 used for residue classification in Fig. S19 of the original manuscript; in the revised manuscript we only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the abundance score profiles that we resampled according to VAMP-seq score noise levels, as explained above.

      With respect to combining “to” and “from” mutational profiles for PCA, we could have also chosen to analyse these two sets of profiles separately to take potentially different behaviours along the two mutational axes into account. We do not think that there should be anything wrong with concatenating the two sets of profiles in a single analysis, since the analysis on the concatenated profiles simply expresses amino acid similarities and differences in a more general manner.

      The authors largely achieve their primary aim of showing that simple structural features can predict abundance changes. However, their secondary goal of using the matrices to identify functionally important residues would benefit from more rigorous statistical validation. While the matrices provide a useful baseline for abundance prediction, the paper could offer deeper biological insights by investigating cases where simple structure-based predictions differ from physics-based stability calculations.

      This work provides a valuable resource for the protein science community in the form of easily applicable substitution matrices. The finding that such simple features can match more complex calculations is significant for the field. However, the work's impact would be enhanced by a deeper investigation of the mechanistic implications of the observed patterns, particularly in cases where abundance changes appear decoupled from stability effects.

      We agree that disentangling stability and other effects on cellular abundance is one of the goals of this work. As discussed above, it has been difficult to find clear cases where amino acid substitutions affect abundance without stability beyond for example the (rare) effects of creating surface exposed degrons. Our new analysis, in which we compare substitution matrix-based predictions to stability predictions, does offer deeper insight into the relationship between the two predictor types and hence possibly between folding stability and abundance. 

      Reviewer #3 (Public review):

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance (and thus stability) by utilizing structural information, specifically residue solvent accessibility and secondary structure type, to derive combinations of context-specific substitution matrices predicting variant abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor but to showcase the degree of prediction afforded simply by utilizing information on residue accessibility. The performance of their matrices is robustly evaluated using a leave-one-out approach, where the abundance effects for a single protein are predicted using the remaining datasets. Using a simple classification of buried and solvent-exposed residues, and substitution matrices derived respectively for each residue group, the authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility and secondary structure. Interestingly, it is shown that the performance of the simple buried and exposed residue substitution matrices for predicting protein abundance is on par with Rosetta, an established and specialized protein variant stability predictor.

      More importantly, the authors finish off the paper by demonstrating the utility of the two matrices to identify surface residues that have buried-like substitution profiles, that are shown to correspond to protein interface residues, posttranslational modification sites, functional residues, or putative degrons.

      Strengths:

      The paper makes a strong and well-supported main point, demonstrating the utility of the authors' approach through performance comparisons with alternative substitution matrices and specialized methods alike. The matrices are rigorously evaluated without introducing bias, exploring various combinations of protein datasets. Supplemental analyses are extremely comprehensive and detailed. The applicability of the substitution matrices is explored beyond abundance prediction and could have important implications in the future for identifying functionally relevant sites.

      We thank the reviewer for the supportive comments on our work. 

      Comments:

      (1) A wider discussion of the possible reasons why matrices for certain proteins seem to correlate better than others would be extremely interesting, touching upon possible points like differences or similarities in local environments, degradation pathways, posttranslational modifications, and regulation. While the initial data structure differences provide a possible explanation, Figure S17A, B correlations show a more complicated picture.

      We agree with the reviewer that biochemical and biophysical differences between the proteins might contribute to the fact that some matrices correlate better than others. We also agree that it would be very interesting to understand these differences better. While it might be possible to examine some of the suggested causes of the differences, like differences or similarities in local environments, we have generally found that noise and differences in score distributions make such analyses difficult (see also responses to reviewers 1 and 2). For now, we will defer additional analyses to future work.

      (2) The performance analysis in Figure 2D seems to show that for particular proteins "less is more" when it comes to which datasets are best to derive the matrix from (CYP2C9, ASPA, PRKN). Are there any features (direct or proxy) that would allow proteins to be grouped to maximize accuracy? Do the authors think that, on top of the buried vs exposed paradigm, another grouping dimension at the protein/domain level could improve performance?

      We don’t currently know if any protein- or domain-level features could be used to further split residues into useful categories for constructing new substitution matrices, but it is an interesting suggestion. We note that every substitution matrix consists of 380 averages, and creating too many residue groupings will cause some matrix entries to be averaged over very few abundance scores, at least with the current number of scores in the pooled VAMP-seq dataset. For example, while previous work has shown different mutational effects e.g. in helices and sheets (as one would expect), we find that a model with six matrices ({buried,exposed}x{helix,sheet,other}) does not lead to improved predictions (Fig. 2C), presumably because of an unfavourable balance between parameters and data.
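
      The balance between parameters and data can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes that roughly 31,000 scores (the approximate dataset size quoted in the summary by reviewer 2) are spread evenly over all matrix entries, which real data are not, so the numbers are optimistic upper bounds on the typical coverage per entry.

```python
N_AMINO_ACIDS = 20
# Each matrix has one entry per ordered pair of distinct amino acids.
ENTRIES_PER_MATRIX = N_AMINO_ACIDS * (N_AMINO_ACIDS - 1)  # 380


def mean_scores_per_entry(n_scores, n_matrices):
    """Average number of scores per matrix entry if scores were spread evenly."""
    return n_scores / (n_matrices * ENTRIES_PER_MATRIX)


# Approximate pooled dataset size; an assumption taken from the review summary.
n_scores = 31_000
for label, n_matrices in [
    ("single matrix", 1),
    ("buried/exposed", 2),
    ("{buried,exposed}x{helix,sheet,other}", 6),
]:
    per_entry = mean_scores_per_entry(n_scores, n_matrices)
    print(f"{label}: {per_entry:.1f} scores per entry on average")
```

      Going from two to six matrices cuts the average support per entry by a factor of three, and rare substitution types in rare structural contexts will fall well below that average.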

      (3) While the matrices and Rosetta seem to show similar degrees of correlation, do the methods both fail and succeed on the same variants? Or do they show a degree of orthogonality and could potentially be synergistic?

      These are good questions and are related to similar questions from reviewers 1 and 2. In the revised manuscript, we have added additional analyses of differences between predictions from our substitution matrix model and a stability model, and we indeed find that the two methods show a degree of orthogonality. However, since we identify only relatively few residues for which one method performs better than the other, we don’t expect a synergistic model to outperform the stability predictor across all variants in any of the six proteins.  

      Overall, this work presents a valuable contribution by creatively utilizing a simple concept through cutting-edge datasets, which could be useful in various.

      Reviewing Editor:

      As discussed in more detail below, to strengthen the assessment, the authors are encouraged to:

      (1) Include more thorough statistical analyses, such as confidence intervals or standard errors, to better validate key claims (e.g., RMSD comparisons).

      (2) Perform a deeper comparison between substitution response matrices and ΔΔG-based predictions to uncover areas of agreement or orthogonality.

      (3) Clarify the relationship between structural features, stability, and abundance to provide more mechanistic insights.

      As discussed above and below, we have added new analyses and clarifications to the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Why is a continuous version of the contact number used here, instead of a discrete count of neighbouring residues? WCN values of the residues in the core domain can be affected by residues far away (small contribution but not strictly zero; if there are many of them, it adds up).

      We have previously found WCN, which quantifies residue contact numbers in a continuous manner, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also found WCN and the cellular abundance of single substitution variants to correlate well in individual analyses of different proteins (Grønbæk-Thygesen et al., 2024; Gersing et al., 2024; Clausen et al., 2024).

      We have calculated the WCN as well as a contact number based on discrete counts of neighbouring residues for the six proteins in our dataset. When distances between residues are evaluated in the same way (i.e. using the shortest distance between any pair of heavy atoms in the side chains), and when the cutoff value used for the discrete count is equal to the r<sub>0</sub> of the WCN function, the continuous and discrete evaluations of residue contact numbers are highly and linearly correlated, and their rank correlations with the VAMP-seq data are very similar. We only observe minor contributions from residues far away in the structure on the WCN.
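
      To illustrate the comparison between a continuous and a discrete contact number, the following sketch evaluates both for a single hypothetical residue. The switching function 1/(1 + (r/r<sub>0</sub>)^6), the value of r<sub>0</sub> and the distances are all assumptions chosen for illustration; the exact WCN definition of the manuscript is not reproduced here.

```python
def weighted_contact_number(distances, r0):
    """Continuous contact number: each neighbour contributes a weight in (0, 1].

    The switching function 1 / (1 + (r / r0)**6) is an ASSUMED form used only
    for illustration, not necessarily the one used in the manuscript.
    """
    return sum(1.0 / (1.0 + (r / r0) ** 6) for r in distances)


def discrete_contact_number(distances, cutoff):
    """Discrete contact number: count neighbours closer than the cutoff."""
    return sum(1 for r in distances if r < cutoff)


# Hypothetical residue-residue distances (in angstrom) around one residue.
distances = [3.5, 4.2, 6.8, 9.0, 15.0, 30.0]
r0 = 7.0  # illustrative value only

wcn = weighted_contact_number(distances, r0)
count = discrete_contact_number(distances, cutoff=r0)
# Summed weight of neighbours beyond twice r0, i.e. the "far away" contribution.
far_contribution = weighted_contact_number([d for d in distances if d > 2 * r0], r0)
```

      With a steeply decaying weight such as this one, `far_contribution` is a small fraction of `wcn`, which is the behaviour described above; a more slowly decaying weight would let distant residues add up to a larger share.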

      Typos in SI figure captions e.g. Figure S8-11 "All predictions were performed using using...."

      Thank you for pointing this out. We have corrected the typos in Figure S8-11 (Figure S7-S10 in the revised manuscript).

      Personally, I'd appreciate a definition of these new substitution matrices under the constraints of rASA/WCN values. It was unclear to me until I read the code, but I think that the definition is averaging the substitution matrix based on the clusters they are assigned to. If so, this could be straightforwardly defined in the method section with a Heaviside step function.

      We have added a definition of the “buried” and “exposed” substitution matrices as a function of rASA in the methods section (“Definitions of buried and exposed residues” and “Definition of substitution matrices”) of the manuscript, as well as a definition of how we classified residues as either buried or exposed using both rASA and WCN as input. Our final substitution matrices, as shown in e.g. Fig. 2, do not depend on the WCN; only the substitution matrix results in Figure S6 (Figure S20 in the revised manuscript) depend on both WCN and rASA.
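
      For orientation, one possible way to write such a definition with a step function is sketched below. It assumes a single rASA cutoff τ (a hypothetical symbol, with no value implied) and is not necessarily the exact form given in the revised methods section.

```latex
% Sketch only. s_v is the abundance score of variant v, (a_v \to b_v) is its
% substitution, rASA_v is the relative accessibility of the substituted site,
% \tau is a hypothetical cutoff, and H is the Heaviside step function.
\begin{align}
M^{\mathrm{exposed}}_{a \to b} &=
  \frac{\sum_{v} \delta_{a a_v}\,\delta_{b b_v}\, H(\mathrm{rASA}_v - \tau)\, s_v}
       {\sum_{v} \delta_{a a_v}\,\delta_{b b_v}\, H(\mathrm{rASA}_v - \tau)}, \\
M^{\mathrm{buried}}_{a \to b} &=
  \frac{\sum_{v} \delta_{a a_v}\,\delta_{b b_v}\, \bigl[1 - H(\mathrm{rASA}_v - \tau)\bigr]\, s_v}
       {\sum_{v} \delta_{a a_v}\,\delta_{b b_v}\, \bigl[1 - H(\mathrm{rASA}_v - \tau)\bigr]}
\end{align}
```

      Each entry is then simply the mean abundance score of all pooled variants with the given substitution type in the given exposure class.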

      Reviewer #2 (Recommendations for the authors):

      The following suggestions aim to strengthen the analysis and clarify the presentation of your findings:

      (1) Specific analyses to consider:

      (1.1) Analyze buried positions where the exposed matrix performs better. Understanding these cases might reveal properties of protein core regions that show unexpected mutational tolerance.

      We agree with the reviewer that a more detailed analysis of buried residues with exposed-like substitution profiles would be very interesting.

      We note that for proteins where the VAMP-seq score distribution is shifted towards high values (as it is the case for PTEN, TPMT and CYP2C9), our identification of such residues may be a result of the score distribution differences between the six datasets. To confidently identify mutationally tolerant core regions, it would be best to (a) correct for the distribution differences prior to the analysis or (b) focus the analysis on residues that fall far below the diagonal in Figure S18.

      In additional data (which can be found at https://github.com/KULL-Centre/_2024_Schulze_abundance-analysis), we provide, for each of the proteins, a list of buried residues for which RMSD<sub>exposed</sub> < RMSD<sub>buried</sub> (for more than 95% of resampled substitution profiles, as described under 1.6). We have not analysed these residues further.

      (1.2) A systematic comparison of matrix-based vs. ΔΔG-based predictions could help understand both exposed sites that behave as buried (as analyzed in the paper) and buried sites that behave as exposed (1.1), potentially revealing mechanisms underlying abundance changes.

      In our revised manuscript, we have added additional analyses to compare matrix-based and ΔΔG-based predictions, focusing on exposed sites for which one prediction method captures variant effects on abundance considerably better than the other prediction method. We have not investigated buried sites with exposed-like behaviour any further in this work.

      (1.3) Explore different normalization approaches when pooling data across proteins. In particular, consider using log(abundance score): if the experimental error in abundance measurements is multiplicative (which can be checked from the reported standard errors), then log transformation would convert this into a constant additive error, making the analysis more statistically sound.

      As we explain below in response to point 2.2, the abundance scores are, within each dataset, min-max normalised to nonsense and synonymous variant scores, and the score scale is thus in this way consistent across the six datasets. We have explained above and in the revised manuscript that abundance score distribution differences across datasets are likely partially a result of the FACS binning of assay-specific variant libraries. Using only the VAMP-seq scores (that is, without further information about the individual experiments), we cannot correct for the influence of the sorting strategy on the reported scores. A score normalisation across datasets that places all data points on a single scale would require inter-dataset reference variant scores, which we do not have. We note that in a subsequent manuscript (Schulze et al, bioRxiv, 2025) we have attempted to take system- and experiment-specific score distributions into account. We now refer to this work in the revised manuscript.

      (1.4) Consider using correlation coefficients between predicted and observed abundance profiles as an alternative to RMSD, which is sensitive to the absolute values of the scores.

      We agree with the reviewer that using correlation coefficients to compare substitution profiles might also be useful, in particular for datasets with relatively unique VAMP-seq score distributions, such as the ASPA dataset. To explore this idea, we have repeated the analysis presented in Fig. S18 using the Pearson correlation coefficient r rather than the RMSD.

      As in Fig. S18, we derive r<sub>buried</sub> and r<sub>exposed</sub> for every residue in the six proteins, specifically by calculating r between the abundance score substitution profile of every individual residue and the average abundance score substitution profiles of buried and exposed residues. VAMP-seq data for the protein for which r<sub>buried</sub> and r<sub>exposed</sub> are evaluated is omitted from the calculation of average abundance score substitution profiles, and we use only monomer structures to determine whether residues are buried or exposed. 
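
      The difference between the two metrics can be seen in a short sketch. The code below computes the Pearson correlation and the RMSD between a residue profile and a reference profile, skipping positions where either value is missing. The function names and the toy profiles are assumptions for illustration only.

```python
import math


def paired(xs, ys):
    """Keep only positions where both profiles have a value."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def pearson_r(xs, ys):
    """Pearson correlation between two profiles, ignoring missing entries."""
    pairs = paired(xs, ys)
    if len(pairs) < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a flat profile
    return cov / math.sqrt(var_x * var_y)


def rmsd(xs, ys):
    """Root-mean-square deviation between two profiles, ignoring missing entries."""
    pairs = paired(xs, ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


# Toy example: a residue profile equal to the reference shifted up by 0.4.
avg_buried = [0.2, 0.5, 0.1, 0.4]
profile = [0.6, 0.9, 0.5, 0.8]

r_buried = pearson_r(profile, avg_buried)
offset = rmsd(profile, avg_buried)
```

      Here the correlation is perfect while the RMSD equals the shift, which shows why a correlation-based comparison is insensitive to dataset-specific offsets in the score distribution, at the cost of needing enough non-missing, low-noise values per residue to be estimated reliably.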

      We show the results of this analysis in Author response image 1 below. In each panel of the figure, r<sub>buried</sub> and r<sub>exposed</sub> are shown for individual residues of a single protein. Blue datapoints indicate residues that are solvent-exposed in the wild-type protein structures, and yellow datapoints indicate residues that are buried in the wild-type structures. Residues for which neither r<sub>buried</sub> < r<sub>exposed</sub> nor r<sub>exposed</sub> < r<sub>buried</sub> holds in more than 95% of 1000 resampled residue substitution profiles (see explanation of resampling method above) are coloured grey. “Acc.” is the balanced classification accuracy, calculated using all non-grey datapoints, indicating how many buried residues have buried-like substitution profiles (r<sub>exposed</sub> < r<sub>buried</sub>) and how many solvent-exposed residues have exposed-like substitution profiles (r<sub>buried</sub> < r<sub>exposed</sub>). The classification accuracy per protein in this figure cannot be compared to the classification accuracy of the same protein in Fig. S18, since the number of datapoints used in the accuracy calculation differs between the r- and RMSD-based analyses.

      Author response image 1.

      Comparing the r-based approach to the RMSD-based approach (Fig. S18), it is clear that the r-based method is less robust than the RMSD-based method for noisy and incomplete datasets. For the noisiest and most mutationally incomplete VAMP-seq datasets (i.e., PTEN, TPMT and CYP2C9) (Fig. 1), there are relatively few residues for which we can determine with high confidence whether the substitution profile is more buried- or more exposed-like. When the VAMP-seq data is less noisy and has high mutational completeness, the r-based method becomes more robust and may thus be relevant in potential future work on new VAMP-seq data with small error bars.

      In conclusion, we find that the RMSD-based approach to compare substitution profiles is more robust than an r-based approach for several of the VAMP-seq datasets that are included in our analysis. We do believe that an approach based on the correlation coefficient, or potentially several metrics, could be relevant to use, since abundance score distributions from VAMP-seq datasets can differ significantly across datasets. So as not to increase the length of the main text of our manuscript, we have not added this analysis to the revised manuscript.

      (1.5) Consider treating missing abundance scores as zero values, as they might indicate variants with very low abundance, rather than omitting them from the analysis.

      This suggestion would be most relevant for the PTEN, TPMT and CYP2C9 datasets, which all have a relatively small average mutational depth and completeness, as shown in Fig. 1B and 1C. To assess if setting missing abundance scores as zero values would be reasonable, we have compared the distributions of predicted ΔΔG values (from RaSP and ThermoMPNN) and of predicted abundance scores (from our exposure-based substitution matrices) for variants with reported and missing VAMP-seq data. We show the result in Author response image 2, with data aggregated across the six protein systems:

      Author response image 2.

      We find that variants with and without VAMP-seq data have similar ΔΔG score distributions and similar predicted abundance score distributions, and there is thus no clear enrichment of predicted loss of abundance for variants with missing VAMP-seq scores. This suggests that missing abundance scores do not necessarily indicate very low abundance. One cause of missing data might instead be problems with library generation (Matreyek et al, 2018, 2021).

      We show in Fig. S9 (Fig. S8 of the revised manuscript) that predicted scores for variants with experimental abundance scores of 0 are often overestimated for NUDT15, ASPA and PRKN, but this is not so much a problem for PTEN, TPMT and CYP2C9, the datasets with most missing scores. The lack of an enrichment of low abundance variants from the various predictors would thus still support that missing scores do not necessarily indicate low abundance.

      (1.6) Develop a proper statistical framework for comparing buried vs exposed predictions (whether using RMSD or correlations), including confidence intervals, rather than using arbitrary thresholds.

      As explained above and in the methods section of our revised manuscript, we have expanded our approach to compare the substitution profile of a residue to the average profiles of buried and exposed residues, and our method now accounts for the noise in the VAMP-seq data, making the analysis more statistically rigorous. In our expanded approach, we compare the substitution profiles of individual residues to the average profiles for buried and exposed residues 10,000 times per residue to get a residue-specific distribution of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. Individual RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values are calculated by resampling abundance scores from a Gaussian distribution defined by the experimentally reported abundance score and abundance score standard deviation per variant. We now only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> in at least 95% of our samples. We do not recalculate average scores in substitution matrices for this analysis. We have updated the plots in our manuscript, e.g. in Fig. S18 and S19 of the revised version, to indicate which residues are confidently classified as buried- or exposed-like.
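
      The resampling procedure described above can be sketched as follows. The function and argument names, the toy profiles and the handling of ties are assumptions for illustration, and the sketch ignores missing scores, which a real implementation would need to handle.

```python
import math
import random


def rmsd(profile, reference):
    """Root-mean-square deviation between two equally long profiles."""
    return math.sqrt(
        sum((p - q) ** 2 for p, q in zip(profile, reference)) / len(profile)
    )


def classify_profile(scores, sds, avg_buried, avg_exposed,
                     n_samples=10_000, threshold=0.95, seed=0):
    """Label a residue as buried-like, exposed-like or unclassified.

    Each round draws every score from a Gaussian defined by its reported mean
    and standard deviation, then compares the resampled profile to the two
    average profiles. The average profiles themselves are kept fixed.
    """
    rng = random.Random(seed)
    buried_wins = 0
    exposed_wins = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sd) for mu, sd in zip(scores, sds)]
        d_buried = rmsd(sample, avg_buried)
        d_exposed = rmsd(sample, avg_exposed)
        if d_buried < d_exposed:
            buried_wins += 1
        elif d_exposed < d_buried:
            exposed_wins += 1
    if buried_wins / n_samples >= threshold:
        return "buried-like"
    if exposed_wins / n_samples >= threshold:
        return "exposed-like"
    return "unclassified"


# Toy average profiles and residue profiles (made-up numbers).
avg_buried = [0.2, 0.3, 0.1, 0.4]
avg_exposed = [0.8, 0.9, 0.7, 1.0]

clear_case = classify_profile([0.25, 0.30, 0.15, 0.35], [0.05] * 4,
                              avg_buried, avg_exposed)
noisy_case = classify_profile([0.5, 0.6, 0.4, 0.7], [0.2] * 4,
                              avg_buried, avg_exposed)
```

      A precisely measured profile close to the buried average is labelled confidently, whereas a noisy profile halfway between the two averages stays unclassified, mirroring the residues that are not confidently assigned in Fig. S18 and S19.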

      (2) Presentation improvements:

      (2.1) In Figure 4, consider removing the average abundance scores, which are not directly related to the RMSD comparison being shown.

      We have decided to keep the average abundance scores in Fig. 4 (now Fig. 5), as we find the average abundance scores useful for guiding interpretation of the RMSD values. For example, an unusually small average abundance score with a relatively small standard deviation may explain a case where RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> are both large. This is for example the case for residue G185 in ASPA. 

      In our preprint, the error bars on the average abundance scores in Fig. 4 (now Fig. 5) indicated the standard deviation across the abundance scores that were used to calculate the average per position. We have removed these error bars in the revised manuscript, as we realised that these were not necessarily helpful to the reader.

      (2.2) I am assuming that abundance scores are defined as the ratio abundance_variant/abundance_wt throughout the analysis, but I don't think this has been explicitly defined. If this is correct, please state it explicitly. In such case, log(abundance_score) would have a simple interpretation as the difference in abundance between variant and wild-type.

      Abundance scores are defined throughout the manuscript as sequence-based scores that have been min-max normalised to the abundance of nonsense and synonymous variants, i.e. abundance_score = (abundance_variant – abundance_nonsense)/(abundance_wt – abundance_nonsense). We have described the normalisation of scores to wild-type and nonsense variant abundance in lines 164-166 of the original manuscript. We have now added additional information about the normalisation scheme in the methods section. We note that we did not ourselves apply this normalisation to the data; the scores were reported in this manner in the original publications that reported the VAMP-seq experiments for the six proteins.
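
      The normalisation can be written as a one-line function. The sketch below is only an illustration of the formula; the argument names are hypothetical, and how the wild-type and nonsense reference values are obtained differs between the original experiments.

```python
def abundance_score(raw_variant, raw_wt, raw_nonsense):
    """Min-max normalise a raw score so that nonsense maps to 0 and wild type to 1."""
    span = raw_wt - raw_nonsense
    if span == 0:
        raise ValueError("wild-type and nonsense reference scores must differ")
    return (raw_variant - raw_nonsense) / span


# Made-up raw values: a variant halfway between the two references.
example = abundance_score(raw_variant=0.6, raw_wt=1.0, raw_nonsense=0.2)
```

      On this scale a score is a position between two reference points rather than a ratio to wild type, so values can fall below 0 or above 1, and the logarithm of the score does not have the simple interpretation suggested in the comment.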

      (2.3) Consider renaming "rASA" to the more commonly used "RSA" for relative solvent accessibility.

      We have decided to keep using “rASA” throughout the manuscript.

      (2.4) The weighted contact number function used differs from the established WCN measure (Σ1/rij²) introduced by Lin et al. (2008, Proteins). This should be acknowledged and the choice of alternative weighting scheme justified.

      As we have also responded to the first minor point of reviewer 1, we have previously found WCN, as it is defined in our manuscript, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also previously found this type of WCN to correlate well with variant abundance of individual proteins, as measured with VAMP-seq or protein fragment complementation assays (Grønbæk-Thygesen et al., 2024; Clausen et al., 2024; Gersing et al., 2024). We acknowledge that residue contact numbers or weighted contact numbers could also be expressed in other ways and that alternative contact number definitions would likely also produce values that correlate well with VAMP-seq data. Since the WCN, as defined in our manuscript, already correlates relatively well with abundance scores, we have not explored whether alternative definitions produce better correlations.  

      (2.5) Replace the phrase "in the above" with specific references to sections or simply "above" where appropriate. Also, consider replacing many instances of "moreover" with simpler alternatives such as "also" or "in addition" to improve readability.

      We have changed several sentences according to this suggestion and hope that we have improved the readability of our manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) It should be explicitly confirmed earlier that complex structures are used for NUDT15 and ASPA when assessing rASA/WCN. Additionally, it would be interesting to see the effect that deriving the matrices using NUDT15 and ASPA monomers would have.

      We have commented on the use of NUDT15 and ASPA homodimer structures earlier in the revised manuscript (specifically, in the subsection “Abundance scores correlate with the degree of residue solvent-exposure”).

      When residues are classified using monomer rather than dimer structures of NUDT15 and ASPA, there is a small effect on the resulting “buried” and “exposed” substitution matrices. Entries in this set of substitution matrices calculated using either monomer or dimer structures typically differ by less than 0.05, and only a single entry differs by more than 0.1. As expected, the “exposed” matrix tends to contain slightly larger numbers when derived from dimer structures than when derived from monomer structures, meaning that when the interface residues are included in the exposed residue category, the average abundance scores of the “exposed” matrix are lowered. For buried residues, the picture is more mixed, although the overall tendency is that the interface residues make the “buried” matrix contain smaller average abundance scores for dimer compared to monomer structures. These results generally support the use of dimer structures for the residue classification.

      We here show the differences between the substitution matrices calculated with dimer or monomer structures of NUDT15 and ASPA and using data for all six proteins in our combined VAMP-seq dataset (average_abundance_score_difference = average_abundance_score_dimers – average_abundance_score_monomers):

      Author response image 3.

      We have not explored these alternative matrices further.

      (2) While the supplemental analyses are rigorous, the abundance of various metrics being presented can be confusing, especially when they seem to differ in their result. For instance, the discussion of Figure S17 (paragraph starting 428) contains mentions of mean differences but then switches to correlations, while both are presented for all panels. The claim "The datasets thus mainly differ due to differences in substitution effects in buried environments." is well supported by the observed mean differences, but for Pearson's correlations the average panel A, B values of buried 0.421 vs exposed 0.427 are hardly different. Which of the metrics is more meaningful, and are both needed?

      We agree with the reviewer that the claim that “The datasets thus mainly differ due to differences in substitution effects in buried environments” is not well-supported by the r between the substitution matrices, and we have removed this claim from the text.

      Since some datasets share VAMP-seq score distribution features, while others do not, the absolute difference between scores or matrices may be relevant to check for some dataset pairs, while the r may be more relevant to check for other dataset pairs. Hence, we have included both metrics in Fig S17 (Fig S11 in the revised manuscript).

      (3) Lines 337-340 - does not feel like S7 is the topic, perhaps the authors meant Figure 2A, B? In general, the supplemental figure references are out of order and panel combinations are sometimes confusing.

      We have corrected the figure references and rearranged the supplemental figures so that they now occur in the correct order. We have looked through the panel combinations with clarity in mind, and hope that the current set of main and supplementary figures balances overview and detail.

      (4) Line 363 "are also are also".

      We have corrected this typo.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study analyzes the gastric fluid DNA content identified as a potential biomarker for human gastric cancer. However, the study lacks overall logicality, and several key issues require improvement and clarification. In the opinion of this reviewer, some major revisions are needed:

      (1) This manuscript lacks a comparison of gastric cancer patients' stages with PN and N+PD patients, especially T0-T2 patients.

      We are grateful for this astute remark. A comparison of gfDNA concentration among the diagnostic groups indicates a trend of increasing values as the diagnosis progresses toward malignancy. The observed values for the diagnostic groups are as follows:

      Author response table 1.

      The chart below presents the statistical analyses of the same diagnostic/tumor-stage groups (One-Way ANOVA followed by Tukey’s multiple comparison tests). It shows that gastric fluid gfDNA concentrations gradually increase with malignant progression. We observed that the initial tumor stages (T0 to T2) exhibit intermediate gfDNA levels, which are significantly lower than in advanced disease (p = 0.0036), but not statistically different from those in non-neoplastic disease (p = 0.74).

      Author response image 1.

      (2) The comparison between gastric cancer stages seems only to reveal the difference between T3 patients and early-stage gastric cancer patients, which raises doubts about the authenticity of the previous differences between gastric cancer patients and normal patients, whether it is only due to the higher number of T3 patients.

      We appreciate the attention to detail regarding the numbers analyzed in the manuscript. Importantly, the results are meaningful because the number of subjects in each group is comparable (T0-T2, N = 65; T3, N = 91; T4, N = 63). The mean gastric fluid gfDNA values (ng/µL) increase with disease stage (T0-T2: 15.12; T3-T4: 30.75), and both are higher than the mean gfDNA values observed in non-neoplastic disease (10.81 ng/µL for N+PD and 10.10 ng/µL for PN). These subject numbers in each diagnostic group accurately reflect real-world data from a tertiary cancer center.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      Histopathological analyses were performed throughout the study not only for the initial diagnosis of tissue biopsies, but also for the classification of Lauren’s subtypes, tumor staging, and the assessment of the presence and extent of immune cell infiltrates. Regarding the time of disease onset, this variable is, by definition, unknown at the time of a diagnostic EGD. While the prognosis definition is indeed straightforward, we believe that a simple, cost-effective, and practical approach is advantageous for patients across diverse clinical settings and is more likely to be effectively integrated into routine EGD practice.

      (4) The comparison between gfDNA and conventional pathological examination methods should be mentioned, reflecting advantages such as accuracy and patient comfort.

      We wish to reinforce that EGD, along with conventional histopathology, remains the gold standard for gastric cancer evaluation. EGD under sedation is routinely performed for diagnosis, and the collection of gastric fluids for gfDNA evaluation does not affect patient comfort. Thus, while gfDNA analysis was evidently not intended as a diagnostic EGD and biopsy replacement, it may provide added prognostic value to this exam.

      (5) There are many questions in the figures and tables. Please match the Title, Figure legends, Footnote, Alphabetic order, etc.

      We are grateful for these comments and apologize for the clerical oversight. All figures, tables, titles and figure legends have now been double-checked.

      (6) The overall logicality of the manuscript is not rigorous enough, with few discussion factors, and cannot represent the conclusions drawn.

      We assume that the unusual wording remark regarding “overall logicality” pertains to the rationale and/or reasoning of this investigational study. Our working hypothesis was that during neoplastic disease progression, tumor cells continuously proliferate and, depending on various factors, attract immune cell infiltrates. Consequently, both tumor cells and immune cells (as well as tumor-derived DNA) are released into the fluids surrounding the tumor at its various locations, including blood, urine, saliva, gastric fluids, and others. Thus, increases in DNA levels within some of these fluids have been documented and are clinically meaningful. The concurrent observation of elevated gastric fluid gfDNA levels and immune cell infiltration supports the hypothesis that increased gfDNA—which may originate not only from tumor cells but also from immune cells—could be associated with better prognosis, as suggested by this study of a large real-world patient cohort.

      In summary, we thank Reviewer #1 for his time and effort in a constructive critique of our work.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Reviewer #2 has conceptually grasped the overall rationale of the study quite well, and we are grateful for their assessment and comprehensive summary of our findings.

      Weaknesses:

      (1) The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings.

      We agree that this would be the case if the gfDNA were derived solely from tumor cells. However, the findings presented here suggest that a fraction of this DNA may indeed be derived from infiltrating immune cells. The precise origin of this increased gfDNA remains to be determined in follow-up studies, which we plan to carry out by applying DNA- and RNA-sequencing methodologies and deconvolution analyses.

      (2) The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results.

      Reviewer #2 is correct that this investigational study was not designed to assess the diagnostic potential of gfDNA. Instead, its primary contribution is to provide useful prognostic information. In this regard, we have not yet explored combining gfDNA with other clinically well-established diagnostic biomarkers. We do acknowledge this current limitation as a logical follow-up that must be investigated in the near future.

      Moreover, we collected a substantial number of pre-analytical variables within the limitations of a study involving over 1,000 subjects. Longitudinal samples and data were not analyzed here, as our aim was to evaluate prognostic value at diagnosis. Although the groups are imbalanced, this accurately reflects the real-world population of a large endoscopy center within a dedicated cancer facility. Subjects were invited to participate and enter the study before sedation for the diagnostic EGD procedure; thus, samples were collected prospectively from all consenting individuals.

      Finally, to maintain a large, unbiased cohort, we did not attempt to balance the groups, allowing analysis of samples and data from all patients with compatible diagnoses (please see Results: Patient groups and diagnoses).

      (3) Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.

      We are grateful for this useful suggestion. In the current version, each ROC curve (Supplementary Figures 1A and 1B) now includes the top 10 gfDNA thresholds, along with their corresponding sensitivity and specificity values (please see Suppl. Table 1). The thresholds are ordered from best to worst based on the classic Youden’s J statistic, as follows:

      Youden Index = specificity + sensitivity – 1 [Youden WJ. Index for rating diagnostic tests. Cancer 3:32-35, 1950. PMID: 15405679]. We have made an effort to provide all the key methodological details requested, but we would be glad to add further information upon specific request.
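
      As an illustration of the ranking described above, the sketch below scores candidate thresholds by Youden’s J and keeps the best ones. The function names and the three example rows are hypothetical and are not taken from Suppl. Table 1; the code only shows the arithmetic of the index.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def rank_thresholds(rows, top=10):
    """Rank candidate thresholds from best to worst by Youden's J.

    rows: iterable of (threshold, sensitivity, specificity) tuples.
    Returns up to `top` (threshold, J) pairs, highest J first.
    """
    scored = [(t, youden_j(se, sp)) for t, se, sp in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up (threshold, sensitivity, specificity) rows, for illustration only.
example_rows = [(5.0, 0.75, 0.5), (10.0, 0.5, 1.0), (20.0, 0.25, 1.0)]
ranked = rank_thresholds(example_rows)
```

      J ranges from -1 to 1; a value of 0 corresponds to a threshold that performs no better than chance, and 1 to perfect separation.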

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an excellent study by a superb investigator who discovered and is championing the field of migrasomes. This study contains a hidden "gem" - the induction of migrasomes by hypotonicity and how that happens. In summary, an outstanding fundamental phenomenon (migrasomes) en route to becoming translationally highly significant.

      Strengths:

      Innovative approach at several levels. Migrasomes - discovered by Dr Yu's group - are an outstanding biological phenomenon of fundamental interest and now of potentially practical value.

      Weaknesses:

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      We sincerely thank the reviewer for the encouraging and insightful comments. We fully agree that the fundamental aspects of migrasome biology are of great importance and deserve deeper exploration.

      In line with the reviewer’s suggestion, we have expanded our discussion on the basic biology of engineered migrasomes (eMigs). A recent study by the Okochi group at the Tokyo Institute of Technology demonstrated that hypoosmotic stress induces the formation of migrasome-like vesicles, involving cytoplasmic influx and requiring cholesterol for their formation (DOI: 10.1002/1873-3468.14816, February 2024). Building on this, our study provides a detailed characterization of hypoosmotic stress-induced eMig formation, and further compares the biophysical properties of natural migrasomes and eMigs. Notably, the inherent stability of eMigs makes them particularly promising as a vaccine platform.

      Finally, we would like to note that our laboratory continues to investigate multiple aspects of migrasome biology. In collaboration with our colleagues, we recently completed a study elucidating the mechanical forces involved in migrasome formation (DOI: 10.1016/j.bpj.2024.12.029), which further complements the findings presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors' report describes a novel vaccine platform derived from a newly discovered organelle called a migrasome. First, the authors address a technical hurdle in using migrasomes as a vaccine platform. Natural migrasome formation occurs at low levels and is labor intensive; however, by understanding the molecular underpinning of migrasome formation, the authors have designed a method to make engineered migrasomes from cultured cells at higher yields utilizing a robust process. These engineered migrasomes behave like natural migrasomes. Next, the authors immunized mice with migrasomes that either expressed a model peptide or the SARS-CoV-2 spike protein. Antibodies against the spike protein were raised that could be boosted by a 2nd vaccination and these antibodies were functional as assessed by an in vitro pseudoviral assay. This new vaccine platform has the potential to overcome obstacles such as cold chain issues for vaccines like messenger RNA that require very stringent storage conditions.

      Strengths:

      The authors present very robust studies detailing the biology behind migrasome formation and this fundamental understanding was used to form engineered migrasomes, which makes it possible to utilize migrasomes as a vaccine platform. The characterization of engineered migrasomes is thorough and establishes comparability with naturally occurring migrasomes. The biophysical characterization of the migrasomes is well done including thermal stability and characterization of the particle size (important characterizations for a good vaccine).

      Weaknesses:

      With a new vaccine platform technology, it would be nice to compare them head-to-head against a proven technology. The authors would improve the manuscript if they made some comparisons to other vaccine platforms such as a SARS-CoV-2 mRNA vaccine or even an adjuvanted recombinant spike protein. This would demonstrate a migrasome-based vaccine could elicit responses comparable to a proven vaccine technology.

      We thank the reviewer for the thoughtful evaluation and constructive suggestions, which have helped us strengthen the manuscript. 

      Comparison with proven vaccine technologies:

      In response to the reviewer’s comment, we now include a direct comparison of the antibody responses elicited by eMig-Spike and a conventional recombinant S1 protein vaccine formulated with Alum. As shown in the revised manuscript (Author response image 1), the levels of S1-specific IgG induced by the eMig-based platform were comparable to those induced by the S1+Alum formulation. This comparison supports the potential of eMigs as a competitive alternative to established vaccine platforms. 

      Author response image 1.

      eMigrasome-based vaccination showed similar efficacy compared with adjuvanted recombinant spike protein. The amount of S1-specific IgG in mouse serum was quantified by ELISA on day 14 after immunization. Mice were either intraperitoneally (i.p.) immunized with recombinant Alum/S1 or intravenously (i.v.) immunized with eM-NC, eM-S or recombinant S1. The administered doses were 20 µg/mouse for eMigrasomes, 10 µg/mouse (i.v.) or 50 µg/mouse (i.p.) for recombinant S1 and 50 µl/mouse for Aluminium adjuvant.

      Assessment of antigen integrity on migrasomes:

      To address the reviewer’s suggestion regarding antigen integrity, we performed immunoblotting using antibodies against both S1 and mCherry. Two distinct bands were observed: one at the expected molecular weight of the S-mCherry fusion protein, and a higher molecular weight band that may represent oligomerized or higher-order forms of the Spike protein (Figure 5b in the revised manuscript).

      Furthermore, we performed confocal microscopy using a monoclonal antibody against Spike (anti-S). Co-localization analysis revealed strong overlap between the mCherry fluorescence and anti-Spike staining, confirming the proper presentation and surface localization of intact S-mCherry fusion protein on eMigs (Figure 5c in the revised manuscript). These results confirm the structural integrity and antigenic fidelity of the Spike protein expressed on eMigs.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      I know that the reviewers always ask for more, and this is not the case here. Can the abstract and title be changed to emphasize the science behind migrasome formation, and possibly add a few more fundamental aspects on how hypotonic shock induces migrasomes?

      Alternatively, if the authors desire to maintain the emphasis on vaccines, can immunological mechanisms be somewhat expanded in order to - at least to some extent - explain why migrasomes are a better vaccine vehicle?

      One way or another, this reviewer is highly supportive of this study and it is really up to the authors and the editor to decide whether my comments are of use or not.

      My recommendation is to go ahead with publishing after some adjustments as per above.

      We’d like to thank the reviewer for the suggestion. We have changed the title of the manuscript and modified the abstract, emphasizing the fundamental science behind the development of eMigrasomes. To gain some immunological information on eMig-elicited antibody responses, we characterized the type of IgG induced by eM-OVA in mice, and compared it to that induced by Alum/OVA. The IgG response to Alum/OVA was dominated by IgG1. Quite differently, eM-OVA induced an even distribution of IgG subtypes, including IgG1, IgG2b, IgG2c, and IgG3 (Figure 4i in the revised manuscript). The ratio between IgG1 and IgG2a/c indicates a Th1 or Th2 type humoral immune response. Thus, eM-OVA immunization induces a balance of Th1/Th2 immune responses.

      Reviewer #2 (Recommendations For The Authors):

      The study is a very nice exploration of a new vaccine platform. This reviewer believes that a more head-to-head comparison to the current SARS-CoV-2 vaccine platform would improve the manuscript. This comparison is done with OVA antigen, but this model antigen is not as exciting as a functional head-to-head with a SARS-CoV-2 vaccine.

      I think that two other discussion points should be included in the manuscript. First, was the host-cell protein evaluated? If not, I would include that point on how issues of host cell contamination of the migrasome could play a role in the responses and safety of a vaccine. Second, I would discuss antigen incorporation and localization into the platform. For example, the full-length spike being expressed has a native signal peptide and transmembrane domain. The authors point out that a transmembrane domain can be added to display an antigen that does not have one natively expressed, however, without a signal peptide this would not be secreted and localized properly. I would suggest adding a discussion of how a non-native signal peptide would be necessary in addition to a transmembrane domain.

      We thank the reviewer for these thoughtful suggestions and fully agree that the points raised are important for the translational development of eMig-based vaccines.

      (1) Host cell proteins and potential immunogenicity:

      We appreciate the reviewer’s suggestion to consider host cell protein contamination. Considering potential clinical application of eMigrasomes in the future, we will use human cells with low immunogenicity such as HEK-293 or embryonic stem cells (ESCs) to generate eMigrasomes. Also, we will follow a QC that meets the standard of validated EV-based vaccination techniques. 

      (2) Antigen incorporation and localization—signal peptide and transmembrane domain:

      We also agree with the reviewer’s point that proper surface display of antigens on eMigs requires both a transmembrane domain and a signal peptide for correct trafficking and membrane anchoring. For instance, in the case of full-length Spike protein, the native signal peptide and transmembrane domain ensure proper localization to the plasma membrane and subsequent incorporation into eMigs. In the case of OVA, a secretory protein that contains a native signal peptide yet lacks a transmembrane domain, an engineered transmembrane domain is required. For antigens that do not naturally contain these features, both a non-native signal peptide and an artificial transmembrane domain are necessary. We have clarified this point in the revised discussion and explicitly noted the requirement for a signal peptide when engineering antigens for surface display on migrasomes.

    1. Author response:

      The following is the authors’ response to the original reviews

      We again thank the reviewers for their comments and recommendations. In response to the reviewers’ suggestions, we have performed several additional experiments, added additional discussion, and updated our conclusions to reflect the additional work. Specifically, we have performed additional analyses in female WT and Marco-deficient animals, demonstrating that the Marco-associated phenotypes observed in male mice (reduced adrenal weight, increased lung Ace mRNA and protein expression, unchanged expression of adrenal corticosteroid biosynthetic enzymes) are not present in female mice. We also report new data on the physiological consequences of increased aldosterone levels observed in male mice, namely plasma sodium and potassium titres, and blood pressure alterations in WT vs Marco-deficient male mice. In an attempt to address the reviewers’ comments relating to our proposed mechanism on the regulation of lung Ace expression, we additionally performed a co-culture experiment using an alveolar macrophage cell line and an endothelial cell line. In light of the additional evidence presented herein, we have updated our conclusions from this study and changed the title of our work to acknowledge that the mechanism underlying the reported phenotype remains incompletely understood. Specific responses to reviewers can be seen below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels.

      The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung.

      Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess aldosterone observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored.

      The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear.

      The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      (1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      (2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kalemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary the experiments should be repeated with an adequately powered cohort.

      Using a power calculator (www.gigacalculator.com) it was determined that our sample size of 13 was one less than sufficient to detect a similar % difference in corticosterone as was detected in aldosterone. We regret that we are unable to perform additional measurements as the reviewer suggested in the available timeframe.
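
      The kind of calculation discussed in this exchange can be sketched as follows. This is a minimal illustration using the standard normal approximation for a two-sided, two-sample comparison of means; it is not the calculator the authors used, and the effect sizes (Cohen's d) are hypothetical placeholders rather than values from the manuscript.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    effect_size is Cohen's d (difference in means / pooled SD).
    Uses the normal approximation, so it gives a slightly smaller n
    than an exact t-test calculation would for small samples.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes, not values taken from the manuscript.
for d in (0.5, 0.8, 1.0):
    print(f"d = {d}: about {n_per_group(d)} mice per group")
```

      Because the required n scales with 1/d², a more variable read-out such as corticosterone in the knockout group (smaller d for the same % difference) needs noticeably more animals than a tighter one.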

      (2) All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. It would also be useful if the single cell data could be examined by sex; this should be possible even post hoc using Xist etc.

      Given the limitations outlined in our previous response to reviewers it was not possible to repeat every experiment from the original manuscript. We were able to measure the expression of lung Ace mRNA, ACE protein, adrenal weights, adrenal expression of steroid biosynthetic enzymes, presence of myeloid cells, and levels of serum electrolytes in female animals. These are presented in figures 1G, 3B, 4A, 4E, 4F, 4I, and 4J. We have elected to not present single cell seq data according to sex as it did not indicate substantial differences between males and females in Marco or Ace expression and so does not substantively change our approach.

      (3) IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We used negative controls to guide our settings during acquisition of immunofluorescent images. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung in addition to the protein level. This data was presented in the original manuscript and is further bolstered by our additional presentation of expression data for Ace mRNA and protein in female animals in this revised manuscript.

      (4) Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      We don’t have ACE-deficient mice so cannot do KO validation of the antibody. We did perform secondary stain controls which confirmed the signal observed is primary antibody-derived. Moreover, we specifically chose an anti-ACE antibody (Invitrogen catalogue # MA5-32741) that has undergone advanced verification with the manufacturer. We additionally tested the antibody in the brain and liver and observed no significant levels of staining.

      Author response image 1.

      (5) The link between alveolar macrophage Marco and ACE is poorly explored.

      We carried out co-culture experiments of alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the discussion.

      (6) Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      This is outside the scope of this project, though we would consider exploring such experiments in future studies.

      (7) Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      (1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      (2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We carried out co-culture experiments of alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the implications explored in the discussion.

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.

      (4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern we carried out a co-culture experiment as described above.

    1. eLife Assessment

      This study reports insights into how the caspase Dcp-1, best known for cell death, can also promote tissue growth in Drosophila, extending the authors' earlier work by identifying regulatory factors that shape this non-lethal activity. The valuable findings identify new Dcp-1-interacting proteins Sirt1, Fkbp59, Debcl, Buffy, Atg2, and Atg8a, and help broaden understanding of how growth and death pathways intersect. The evidence is solid, but some conclusions would be strengthened by additional studies, particularly regarding the nature of the cell death observed and the involvement of autophagy.

    2. Reviewer #1 (Public review):

      Summary:

      The authors clearly demonstrate that overexpressed Dcp-1, but not Drice, is activated without canonical apoptosome components. Using TurboID-based proximity labeling, they revealed distinct proximal proteomes, among which Sirtuin 1, an Atg8a deacetylase that promotes autophagy, was specifically required for Dcp-1 activation. Additionally, they show that autophagy-related genes, including Bcl-2 family members Debcl and Buffy, are required for Dcp-1 activation.

      Using structure-based prediction using AlphaFold3, they identified that Bruce, an autophagy-regulated inhibitor of apoptosis, acts as a Dcp-1-specific regulator acting outside the apoptosome-mediated pathway. Finally, they show that Bruce suppresses wing tissue growth. These findings indicate that non-lethal Dcp-1 activity is governed by the autophagy-Bruce axis, enabling distinct non-lethal functions independent of cell death.

      Strengths:

      This is an excellent paper with very good structure, excellent quality data and analysis.

      Weaknesses:

      This reviewer did not identify any weaknesses or recommendations for revision.

    3. Reviewer #2 (Public review):

      Summary:

      The Drosophila executioner caspase Dcp-1 has established roles in cell death, autophagy, and imaginal disc growth. This study reports previously unrecognized factors that work together with Dcp-1. Specifically, the authors performed a TurboID-based proximal ligation experiment to identify factors associated with Dcp-1 and Drice. Dcp-1-specific interactors were further examined for their genetic interaction. The authors report autophagy-related genes, including Debcl and Buffy, to be required for Dcp-1 activation. In addition, the authors present evidence of an interaction between Bruce and Dcp-1. Bruce expression blocks the Dcp-1 overexpression phenotype. Inhibition of effector caspases or overexpression of Bruce commonly reduced wing growth, suggesting a relationship between the two proteins.

      Strengths:

      On the positive side, the study identifies new Dcp-1-interacting proteins and provides a functional link between Dcp-1 and Sirt1, Fkbp59, Debcl, Buffy, Atg2, and Atg8a.

      Weaknesses:

      The data supporting the Dcp-1/Bruce interaction are not strong, even though the title of this manuscript highlights Bruce. For example, the authors' turboID data does not support Dcp-1/Bruce interaction. The case for the interaction is based on a single experiment that overexpresses a truncated Bruce transgene in S2 cells.

    4. Reviewer #3 (Public review):

      Summary:

      The present paper by Shinoda et al. from the Miura group builds upon findings reported in an earlier study by the same team (Shinoda et al., PNAS, 2019), which identified a non-apoptotic role for the Drosophila executioner caspase Dcp-1 in promoting wing tissue growth. That earlier work attributed this function primarily to Dcp-1 and to Decay, a caspase structurally related to executioner caspases, but not to DrICE, the principal apoptotic executioner caspase. The authors further proposed that this non-apoptotic caspase activity operates independently of the initiator caspase Dronc.

      In the current study, the authors both corroborate aspects of their previous findings and extend the investigation to mechanisms regulating Dcp-1 in this context. They identify roles for the giant IAP Bruce, two BCL-2 family members, and autophagy-related components in modulating non-apoptotic Dcp-1 activity. Moreover, they show that Bruce binds to a BIR-like peptide exposed upon Dcp-1 cleavage, but not to DrICE. The study further suggests that low levels of Dcp-1 activity promote wing tissue growth, whereas excessive activity induces cell death, as evidenced by impaired wing development following Dcp-1 overexpression. Overall, the manuscript provides several intriguing insights into the non-apoptotic regulation of the comparatively weak apoptotic executioner caspase Dcp-1 and complements the group's earlier work. However, several concerns remain regarding certain interpretations of the data and the experimental rigour of some of the results.

      Strengths:

      A major strength of the work is its systematic genetic and biochemical approaches, which combine tissue-specific manipulation with protein interaction mapping to explore how Dcp-1 is regulated. The identification of several regulatory factors, including an inhibitor of cell death protein and components linked to autophagy, provides a coherent framework for understanding how Dcp-1 activity might be tuned.

      Weaknesses:

      The evidence supporting some key claims remains incomplete. In particular, the form of cell death induced when Dcp-1 is overexpressed is not clearly established, and additional tests would be needed to distinguish between the different cell death types.

      Likely impact:

      The study contributes to a growing body of work showing that proteins traditionally associated with cell death can have broader roles in tissue development. This conceptual advance is likely to be of interest to researchers studying growth control and tissue maintenance.

      Specific points:

      (1) Nature of the wing ablation phenotype

      A central concern is whether the wing ablation phenotype observed upon Dcp-1 overexpression truly reflects apoptotic cell death. The authors show in Figure 1c that nuclei in cells overexpressing Dcp-1, but not DrICE, zymogens are highly condensed, which is suggestive of apoptosis. However, it is equally plausible that this phenotype reflects a form of non-apoptotic, Dcp-1-dependent cell death (e.g. autophagy-dependent cell death). This distinction could be readily addressed using TUNEL labelling and direct caspase activity assays. The latter would be particularly informative, as it remains unclear whether zymogen Dcp-1 is capable of cleaving standard effector caspase reporters in vivo. Does the anti-cleaved Dcp-1 antibody detect Dcp-1 activation following overexpression of the Dcp-1 zymogen?

      (2) Role of Decay

      In their earlier study, the authors identified Decay as another caspase influencing wing growth, albeit more modestly than Dcp-1. It is therefore unclear why this line of investigation was not pursued further in the current work. This omission is notable, as Decay is not implicated in apoptosis and, to date, no substantial physiological function has been assigned to this caspase in any system. At a minimum, this point should be discussed explicitly.

      (3) Figure 2: Proximity labelling analysis

      The authors use TurboID-mediated proximity labelling to reveal distinct Dcp-1- and DrICE-associated proteomes across tissues, with a particular focus on the wing disc. They further demonstrate that RNAi-mediated knockdown of the Dcp-1-associated proteins Sirt1 and Fkbp59 suppresses the wing ablation phenotype induced by Dcp-1 overexpression, suggesting that these factors are required for Dcp-1 activity. However, it should be clarified whether Bruce was identified as a Dcp-1 interactor in the proximity labelling dataset, given its proposed central regulatory role. In addition, further discussion of Fkbp59, its known functions and how it might mechanistically influence Dcp-1 activity would be valuable.

      (4) Figure 3: Autophagy-related factors

      Given that Sirt1 is known to promote autophagy, the authors next examine autophagy-related proteins and identify roles for Atg2, Atg8a, Debcl, and Buffy in Dcp-1 activation. Notably, these proteins do not promote cell death in the Hid-induced canonical apoptotic pathway. However, it is important to determine whether knockdown of Debcl, Buffy, Atg2, or Atg8a alone affects wing development in the absence of Dcp-1 overexpression, to exclude the possibility that these perturbations independently impair wing formation.

      (5) Evidence for canonical autophagy

      The involvement of autophagy would be more convincingly demonstrated by testing additional core autophagy genes, such as Atg7, Atg5, and Atg12, as well as performing a combined knockdown of Atg8a and Atg8b. Moreover, direct assessment of autophagy at the cellular level using established genetic reporters would substantially strengthen the conclusions.

      (6) Figures 4-5: Functional consequences

      It would be informative to determine whether Synr, Debcl, or Buffy influence wing size on their own and whether their overexpression enhances wing growth.

      (7) Terminology and interpretation of cell death

      Taken together, the results suggest that Dcp-1 zymogen overexpression induces a form of non-apoptotic cell death, potentially autophagy-dependent or related. The reviewer does not understand the authors' insistence on referring to this process as apoptosis. The authors should be more cautious in their terminology: there is no canonical versus non-canonical apoptosis; there is simply apoptosis. Without stronger evidence, these effects should not be described as apoptotic cell death.

    1. eLife Assessment

      This study presents a valuable advance by enabling functional mapping of Ca²⁺ responses in live human pancreatic tissue slices, providing new opportunities to study islet heterogeneity and diabetes-related dysfunction in an intact tissue context. The evidence supporting the main conclusions is solid, based on reproducible methodology and functional validation across multiple human donor samples. Key revisions needed include clearer quantification of transduction efficiency and tissue viability, and improved clarification of how CaMPARI2 signals should be interpreted.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to overcome a major technical limitation in pancreatic slice research - the inefficient viral transduction of dense, enzyme-active human pancreas tissue - while maintaining tissue integrity and physiological responsiveness. They developed a modified culture and infection protocol that incorporates gentle orbital agitation, removal of protease inhibitors, and physiological temperature during adenoviral transduction. This method increased transduction efficiency by approximately threefold without impairing insulin secretion or calcium signaling responses.

      Strengths:

      The study's major strengths are its clear methodological innovation, experiment optimization, and multiparametric validation. The authors provide compelling evidence that their approach enhances the expression of genetically encoded calcium indicators (GCaMP6m) and integrators (CaMPARI2), preserving both endocrine and exocrine cell functionality. The demonstration of targeted biosensor expression in β-cells and multiplexed imaging of redox and calcium dynamics highlights the versatility of the system. The CaMPARI2-based approach is particularly impactful, as it decouples maximum calcium response assessment from real-time imaging, thereby increasing throughput and reducing bias. The authors successfully apply the technique to samples from non-diabetic, T1D, and T2D donors, revealing disease-relevant alterations in β-cell calcium responses consistent with known physiological dysfunctions. The analysis of islet size versus calcium response further underscores the utility of this platform for probing structure-function relationships in situ.

      Weaknesses:

      The primary limitations are a lack of live/dead assessment to differentiate viability-related effects from methodological improvements, a lack of quantification of the transduction efficiency (while relative efficiency is clearly increased, it is not shown what the absolute efficiency is), and a lack of IF confirmation of the cell-specific transduction efficiency. These limitations, however, do not detract from the overall strength of the technical advance.

      Overall, this work offers a convincing and practical advance for the diabetes and islet biology community. It substantially improves the toolkit available for live human pancreas studies and will likely catalyze further mechanistic investigations of islet heterogeneity, disease progression, and therapeutic response.

    3. Reviewer #2 (Public review):

      (1) The photoconversion protocol requires a more detailed and quantitative discussion. The current description ("5 s pulses for 5 min, leading to 2.5 min of total light delivery") is too brief to evaluate whether the chosen illumination parameters maintain the CaMPARI2 signal within its linear dynamic range. Because CaMPARI2 photoconversion reflects the time integral of 405 nm photoconverting light exposure in the presence of intracellular [Ca²⁺], the red/green fluorescence ratio is directly proportional to cumulative illumination time until saturation occurs. Previous characterization (PMID: 30361563) shows that photoconversion is approximately linear over the first 0-80 s of 405 nm exposure, after which red fluorescence plateaus. The total exposure used here (=150 s) may therefore exceed the linear regime, potentially obscuring differences between cells with moderate versus strong Ca²⁺ activity. The authors should (i) justify the selected illumination parameters, (ii) provide evidence that the chosen conditions remain within the linear response range for the specific optical setup, (iii) discuss how overexposure might affect quantitative interpretation of red/green ratios and comparisons between experimental groups. Inclusion of calibration data would substantially strengthen the methodological rigor and reproducibility of the study.

      (2) For Figure 8a (middle panels), the data points for 16G and KCl overlap, raising the possibility that the signal may already be saturated at 16G. The authors should comment on the potential for CaMPARI2 saturation at 16G, and clarify whether this affects the interpretation of the KCl results: "At maximal stimulation by KCl, there was no size-function correlation (R = 0.15, p = 0.14)."

      (3) The term "calcium activity" is used throughout the manuscript but remains vague. Pancreatic islets typically display a biphasic Ca²⁺ response to high glucose-an initial sustained peak followed by repetitive oscillations - and these phases differ in both kinetics and physiological meaning. Ca²⁺ responses are usually quantified using parameters such as rise time, amplitude, and duration for the initial peak, and amplitude, frequency, burst duration, and duty cycle for the oscillatory phase. The authors should clarify how "calcium activity" is defined in their analyses and discuss the appropriateness of directly comparing Ca²⁺ signals with distinct temporal patterns.

      (4) Because the CaMPARI2 red/green ratio reflects the time-integral of 405 nm photoconverting light exposure in the presence of Ca²⁺, two Ca²⁺ responses with the same duty cycle but different amplitudes could, in principle, yield the same red/green ratios. This raises an important question regarding how well the CaMPARI2 signal distinguishes differences in Ca²⁺ amplitude versus time spent above threshold. The authors should directly relate single-cell Ca²⁺ traces to corresponding red/green ratios to demonstrate the extent to which CaMPARI2 photoconversion truly reflects "Ca²⁺ activity." Such validation would clarify whether the metric is sensitive to variations in oscillation amplitude, duty cycle, or both, and would strengthen the interpretation of CaMPARI2-based functional comparisons.
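
      Points (1) and (4) above can be made concrete with a small sketch. The first function reproduces the exposure arithmetic from the quoted protocol, assuming a 10 s on/off period (inferred from "5 s pulses for 5 min" giving 2.5 min of light; the period itself is not stated). The second is a deliberately crude toy integrator in which conversion accrues only while the light is on and Ca²⁺ exceeds a threshold; the all-or-none threshold, the rate constant, and the square-wave traces are all assumptions for illustration, not a model of real CaMPARI2 kinetics.

```python
LINEAR_LIMIT_S = 80  # approximate linear range cited by the reviewer (0-80 s)


def total_light_seconds(pulse_s, period_s, protocol_s):
    """Cumulative 405 nm exposure for a regular on/off pulse train."""
    n_pulses = int(protocol_s // period_s)
    return n_pulses * pulse_s


def converted_signal(trace, light_on, threshold, rate_per_s, dt=1.0):
    """Toy integrator: count time steps with light on AND Ca2+ above threshold.

    The all-or-none threshold is an illustrative assumption; amplitude
    above the threshold is ignored by construction.
    """
    steps = sum(1 for ca, lit in zip(trace, light_on) if lit and ca > threshold)
    return rate_per_s * dt * steps


exposure = total_light_seconds(pulse_s=5, period_s=10, protocol_s=300)
print(f"total exposure: {exposure} s, linear limit: {LINEAR_LIMIT_S} s")

# Two hypothetical square-wave Ca2+ traces: same duty cycle, different amplitude.
times = range(300)
light_on = [(t % 10) < 5 for t in times]
low_amp = [2.0 if (t % 20) < 10 else 0.1 for t in times]
high_amp = [5.0 if (t % 20) < 10 else 0.1 for t in times]

low_signal = converted_signal(low_amp, light_on, threshold=1.0, rate_per_s=0.01)
high_signal = converted_signal(high_amp, light_on, threshold=1.0, rate_per_s=0.01)
print(low_signal == high_signal)
```

      Under these assumptions the two traces are indistinguishable, which is the scenario the reviewer asks the authors to rule out by relating single-cell Ca²⁺ traces to red/green ratios.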

    4. Reviewer #3 (Public review):

      Summary:

      Lazimi and coworkers present an updated experimental protocol by which viral vectors can be used with live pancreas slices in order to efficiently transduce fluorescent protein biosensors. This is of high importance, given that live human pancreas slices provide a means to study islet function while maintaining the architecture of the local environment. Thus, efficiently delivering a wide range of fluorescent protein biosensors provides expanded capabilities to study the human islet and its dysfunction in type 1 and type 2 diabetes. The authors demonstrate the improved transduction provided by their revised protocol, which includes orbital culture, while retaining or, in some cases, improving cell viability, hormone release, and Ca²⁺ responses. Further, the authors demonstrate how a 'Ca²⁺ integrator', CaMPARI2, can be used to profile the Ca²⁺ response of large numbers of cells and islets, to capture the variability in islet responses in healthy and diabetic cases.

      Strengths:

      The data presented are generally robust, and the methods are well described, such that this protocol could be repeated by other investigators. All findings are representative of multiple donors. Importantly, the data is highly novel.

      Weaknesses:

      Weaknesses in the manuscript mainly include a lack of technical details by which data is presented or analyzed, as well as caveats by which certain data related to islet size are interpreted.

    1. eLife Assessment

      This paper addresses valuable questions about the evolution of recombination landscape under domestication by examining recombination maps in domesticated chickens and their wild ancestor. However, despite employing a state-of-the-art deep learning method for recombination map inference, the lack of systematic benchmarking and presence of some unexpected patterns raise concerns about the reliability of the inferred maps, thus providing incomplete support for rapid evolution of recombination landscapes. Additionally, due to methodological limitations in testing for intra-genome correlations between evolutionary processes, the current evidence is inadequate to support the associations of recombination with selection and/or introgression in domesticated chickens.

    2. Reviewer #1 (Public review):

      Liu, Li, Ge, and colleagues use whole genome sequence data to estimate the recombination landscape of domesticated chickens and their wild ancestor, Red Junglefowl. They compare landscapes estimated using the deep learning method RelERNN (Adrion et al. 2020) to understand the consequences of domestication for the evolution of recombination. The authors build on previous work in tomato, maize, and other domesticated species to examine how recombination rate and patterning evolve under the demography and selection pressures of domestication. They do so by comparing estimates of local recombination rates across chromosomes and populations, asking if/how well certain sequence and chromatin-based predictors predict recombination rate, and testing for an association between recombination rate and the proportion of introgressed ancestry from Red Junglefowl.

      This study provides evidence for the hypothesis that recombination evolves rapidly in domesticated lineages -- so much so that we see little hotspot sharing between breeds in the present-day! Strengths of the paper include the collection/analysis of data from several domesticated sub-populations and efforts to control for demography and structure in the inference of recombination landscapes (given the challenges of some methods under non-equilibrium demography: https://academic.oup.com/mbe/article/35/2/335/4555533). It is also reassuring to see patterns that have been thoroughly established (e.g., the negative relationship between recombination rate and chromosome size) validated.

      However, I have concerns about the data and methodology.

      (1) My main concern is that the demographic and recombination rate estimates inferred using ~20 whole genomes are likely quite variable and, without quantification of the uncertainty or systematic assessment of the possible biases in the methodology, it is difficult to have confidence in analyses which make use of the RelERNN landscapes.

      (a) Similar studies in rye (https://academic.oup.com/mbe/article/39/6/msac131/6605708) and tomato (https://academic.oup.com/mbe/article/39/1/msab287/6379725) used data from far more individuals (916 individuals split up into populations of size 50 for rye, >75 samples for tomato) to infer recombination maps and conduct downstream analyses. Studies in human genetics make use of an even greater number! The evidence (Lines 189-196 of the main text) that the sample size is sufficient to capture fine-scale variation in recombination is weak. In particular, correlations between the true and estimated recombination rate are based on *equilibrium* demography at sample sizes of 5, 10, and 20, yet are used to draw the inference "20 samples per population are sufficient to reconstruct their recombination landscapes" under the *non-equilibrium* demography (inferred using SMC+).

      (b) RelERNN learns the recombination landscape by using several signatures (the decay of linkage disequilibrium and, as described in https://academic.oup.com/genetics/advance-article-abstract/doi/10.1093/genetics/iyaf108/8157390, choppiness of the allele frequency spectrum) left in present-day genomes. Both signatures depend strongly on local SNP density. It does not seem the effect of SNP density on the inferred recombination rate is examined, despite the potential for correlated noise in inferred recombination rate (in SNP-sparse regions of the genome) to confound downstream inference.

      (c) It is unclear if the demographic histories for chickens (Figure S6) broadly match what has been previously estimated from whole-genome data, or if a large class of demographic models are compatible with the data (i.e., confidence intervals for the demographic histories are quite large). In Figure S6, the bottlenecks are somewhat weak and affect only a couple of the groups, despite the history of domestication and the expectation that effective sizes vary more widely. The groups affected (LX and WL) are those that have the weakest correlations between recombination rate under the equilibrium and non-equilibrium demographic models.

      (2) The authors test for the effects of chromatin modifications, GC content, etc using correlations between local recombination rate and the features individually. However, joint inference of the effects under a GLM (the distribution of recombination rates is probably better described by, e.g., a Gamma distribution) would permit more straightforward causal inference, given, e.g., the potential effects of chromatin marks on deleterious mutation accumulation. I recognize this likely would not change the direction or significance of the effects in question, but it is worth noting given readers who may want to learn something from the effect sizes and the nature of causes and effects is difficult to disentangle without a multivariate approach.
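
      The concern in point (2) can be illustrated with a toy simulation. This sketch uses a partial correlation as a simpler stand-in for the Gamma GLM the reviewer suggests, and every variable is simulated: a hypothetical shared driver (labelled `gc`) influences both a hypothetical chromatin mark and the recombination rate, which have no direct link to each other. Gaussian noise is used for simplicity, even though real recombination rates are positive and skewed.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the toy data are reproducible
n = 5000
gc = [rng.gauss(0, 1) for _ in range(n)]     # simulated shared driver
mark = [g + rng.gauss(0, 1) for g in gc]     # simulated chromatin mark
rec = [g + rng.gauss(0, 1) for g in gc]      # simulated recombination rate

marginal = pearson(mark, rec)
partial = partial_corr(mark, rec, gc)
print(f"marginal r = {marginal:.2f}, partial r given gc = {partial:.2f}")
```

      In this construction the one-at-a-time correlation is clearly positive while the partial correlation is near zero, which is why effect sizes from feature-by-feature tests are hard to interpret without a joint model.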

      Overall:

      Previous work on recombination landscape evolution in birds (namely, the zebra finch and long-tailed finch; Singhal & Leffler 2015) has shown that many hotspots, i.e., small stretches of the genome that experience rates of crossing over that are much higher than the genome-wide average, are conserved over tens of millions of years of evolution. Work in tomato, maize, rye, and other flowering plants with histories of domestication has shown that hotspots can be dynamic. The results of Liu, Li, Ge, and colleagues complement those analyses and will, therefore, be of interest to those working on the evolution of recombination. Additionally, the finding that minor parent ancestry is negatively associated with recombination is an interesting exception to an otherwise general rule in evolutionary biology. Finally, it is quite exciting to see recombination maps inferred using RelERNN, and in a demography-aware fashion!

      That all said, it is difficult to have certainty in the results due to the relatively limited sample size for each of the populations, the lack of control for SNP density, the uncertainty in both recombination maps and demographic histories, and the lack of a joint modelling framework to carefully tease apart effects that are reported in isolation.

    3. Reviewer #2 (Public review):

      Summary:

      Liu et al. use whole genome sequencing data from several strains of chicken as well as a subspecies of the chicken wild ancestor to study the impact of domestication on the recombination landscape. They analyze these data using several machine-learning/AI based methods, using simulation to partially inform their analysis. The authors claim to find substantial deviations in the fine-scale recombination landscape between breeds, and surprising patterns between recombination and introgression/selection. However, there are substantial inconsistencies between the authors' findings and the current understanding in the field, supported by indirect evidence that is hard to interpret at best.

      Strengths:

      The data produced by the authors of this and a previous paper is well-suited to answer the questions that they pose. The authors use simulations to support some decisions made in analyzing this data, which partially alleviates some potential questions, and could be extended to address additional concerns. Should further analysis support the claims currently made regarding hotspot turnover and introgression frequency vs. recombination rate, these findings would indeed be striking observations at odds with current understanding in the field.

      Weaknesses:

      I have several major concerns regarding the ability of the analyses to support the claims in this paper, summarized below.

      Substantial deviations from field-standard benchmarks in the estimated recombination landscape appear to have been disregarded, particularly with regard to the WL breed.

      o For example, the number of detected hotspots per subspecies ranges from roughly 500 to over 100,000 based on Figure 2A. While the mean is indeed comparable to estimates from other species (lines 315-317), this characterization masks that each recombination map has far too few or too many hotspots to be biologically accurate (at least without substantial corroboration from more direct analyses). As such, statements about hotspot overlap between breeds and hotspot conservation cannot be taken at face value. Authors might consider using alternative methods to detect hotspots, assessing their power to detect hotspots in each breed, and evaluating hotspot overlap between breeds with respect to random expectation.

      o Furthermore, the authors consider the recombination landscape at promoters (Figure S10) and H3K4me3 sites (Figure 2C) and find that levels are slightly elevated, but the magnitude of the elevation (negligible to ~1.5x) is substantially lower than that of any other species studied to date without PRDM9. The magnitude of elevation for both comparisons is especially small for WL, which suggests that the recombination estimates for this breed are particularly noisy, and yet this breed is the focus of the introgression analysis.

      Introgression and strong selection can both be thought of as changing the local Ne along the genome. Estimating recombination from patterns of LD most directly estimates rho (the population recombination rate, 4*Ne*r), and disentangling local changes in Ne from local changes in r is non-trivial. Furthermore, selective sweeps, particularly easy-to-detect hard sweeps, are often characterized by having very little genetic variation. Estimating recombination rate from patterns of LD in regions with very little variation seems particularly challenging, and could bias results such as in Figure S15. The authors do not discuss the implications of these challenges for their analyses, which seems particularly relevant for their analyses of introgression and selection with recombination, as well as comparisons between WL (which the authors report to have undergone more selection and introgression) with other breeds. Authors should quantify their ability/power to detect recombination rates and hotspots under these conditions using simulation - some of these simulations are already mentioned in the paper, but are not analyzed in this way. Also useful would be quantifying the impact of simulated bottlenecks on estimates of recombination rate.
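
      The confounding described above can be made concrete with a small numerical sketch. It assumes the standard diploid definition rho = 4*Ne*r; the function name and all parameter values are hypothetical and chosen only for illustration, not taken from the manuscript.

```python
def population_recombination_rate(ne: float, r: float) -> float:
    """Return rho = 4 * Ne * r for a diploid population."""
    return 4.0 * ne * r


# Hypothetical genome-wide background (illustrative values only).
background = population_recombination_rate(ne=10_000, r=2e-8)

# A sweep or introgressed tract that halves local Ne while r is unchanged ...
reduced_ne = population_recombination_rate(ne=5_000, r=2e-8)

# ... yields the same rho as a genuine halving of r at the background Ne.
reduced_r = population_recombination_rate(ne=10_000, r=1e-8)

print(background, reduced_ne, reduced_r)
```

      Because an LD-based estimator sees only rho, the last two scenarios cannot be told apart without independent information on local Ne.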

      In many analyses (e.g. hotspot and coldspot overlap, histone mark analysis), authors appear to use 1000 randomly selected regions of the same length as a control. If this characterization is accurate, authors should match the number of control regions to the number of features that they're comparing to. A more careful analysis might also select random regions from the same chromosome, match for GC content where appropriate, etc.
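
      One way such a matched control set could be drawn is sketched below. This is a minimal illustration, not the authors' procedure: the function name, coordinates, and chromosome lengths are hypothetical, and the sketch matches only the number of regions, their lengths, and their chromosomes.

```python
import random


def sample_matched_controls(features, chrom_lengths, seed=0):
    """For each (chrom, start, end) feature, draw one random region of the
    same length from the same chromosome (half-open coordinates)."""
    rng = random.Random(seed)
    controls = []
    for chrom, start, end in features:
        length = end - start
        max_start = chrom_lengths[chrom] - length
        if max_start < 0:
            raise ValueError(f"feature longer than chromosome {chrom}")
        new_start = rng.randint(0, max_start)  # bounds are inclusive
        controls.append((chrom, new_start, new_start + length))
    return controls


# Hypothetical hotspot coordinates and chromosome lengths.
hotspots = [("chr1", 1_000, 3_000), ("chr1", 50_000, 51_000), ("chr2", 200, 700)]
lengths = {"chr1": 100_000, "chr2": 10_000}
controls = sample_matched_controls(hotspots, lengths)
```

      The sketch does not match GC content or exclude controls that overlap real features; both would need additional filtering on top of this sampling step.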

      Authors provide very little detail about the number/locations of coldspots or selective sweeps: how many were detected in each subspecies? Does the fraction of hotspots and coldspots which overlap selective sweeps vary between species? It is unclear whether the numbers in the text (lines 356-364) represent a single breed or an analysis across breeds.

    1. eLife Assessment

      Koch et al. describe a valuable novel methodology, SynSAC, to synchronise cells to analyse meiosis I or meiosis II or mitotic metaphase in budding yeast. The authors present convincing data to validate abscisic acid-induced dimerisation to induce a synthetic spindle assembly checkpoint (SAC) arrest that will be of particular importance to analyse meiosis II. The authors use their approach to determine the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II that will be of interest to the broader meiosis research community.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system, but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      Significance:

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner. Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not emphasize this improved system more. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data-independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      Significance:

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

    5. Author response:

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought-after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore change during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we will now implement:

      (1) Improvements to the discussion throughout the manuscript. The reviewers recommended that we focus our discussion on the novel findings of the manuscript and drew out some key points of interest that deserve more attention. We fully agree with this and we will address this in a revised version.

      (2) We will add a new supplemental figure to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We are currently performing an additional control experiment to address the minor point 1 from reviewer #3. Our experiment to confirm that SynSAC relies on endogenous checkpoint proteins was missing the cell cycle profile of cells where SynSAC was not induced for comparison. We will add this control to our full revision.

      (4) In our full revision we will also include representative images of spindle morphology as requested by Reviewer #1, point 2.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewer’s support of our study.

      (1) In wild-type, Pds1 levels are high during M1 and A1, but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in the presence of high Pds1? There is even a low but significant level in MII.

      The higher levels of Pds1 in meiosis I compared to meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 prior to separase re-activation.

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “Addition of ABA at the time of prophase release resulted in Pds1 (securin) stabilisation throughout the time course, consistent with delays in both metaphase I and II”.

      We do not believe that either anaphase I or II occurs in the presence of high Pds1. Western blotting represents the amount of Pds1 in the population of cells at a given time point. The time between meiosis I and II is very short even when treated with ABA. For example, in Figure 2B, spindle morphology counts show that the anaphase I peak is around 40% at its maximum (105 min) and around 40% of cells are in either metaphase I or metaphase II, and will be Pds1 positive. In contrast, due to the more efficient arrest in meiosis II, anaphase II hardly occurs at all in these conditions, since anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down, without undergoing anaphase extension. Taking Pds1 levels from the western blot and the spindle data together leads to the conclusion that at the end of the time course, these cells are biochemically in metaphase II, but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3–5]. We will explain this point in a revised version while referring to representative images that provide evidence for this, as also requested by the reviewer below.

      (2) The figures with data characterizing the system are mostly graphs showing the time course of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (i.e., in the figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      This is an excellent suggestion and will also help clarify the point above. We will provide images of cells at the different stages. For each timepoint, 100 cells were scored. We have already included this information in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that the SynSAC signal does not come from kinetochores is a major advantage as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation, but this is beyond the scope of this work. In terms of the state at which we are arresting cells, this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators but not checkpoint proteins in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We appreciate the reviewer’s insight in highlighting these interesting discussion points, which we will include in a revised version.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      This is a good suggestion; we will do this in our full revision.

      (2) In line 197, the authors state: “...SynSAC induced a more pronounced delay in metaphase II than in metaphase I”. However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"... this seems to be a mix-up.

      Thank you for pointing this out; this is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Indeed, data from mouse oocytes indicate that shugoshin-dependent cohesin deprotection is already absent in meiosis II[6], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We will include this point in the discussion.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes, the enrichment and extended time for sample collection are sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition since the very robust arrest achieved with the 4A-RASA mutant made this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not emphasize this improved system more. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment, which is a large amount of work using significant resources. However, we will highlight the potential of the 4A-RASA mutant more prominently in our full revision.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences in the wiring of the spindle checkpoint in meiosis and mitosis, as well as potential for future studies. However, currently we can only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (Figure 4A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing and points to a role of Aurora B-mediated phosphorylation, though previous work has not supported such a role [7].

      We will include a discussion of these important points in a revised version.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that at first glance, normalising to no tag appears to be the most appropriate normalisation, in practice there is very low background signal in the no tag sample which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no tag) signal is indeed very low, we will provide a new supplemental figure showing volcano plots comparing kinetochore purifications at each stage with their corresponding no tag control.

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis [8]. In this study, we found that relatively few proteins significantly change in abundance.

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and, for this, we believe that a direct comparison to Dsn1, a central kinetochore protein which we immunoprecipitated, is the most appropriate normalisation.
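
      The normalisation argument above can be illustrated with a minimal numerical sketch. All intensities below are hypothetical values chosen for illustration and are not taken from the study; the helper names `fold_change` and `relative_spread` are likewise assumptions. The sketch only shows that the same small absolute fluctuation has a large effect on a ratio whose denominator is near zero (no-tag background) and a negligible effect on a ratio to an abundant bait (Dsn1).

```python
def fold_change(signal, reference):
    """Ratio of a protein's intensity to a reference intensity."""
    return signal / reference


def relative_spread(values):
    """Spread between replicates, expressed relative to the smallest value."""
    return (max(values) - min(values)) / min(values)


# Hypothetical intensities (arbitrary units) for one kinetochore protein
# in two replicate pull-downs. None of these numbers come from the study.
protein_tagged = [1000.0, 1000.0]  # protein of interest in the tagged IP
protein_no_tag = [2.0, 4.0]        # near-zero background in the no-tag IP
dsn1_tagged = [800.0, 802.0]       # bait (Dsn1) in the same tagged IPs

# Both references fluctuate by the same absolute amount (2 units).
vs_no_tag = [fold_change(p, b) for p, b in zip(protein_tagged, protein_no_tag)]
vs_dsn1 = [fold_change(p, d) for p, d in zip(protein_tagged, dsn1_tagged)]

print("fold change vs no tag:", vs_no_tag)
print("ratio to Dsn1:        ", vs_dsn1)
print("relative spread vs no tag:", relative_spread(vs_no_tag))
print("relative spread vs Dsn1:  ", relative_spread(vs_dsn1))
```

      With these made-up numbers the no-tag fold change halves between replicates, whereas the Dsn1-scaled value barely moves. Whether real data behave this way depends on the actual background levels.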

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we will re-frame the discussion to focus on the novel findings, as also raised by the other reviewers.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We will generate the data to include a checkpoint mutant +/- ABA for direct comparison. We will take steps to improve the clarity of presentation of the meiotic timecourse graphs, though our experience is that uncluttered graphs make it easier to compare trends.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.
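
      The sensitivity argument can be made concrete with a short worked sketch. It assumes, purely for illustration, that each of the 16 chromosomes mis-segregates independently with the same probability per meiosis and that every mis-segregation event is detected by the respective assay; the rate of 0.01 is a hypothetical value, not a measurement.

```python
def p_detect_single_chromosome(p):
    """Chance of seeing a defect when only one labelled chromosome is followed."""
    return p


def p_detect_any_chromosome(p, n_chromosomes=16):
    """Chance that at least one of n chromosomes mis-segregates in a meiosis,
    assuming independent events with equal probability p (an assumption)."""
    return 1.0 - (1.0 - p) ** n_chromosomes


p = 0.01  # hypothetical per-chromosome mis-segregation probability
single = p_detect_single_chromosome(p)
any_of_16 = p_detect_any_chromosome(p)

print(f"single labelled chromosome: {single:.3f}")
print(f"any of 16 chromosomes:      {any_of_16:.3f}")
print(f"ratio:                      {any_of_16 / single:.1f}")
```

      Under these assumptions a readout that reports on all 16 chromosomes detects a defect more than ten times as often as one that follows a single chromosome.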

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We agree that this is surprising, and we will point it out in the revised discussion. We speculate that the challenge of biorienting homologs, which are held together by chiasmata, rather than back-to-back kinetochores, results in a greater requirement for error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Knetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have only corrected minor typos as detailed above.

      Description of analyses that authors prefer not to carry out

      The revisions we plan are detailed above. There are just two revisions we believe are either unnecessary or beyond the scope, both minor concerns of Reviewer #3. For clarity we have reproduced them, along with our justification below. In the latter case, the reviewer also acknowledged that further work in this direction is beyond the scope of the current study.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Developmental Cell 4, 711–726. https://doi.org/10.1016/S1534-5807(03)00130-8.

      (4) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (5) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (6) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (7) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (8) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. eLife Assessment

      This work offers important insights into the protein CHD4's function in chromatin remodeling and gene regulation in embryonic stem cells, supported by extensive biochemical, genomic, and imaging data. The use of an inducible degron system allows precise functional analysis, and the datasets generated represent a key resource for the field. The revised study offers compelling evidence and makes a significant contribution to understanding CHD4's role in epigenetic regulation. This work will be of interest to the epigenetics and stem cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an elegant investigation to clarify the roles of CHD4 in chromatin accessibility and transcription regulation. In addition to the common mechanisms of action through nucleosome repositioning and opening of transcriptionally active regions, the authors considered here a new angle of CHD4 action through modulating the off rate of transcription factor binding. Their suggested scenario is that the action of CHD4 is context-dependent and is different for highly-active regions vs low-accessibility regions.

      Strengths:

      This is a very well-written paper that will be of interest to researchers working in this field. The authors carried out a substantial body of work with different types of NGS experiments and the corresponding computational analyses. The combination of biophysical measurements of the off-rate of protein-DNA binding with NGS experiments is particularly commendable.

      Comments on revised version:

      The authors have addressed all my points.

    3. Reviewer #2 (Public review):

      This study leverages acute protein degradation of CHD4 to define its role in chromatin and gene regulation. Previous studies have relied on KO and/or RNA interference of this essential protein and as such are hampered by adaptation, cell population heterogeneity, cell proliferation and indirect effects. The authors have established an AID2-based method to rapidly deplete the dMi-2 remodeller to circumvent these problems. CHD4 is gone within an hour, well before any effects on cell cycle or cell viability can manifest. This represents an important technical advance that, for the first time, allows a comprehensive analysis of the immediate and direct effect of CHD4 loss of function on chromatin structure and gene regulation.

      Rapid CHD4 degradation is combined with ATAC-seq, CUT&RUN, (nascent) RNA-seq and single molecule microscopy to comprehensively characterise the impact on chromatin accessibility, histone modification, transcription and transcription factor (NANOG, SOX2, KLF4) binding in mouse ES cells.

      The data support the previously developed model that high levels of CHD4/NuRD maintain a degree of nucleosome density to limit TF binding at open regulatory regions (e.g. enhancers). The authors propose that CHD4 activity at these sites is an important prerequisite for enhancers to respond to novel signals that require an expanded or new set of TFs to bind.

      What I find even more exciting and entirely novel is the finding that CHD4 removes TFs from regions of limited accessibility to repress cryptic enhancers and to suppress spurious transcription. These regions are characterised by low CHD4 binding and have so far never been thoroughly analysed. The authors correctly point out that the general assumption that chromatin regulators act on regions where they seem to be concentrated (i.e. have high ChIP-seq signals) runs the risk of overlooking important functions elsewhere. This insight is highly relevant beyond the CHD4 field and will prompt other chromatin researchers to look into low level binding sites of chromatin regulators.

      The biochemical and genomic data presented in this study are of high quality (I cannot judge single-molecule microscopy experiments due to my lack of expertise). This is an important and timely study that is of great interest to the chromatin field.

      Comments on revised version:

      All my comments below have been addressed in the revised version of the manuscript.

      The revised manuscript provides a significant advance of our understanding of how the nucleosome remodeler CHD4 exerts its function. In particular, the findings suggest an intriguing role of CHD4 in TF removal at genomic regions where only low levels of CHD4 can be detected. In the future, it will be interesting to see if this activity is shared by other ATP-dependent nucleosome remodelers.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript an inducible degron approach is taken to investigate the function of the CHD4 chromatin remodelling complex. The cell lines and approaches used are well thought out and the data appear to be of high quality. They show that loss of CHD4 results in rapid changes to chromatin accessibility at thousands of sites. At the majority of locations where changes are detected, chromatin accessibility is decreased and these sites are strongly bound by CHD4 prior to activation of the degron and so likely represent primary sites of action. Somewhat surprisingly while chromatin accessibility is reduced at these sites transcription factor occupancy is little changed. Following CHD4 degradation occupancy of the key pluripotency transcription factors NANOG and SOX2 increases at many locations genome wide and at many of these sites chromatin accessibility increases. These represent important new insights into the function of CHD4 complexes.

      Strengths:

      The experimental approach is well suited to providing insight into a complex regulator such as CHD4. The data generated to characterise how cells respond to loss of CHD4 is of high quality. The study reveals major changes in transcription factor occupancy following CHD4 depletion.

      Weaknesses:

      The main weakness can be summarised as relating to the fact that the authors favour the interpretation that all rapid changes following CHD4 degradation occur as a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially very low (e.g., sites where accessibility is gained, in comparison to that at sites where chromatin accessibility is lost). The revised discussion acknowledges rapid indirect effects cannot be excluded.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is a sum of multiple rates of decay: the true dissociation rate (k<sub>off</sub>), signal loss caused by photobleaching (k<sub>pb</sub>), and signal loss caused by defocusing/tracking error (k<sub>tl</sub>).

      k<sub>off</sub><sup>app</sup> = k<sub>off</sub> + k<sub>pb</sub> + k<sub>tl</sub>

      We are making conclusions about relative changes in k<sub>off</sub><sup>app</sup> upon CHD4 depletion, not about the absolute magnitude of true k<sub>off</sub> or TF residence times. Our conclusions extend to true k<sub>off</sub> on the assumption that k<sub>pb</sub> and k<sub>tl</sub> are equal across all samples imaged due to identical experimental conditions and analysis. k<sub>pb</sub> and k<sub>tl</sub> vary hugely across experimental set-ups, especially with different laser powers, so other k<sub>off</sub> or k<sub>off</sub><sup>app</sup> values reported in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of k<sub>pb</sub> (and k<sub>tl</sub>) would be required to make any statements about absolute values of k<sub>off</sub>.
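
      The reasoning about additive decay rates can be sketched as follows. This is an illustration only: it assumes a single-exponential survival curve, which may differ from the fitting equation actually given in the Methods, and every rate value and function name is hypothetical. It shows that when k<sub>pb</sub> and k<sub>tl</sub> are identical between conditions, the difference in apparent rates equals the difference in true k<sub>off</sub>, even though neither absolute k<sub>off</sub> is known.

```python
import math


def apparent_off_rate(k_off, k_pb, k_tl):
    """Apparent rate as the sum of dissociation, photobleaching and tracking loss."""
    return k_off + k_pb + k_tl


def surviving_fraction(t, k_app):
    """Fraction of bound trajectories still visible at time t
    (single-exponential decay, an assumption of this sketch)."""
    return math.exp(-k_app * t)


def fit_apparent_rate(times, fractions):
    """Least-squares slope of ln(fraction) against time, forced through the origin."""
    numerator = sum(t * math.log(f) for t, f in zip(times, fractions))
    denominator = sum(t * t for t in times)
    return -numerator / denominator


# Hypothetical rates in 1/s; k_pb and k_tl are shared by both conditions.
k_pb, k_tl = 0.2, 0.05
k_app_control = apparent_off_rate(0.5, k_pb, k_tl)
k_app_depleted = apparent_off_rate(0.3, k_pb, k_tl)

times = [0.5, 1.0, 2.0, 4.0]
fit_control = fit_apparent_rate(
    times, [surviving_fraction(t, k_app_control) for t in times])
fit_depleted = fit_apparent_rate(
    times, [surviving_fraction(t, k_app_depleted) for t in times])

# The shared nuisance terms cancel in the difference.
print("difference in apparent rates:", fit_control - fit_depleted)
print("difference in true k_off:    ", 0.5 - 0.3)
```

      If k<sub>pb</sub> or k<sub>tl</sub> differed between samples, the cancellation would no longer hold, which is why the equal-conditions assumption matters.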

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation.

      The heatmap displays z-scores, meaning expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline; it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it does not directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.
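
      A small sketch may help show why a gene can be blue at 0 hours and red later. The expression values and time points below are hypothetical, and the use of the population standard deviation is an assumption; the authors' pipeline may scale differently.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Centre and scale one gene's expression across its own time course.
    Note: a gene with constant expression would give a zero standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical normalised expression of one steadily upregulated gene.
time_points_h = [0, 1, 2, 4]
expression = [10.0, 12.0, 20.0, 30.0]

z = z_scores(expression)
fold_change_vs_0h = [v / expression[0] for v in expression]

for t, score, fc in zip(time_points_h, z, fold_change_vs_0h):
    colour = "red" if score > 0 else "blue"
    print(f"{t} h: z = {score:+.2f} ({colour}), fold change vs 0 h = {fc:.1f}")
```

      The 0-hour value is negative (blue) only because it lies below the gene's own mean across the time course, while the fold change relative to 0 hours is by definition 1 at that point.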

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to more accurately reflect what is going on in the screenshot. We have also replaced “WT” with “0” as this more accurately reflects the status of these cells.

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself. 

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low. 

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised we have mitigated the points raised by the reviewers. 

      Reviewer #2 (Recommendations for the authors): 

      p9, top: The sentence starting with "Genes increasing in expression after four hours...." is very difficult to understand and should be rephrased or broken up. 

      We agree. This has been completely re-written. 

      Reviewer #3 (Recommendations for the authors): 

      Sites of increased chromatin accessibility emerge more slowly than sites of lost chromatin accessibility. In Figure 1D, there is only a small increase in accessibility at 30 min but a more noticeable decrease at 30 min. The sites of increased accessibility also have lower absolute accessibility than observed at locations where accessibility is lost. This raises the possibility that the sites of increased accessibility represent rapid but indirect changes occurring following loss of CHD4. Consistent with this, enrichment for CHD4 and MBD3 by CUT and TAG is far higher at sites of decreased accessibility. The low level of CHD4 occupancy observed at sites where accessibility increases may not be relevant to the reason these sites are affected. Such small enrichments can be observed when aligning to other genomic features. The authors interpret their findings as indicating that low occupancy of CHD4 exerts a long-lasting repressive effect at these locations. This is one possible explanation; however, an alternative is that these effects are indirect, perhaps driven by the very large increase in TF binding that is observed following CHD4 degradation and which appears to occur at many locations regardless of whether CHD4 is present.

      The reviewer is right to point out that we don’t know what is direct and what is indirect. All we know is that changes happen very rapidly upon CHD4 depletion. The changes in standard ATAC-seq signal appear greater at the sites showing decreased accessibility than those increasing, however the starting points are very different: a small increase from very low accessibility will likely be a higher fold change than a more visible decrease from very high accessibility (Fig. 1D). In contrast, Figure 6 shows a more visible increase in Tn5 integrations at sites increasing in accessibility at 30 minutes than the change in sites decreasing in accessibility at 30 minutes. We therefore disagree that the sites increasing in accessibility are more likely to be indirect targets. In further support of this, there is a rapid increase in MNase resistance at these sites upon MBD3 reintroduction (Fig. 6I), possibly indicating a direct impact of NuRD on these sites. 
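
      The point about starting levels can be illustrated with a brief calculation. The signal values are hypothetical, chosen only to show how a visually small change at a low-accessibility site can correspond to a larger fold change than a visually large change at a high-accessibility site.

```python
import math


def log2_fold_change(before, after):
    """log2 ratio of signal after depletion to signal before depletion."""
    return math.log2(after / before)


# Hypothetical ATAC-seq signal (arbitrary units) before and after depletion.
low_before, low_after = 2.0, 6.0        # site gaining accessibility
high_before, high_after = 100.0, 60.0   # site losing accessibility

low_site = log2_fold_change(low_before, low_after)
high_site = log2_fold_change(high_before, high_after)

print(f"low site:  absolute change {low_after - low_before:+.0f}, log2FC {low_site:+.2f}")
print(f"high site: absolute change {high_after - high_before:+.0f}, log2FC {high_site:+.2f}")
```

      Here the high-accessibility site changes by ten times as many units, yet its log2 fold change is the smaller of the two in magnitude.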

      Substantial changes in Nanog and SOX2 binding are observed across the time course. These changes are very large, with 43k or 78k additional sites detected. How is this possible? Does the amount of these TFs present in cells change? The argument that transient occupancy of CHD4 acts to prevent TFs binding to what is likely to be many hundreds of thousands of sites (if the data for Nanog and SOX2 are representative of other transcription factors such as KLF4) seems unlikely.

      The large number of different sites identified gaining TF binding is likely to be a reflection of the number of cells being analysed: within the 10<sup>5</sup>-10<sup>6</sup> cells used for a Cut&Run experiment we detect many sites gaining TF binding. In individual cells we agree it would be unlikely for that many sites to become bound at the same time. We detect no changes in the amounts of Nanog or Sox2 in our cells across the 4-hour CHD4 depletion time course. However, we maintain that low frequency interactions of CHD4 with a site can counteract low frequency TF binding and prevent it from stimulating opening of a cryptic enhancer.

      While increased TF binding is observed at sites of gained accessibility, the changes in TF occupancy at the lost sites do not progress continuously across the time course. In addition, the changes in occupancy are small in comparison to those observed at the gained sites. The text comments on an increase in SOX2 and Nanog occupancy at 30 min, but there is either no change or a loss by 4 hours. It's difficult to know what to conclude from this. 

      At sites losing accessibility the enrichment of both Nanog and Sox2 increases at 30 minutes. We suspect this is due to the loss of CHD4’s TF-removal activity. Thereafter the two TFs show different trends: Nanog enrichment then decreases again, probably due to the decrease in accessibility at these sites. Sox2, by contrast, does not change very much, possibly due to its higher pioneering ability. It is true that the amounts of change are very small here; however, Cut&Run was performed in triplicate and the summary graphs are plotted with standard error of the mean (which is often too small to see), demonstrating that the detected changes are highly significant. (We neglected to refer to the SEM in our figure legends: this has now been corrected.) At sites where CHD4 maintains chromatin compaction, the amount of transcription factor binding goes from zero or nearly zero to some finite number, hence the fold change is very large. In contrast, the changes at sites losing accessibility start from high enrichment, so fold changes are much smaller.
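
      For readers unfamiliar with the error bars mentioned above, the standard error of the mean for a triplicate can be computed as in this minimal sketch. The enrichment values are hypothetical and the function name is an assumption; the sketch does not reproduce the authors' analysis.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(replicates):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    n = len(replicates)
    return mean(replicates), stdev(replicates) / sqrt(n)


# Hypothetical enrichment values from three replicate experiments.
triplicate = [1.0, 1.1, 0.9]
m, sem = mean_and_sem(triplicate)

print(f"mean = {m:.3f}, SEM = {sem:.3f}")
```

      A small SEM indicates tight agreement between replicates; on its own it does not establish significance, which also depends on the size of the difference being compared.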

      Changes in the diffusive motion of tagged TFs are measured. The data are presented as an average of measurements of individual TFs. What might be anticipated is that subpopulations of TFs would exhibit distinct behaviours. At many locations, occupancy of these TFs is presumably unchanged. At 1 hour, many new sites are occupied, and this would represent a subpopulation with high residence. A small population of TFs would be subject to distinct effects at the sites where accessibility reduces at the one-hour time point. The analysis presented fails to distinguish populations of TFs exhibiting altered mobility consistent with the proportion of the TFs showing altered binding.

      We agree that there are likely subpopulations of TFs exhibiting distinct binding behaviours, and our modality of imaging captures this, but to distinguish subpopulations within this would require a lot more data.

      However, there is no reason to believe that the TF binding at the new sites being occupied at 1 hr would have a difference in residence time to those sites already stably bound by TFs in the wildtype, i.e. that they would exhibit a different limitation to their residence time once bound compared to those sites. We do capture more stably bound trajectories per cell, but that’s not what we’re reporting on - it’s the dissociation rate of those that have already bound in a stable manner at sites where TF occupancy is detected also by ChIP.

      The analysis of transcription shown in Figure 2 indicates that high-quality data has been obtained, showing progressive changes to transcription. The linkage of the differentially expressed genes to chromatin changes shown in Figure 3 is difficult to interpret. The curves showing the distance distribution for increased or decreased DARs are quite similar for up- and down-regulated genes. The frequency density for gained sites is slightly higher, but not as much higher as would be expected, given these sites are c. 6-fold more abundant than the sites with lost accessibility. The data presented do not provide a compelling link between the CHD4-induced chromatin changes and changes to transcription; the authors should consider revising to accommodate this. It is possible that much of the transcriptional response even at early time points is indirect. This is not unprecedented. For example, degradation of SOX2, a transcriptional activator, results in both repression and activation of similar numbers of genes https://pmc.ncbi.nlm.nih.gov/articles/PMC10577566/

      We agree that these figures do not provide a compelling link between the observed chromatin changes and gene expression changes. That 50K increased sites are, on average, located farther away from misregulated genes than are the 8K decreasing sites highlights that this is rarely going to be a case of direct derepression of a silenced gene, but rather distal sites could act as enhancers to spuriously activate transcription. This would certainly be a rare event, but could explain the low-level transcriptional noise seen in NuRD mutants. We have edited the wording to make this clearer.

      The model presented in Figure 7 includes distinct roles at sites that become more or less accessible following inactivation of CHD4. This is perplexing as it implies that the same enzymes perform opposing functions at some of the different sites where they are bound. 

      Our point is that it does the same thing at both kinds of sites, but the nature of the sites means that the consequences of CHD4 activity will be different. We have tried to make this clear in the text. 

      At active sites, it is clear that CHD4 is bound prior to activation of the degron and that chromatin accessibility is reduced following depletion. Changes in TF occupancy are complex, perhaps reflecting slow diffusion from less accessible chromatin and a global increase in the abundance of some pluripotency transcription factors such as SOX2 and Nanog that are competent for DNA binding. The link between sites of reduced accessibility and transcription is less clear. 

      At the inactive sites, the increase in accessibility could be driven by transcription factor binding. There is very little CHD4 present at these sites prior to activation of the degron, and TF binding may induce chromatin opening, which could be considered a rapid but indirect effect of the CHD4 degron. The link to transcription is not clear from the data presented, but it would be anticipated that in some cases it would drive activation. 

      We acknowledge these points and have indicated this possibility in the Results and the Discussion.

      No analysis is performed to identify binding sequences enriched at the locations of decreased accessibility. This could potentially define transcription factors involved in CHD4 recruitment or that cause CHD4 to function differently in different contexts.

      HOMER analyses failed to provide any unique insights. The sites going down are highly accessible in ES cells: they have TF binding sites that one would expect in ES cells. The increasing sites show an enrichment for G-rich sequences, which reflects the binding preference of CHD4.

    1. eLife Assessment

      This valuable study presents Altair-LSFM, a well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and reduced cost. The approach provides compelling evidence of its strengths, including the use of custom-machined baseplates, detailed assembly instructions, and demonstrated live-cell imaging capabilities. This manuscript will be of interest to microscopists and potentially biologists seeking accessible LSFM tools.

    2. Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      The article includes extensive supplementary material that complements the information in the main article.

      Live imaging has been incorporated, as requested, increasing the value of the paper.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source light-sheet microscope that may be relatively easy to align and construct due to a custom-designed mounting plate. The authors developed this microscope to fill a perceived need: current open-source systems are primarily designed for large specimens and lack sub-cellular resolution, or achieve high resolution but are difficult to construct and are unstable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems and address this need with Altair.

      Strengths:

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances as well as the optical path design. They simplify the excitation optics over lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      Weaknesses:

      There is still a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is now discussed in the manuscript but remains a limitation in the currently implemented design.

      (2) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. In the revised manuscript the authors now implement temperature control, but ideal live cell imaging conditions that would include gas and humidity control are not implemented. While, as the authors note, other microscopes that lack full environmental control have achieved widespread adoption, in my view this still limits the use cases of this microscope. There is no discussion on how this limitation of environmental control may be overcome in future iterations.

      (3) While the microscope is well designed and completely open source it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested even if they can afford it. Claims on how easy it may be to align the system for a "Novice" in supplementary table 5 appear to be unsubstantiated and should be removed unless a Novice was indeed able to assemble and validate the system in 2 weeks. It seems that these numbers were just arbitrarily proposed in the current version without any testing. In our experience it's hard to predict how long an alignment will take for a novice.

      (4) There is no quantification on field uniformity and the tunability of the light sheet parameters (FOV, thickness, PSF, uniformity). There is no quantification on how much improvement is offered by the resonant and how its operation may alter the light-sheet power, uniformity and the measured PSF.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance.

      The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting, and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the regular diffraction-limited tradeoffs of light-sheet fluorescence microscopy.

      Compelling validation using fluorescent beads, multicolor cellular imaging, and dual-color live-cell imaging highlights the system's performance. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system, along with estimated costs and a detailed description of the expertise needed.

      Strengths:

      - Strong and accessible technical innovation.

      With an elegant combination of beam shaping and optical modelling, the authors provide a high resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of thin light-sheet and small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      - Impeccable optical performance and ease of mounting of samples.

      The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy, although without a direct comparison (see below in the "weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      - Transparency and comprehensiveness of documentation and resources.

      A very detailed protocol documents the setup, the optical modeling, and the total cost.

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. At this stage, I believe the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Comments on revisions:

      I appreciate the detail and care the authors have shown in answering all my concerns, both the bigger ones (lack of live cell imaging demonstration) and the smaller ones (about data storage, costs, expertise needed, and so on). The manuscript has been greatly improved, and I have no other comments to make.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the editors and reviewers for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to promote accessibility and reproducibility. Below, we provide point-by-point responses to each referee comment. In the process, we have significantly revised the manuscript to include live-cell imaging data and a quantitative evaluation of imaging speed. We now more explicitly describe the different variants of lattice light-sheet microscopy, highlighting differences in their illumination flexibility and image acquisition modes, and clarify how Altair-LSFM compares to each. We further discuss challenges associated with the 5 mm coverslip and propose practical strategies to overcome them. Additionally, we outline cost-reduction opportunities, explain the rationale behind key equipment selections, and provide guidance for implementing environmental control. Altogether, we believe these additions have strengthened the manuscript and clarified both the capabilities and limitations of Altair-LSFM.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths: 

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      We thank the reviewer for their thoughtful assessment and for recognizing the strengths of our manuscript, including the extensive supplementary material. Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. We would therefore greatly appreciate the reviewer’s guidance on which sections were perceived as superficial so that we can expand them to better support readers and builders of the system.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to substantially reduce the barrier to entry for non-specialist laboratories. Many open-source implementations, such as OpenSPIM, OpenSPIN, and Benchtop mesoSPIM, similarly focused on accessibility and reproducibility rather than introducing new optical modalities, yet have had a measurable impact on the field by enabling broader community participation. Altair-LSFM follows this tradition, providing sub-cellular resolution performance comparable to advanced systems like LLSM, while emphasizing reproducibility, ease of construction through a precision-machined baseplate, and comprehensive documentation to facilitate dissemination and adoption.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We appreciate the reviewer’s comment and agree that there are practical challenges associated with handling 5 mm diameter coverslips in this configuration. In the revised manuscript, we now explicitly describe these challenges and provide practical solutions. Specifically, we highlight the use of a custom-machined coverslip holder designed to simplify mounting and handling, and we direct readers to an alternative configuration using the Zeiss W Plan-Apochromat 20×/1.0 objective, which eliminates the need for small coverslips altogether.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We appreciate the reviewer’s perspective and understand the concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. Our decision to use these components was intentional: relying on a unified, professionally supported and maintained platform minimizes complexity associated with sourcing, configuring, and integrating hardware from multiple vendors, thereby reducing non-financial barriers to entry for non-specialist users.

      Importantly, these components are not the primary cost driver of Altair-LSFM (they represent roughly 18% of the total system cost). Nonetheless, for individuals where the price is prohibitive, we also outline several viable cost-reduction options in the revised manuscript (e.g., substituting manual stages, omitting the filter wheel, or using industrial CMOS cameras), while discussing the trade-offs these substitutions introduce in performance and usability. These considerations are now summarized in Supplementary Note 1, which provides a transparent rationale for our design and cost decisions.

      Finally, we note that even with these professional-grade components, Altair-LSFM remains substantially less expensive than commercial systems offering comparable optical performance, such as LLSM implementations from Zeiss or 3i.

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our data. As noted, the current manuscript focuses on validating the optical performance and resolution of the system using fixed specimens to ensure reproducibility and stability.

      We fully agree on the importance of environmental control for live-cell imaging. In the revised manuscript, we now describe in detail how temperature regulation can be achieved using a custom-designed heated sample chamber, accompanied by detailed assembly instructions on our GitHub repository and summarized in Supplementary Note 2. For pH stabilization in systems lacking a 5% CO₂ atmosphere, we recommend supplementing the imaging medium with 10–25 mM HEPES buffer. Additionally, we include new live-cell imaging data demonstrating that Altair-LSFM supports in vitro time-lapse imaging of dynamic cellular processes under controlled temperature conditions.
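
      As a rough illustration of the 10–25 mM HEPES recommendation above, the sketch below converts a target concentration into either a mass of powder or a volume of concentrated stock. It assumes HEPES free acid (molar mass 238.30 g/mol) and a hypothetical 1 M stock; the function names and example volumes are ours, not the authors', and the medium's pH would still need to be checked after supplementation.

```python
HEPES_FREE_ACID_G_PER_MOL = 238.30  # free acid; the sodium salt differs


def hepes_mass_g(target_mm, final_volume_ml):
    """Mass of HEPES free acid (g) needed for target_mm in final_volume_ml."""
    moles = (target_mm / 1000.0) * (final_volume_ml / 1000.0)
    return moles * HEPES_FREE_ACID_G_PER_MOL


def stock_volume_ml(target_mm, final_volume_ml, stock_mm):
    """Volume of stock (mL) to reach target_mm in a total of final_volume_ml.

    Uses C1 * V1 = C2 * V2, where final_volume_ml includes the added stock.
    """
    if stock_mm <= target_mm:
        raise ValueError("stock must be more concentrated than the target")
    return target_mm * final_volume_ml / stock_mm


# Hypothetical example: 500 mL of imaging medium at the two ends of the range.
for target in (10.0, 25.0):
    grams = hepes_mass_g(target, 500.0)
    stock = stock_volume_ml(target, 500.0, stock_mm=1000.0)
    print(f"{target:.0f} mM: {grams:.3f} g powder or {stock:.2f} mL of 1 M stock")
```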

      Reviewer #2 (Public review): 

      Summary: 

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems. 

      We thank the reviewer for their thoughtful summary. We agree that existing open-source systems primarily emphasize imaging of large specimens, whereas commercial systems that achieve sub-cellular resolution remain costly and complex. Our aim with Altair-LSFM was to bridge this gap—providing LLSM-level performance in a substantially more accessible and reproducible format. By combining high-NA optics with a precision-machined baseplate and open-source documentation, Altair offers a practical, high-resolution solution that can be readily adopted by non-specialist laboratories.

      Strengths: 

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful and generous assessment of our work. We are pleased that the manuscript’s emphasis on fundamental optical principles, design rationale, and practical implementation was clearly conveyed. We agree that Altair’s modular and accessible architecture provides a strong foundation for future variants tailored to specific experimental needs. To facilitate this, we have made all Zemax simulations, CAD files, and build documentation openly available on our GitHub repository, enabling users to adapt and extend the system for diverse imaging applications.
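
      The optical principles summarized in this exchange (lateral resolution set by the detection NA and adequate sampling, axial resolution set by sheet thickness) can be sketched with textbook formulas. The sketch below uses the Rayleigh criterion, Nyquist sampling, and the paraxial Gaussian-beam relations; the wavelengths, NA, refractive index, and waist values are illustrative assumptions rather than Altair-LSFM specifications, and the paraxial expressions are only approximate at high NA.

```python
import math


def lateral_resolution_nm(emission_wavelength_nm, detection_na):
    """Rayleigh criterion: 0.61 * wavelength / NA of the detection objective."""
    return 0.61 * emission_wavelength_nm / detection_na


def nyquist_pixel_nm(resolution_nm):
    """Largest sample-space pixel that samples the resolution at least twice."""
    return resolution_nm / 2.0


def sheet_fwhm_nm(waist_radius_nm):
    """FWHM thickness of a Gaussian sheet with 1/e^2 intensity radius w0."""
    return waist_radius_nm * math.sqrt(2.0 * math.log(2.0))


def rayleigh_range_um(waist_radius_nm, vacuum_wavelength_nm, refractive_index):
    """Paraxial Rayleigh range z_R = pi * w0^2 * n / wavelength, in micrometres."""
    z_r_nm = math.pi * waist_radius_nm ** 2 * refractive_index / vacuum_wavelength_nm
    return z_r_nm / 1000.0


# Assumed values: 520 nm emission, NA 1.1 detection, 488 nm excitation in water.
resolution = lateral_resolution_nm(520.0, 1.1)
pixel = nyquist_pixel_nm(resolution)
print(f"lateral resolution ~{resolution:.0f} nm, pixel size <= {pixel:.0f} nm")

# Halving the waist quarters the distance over which the sheet stays thin.
for w0 in (800.0, 400.0):
    z_r = rayleigh_range_um(w0, 488.0, 1.33)
    print(f"w0={w0:.0f} nm: FWHM {sheet_fwhm_nm(w0):.0f} nm, z_R {z_r:.2f} um")
```

      The quadratic dependence of the Rayleigh range on the waist is the classical thickness versus field-of-view trade-off that the reviewers refer to.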

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive"; I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability might be considered anecdotal and may not belong in a scientific article if unreferenced or unsupported by data.

      We thank the reviewer for this thoughtful and constructive comment. We have revised the manuscript to more clearly distinguish between the original open-source implementation of LLSM and subsequent commercial versions by 3i and ZEISS. The revised Introduction and Discussion now explicitly note that while open-source and early implementations of LLSM can require expert alignment and maintenance, commercial systems—particularly the ZEISS Lattice Lightsheet 7—are designed for automated operation and stable, turn-key use, albeit at higher cost and with limited modifiability. We have also moderated earlier language regarding usability and stability to avoid anecdotal phrasing.

      We also now provide a more objective proxy for system complexity: the number of optical elements that require precise alignment during assembly and maintenance thereafter. The original open-source LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair-LSFM system contains only nine such elements. By this metric, Altair-LSFM is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories.

      (2) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We thank the reviewer for this helpful comment. We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may pose a practical limitation for some users. We now discuss this more explicitly in the revised manuscript. Specifically, we note that replacing the detection objective provides a straightforward solution to this constraint. For example, as demonstrated by Moore et al. (Lab Chip, 2021), pairing the Zeiss W Plan-Apochromat 20×/1.0 detection objective with the Thorlabs TL20X-MPL illumination objective allows imaging beyond the physical surfaces of both objectives, eliminating the need for small-format coverslips. In the revised text, we propose this modification as an accessible path toward greater compatibility with conventional sample mounting formats. We also note in the Discussion that Oblique Plane Microscopy (OPM) inherently avoids such nonstandard mounting requirements and, owing to its single-objective architecture, is fully compatible with standard environmental chambers.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We thank the reviewer for this important observation and agree that environmental control is critical for live-cell imaging applications. It is worth noting that the original open-source LLSM design, as well as the commercial version developed by 3i, provided temperature regulation but did not include integrated control of CO2 or humidity. Despite this limitation, these systems have been widely adopted and have generated significant biological insights. We also acknowledge that both OPM and the ZEISS implementation of LLSM offer clear advantages in this respect, providing compatibility with standard commercial environmental chambers that support full regulation of temperature, CO₂, and humidity.

      In the revised manuscript, we expand our discussion of environmental control in Supplementary Note 2, where we describe the Altair-LSFM chamber design in more detail and discuss its current implementation of temperature regulation and HEPES-based pH stabilization. Additionally, the Discussion now explicitly notes that OPM avoids the challenges associated with non-standard sample mounting and is inherently compatible with conventional environmental enclosures.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We agree that the original LLSM design offers substantially greater flexibility than what is reflected in our initial comparison, including the ability to generate multiple lattice geometries (e.g., square and hexagonal), operate in structured illumination mode, and acquire volumes using both sample- and light-sheet-scanning strategies. To address this, we now include Supplementary Note 3 that provides a detailed overview of the illumination modes and imaging flexibility afforded by the original LLSM implementation, and how these capabilities compare to both the commercial ZEISS Lattice Lightsheet 7 and our Altair-LSFM system. In addition, we have revised the discussion to explicitly acknowledge that the original LLSM could operate in alternative scan strategies beyond sample scanning, providing greater context for readers and ensuring a more balanced comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we now include a demonstration of live-cell imaging to directly validate Altair-LSFM’s suitability for dynamic biological applications. We also explicitly discuss the temporal resolution of the system in the main text (see Optoelectronic Design of Altair-LSFM), where we detail both software- and hardware-related limitations. Specifically, we evaluate the maximum imaging speed achievable with Altair-LSFM in conjunction with our open-source control software, navigate.

      For simplicity and reduced optoelectronic complexity, the current implementation powers the piezo through the ASI Tiger Controller, which modestly reduces its bandwidth. Nonetheless, for a 100 µm stroke typical of light-sheet imaging, we achieved sufficient performance to support volumetric imaging at most biologically relevant timescales. These results, along with additional discussion of the design trade-offs and performance considerations, are now included in the revised manuscript and expanded upon in the supplementary material.
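
      To make the relationship between piezo stroke and volumetric rate concrete, the following back-of-the-envelope sketch estimates the time per volume for a stepped z-scan. Only the 100 µm stroke comes from the response above; the z-step, per-plane time, and flyback time are hypothetical placeholders, and the model does not represent the measured bandwidth of the ASI Tiger Controller or the behaviour of navigate.

```python
def planes_per_volume(stroke_um, z_step_um):
    """Number of planes needed to cover the stroke, including both end planes."""
    if z_step_um <= 0:
        raise ValueError("z step must be positive")
    return int(round(stroke_um / z_step_um)) + 1


def volume_time_s(stroke_um, z_step_um, time_per_plane_s, flyback_s):
    """Acquisition time for one volume: all planes plus one piezo flyback."""
    n_planes = planes_per_volume(stroke_um, z_step_um)
    return n_planes * time_per_plane_s + flyback_s


def volumes_per_second(stroke_um, z_step_um, time_per_plane_s, flyback_s):
    """Volumetric rate implied by the simple timing model above."""
    return 1.0 / volume_time_s(stroke_um, z_step_um, time_per_plane_s, flyback_s)


# Hypothetical settings: 0.5 um steps, 10 ms per plane, 50 ms flyback.
t = volume_time_s(stroke_um=100.0, z_step_um=0.5,
                  time_per_plane_s=0.010, flyback_s=0.050)
rate = volumes_per_second(100.0, 0.5, 0.010, 0.050)
print(f"{planes_per_volume(100.0, 0.5)} planes, {t:.2f} s per volume, {rate:.2f} volumes/s")
```

      Under these assumed numbers the per-plane time dominates, so shortening exposure or coarsening the z-step would raise the volumetric rate more than reducing flyback.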

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry, or lower it as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a basic understanding of (or willingness to learn) optics, electronics, and instrumentation. Such a barrier exists for all open-source microscopes, and our goal is not to eliminate this requirement entirely but to substantially reduce the technical and logistical challenges that typically accompany the construction of custom light-sheet systems.

      Importantly, no machining experience or in-house fabrication capabilities are required. Users can simply submit the provided CAD design files and specifications directly to commercial vendors for fabrication. We have made this process as straightforward as possible by supplying detailed build instructions, recommended materials, and vendor-ready files through our GitHub repository. Our dissemination strategy draws inspiration from other successful open-source projects such as mesoSPIM, which has seen widespread adoption—over 30 implementations worldwide—through a similar model of exhaustive documentation, open-source software, and community support via user meetings and workshops.

      We also recognize that documentation alone cannot fully replace hands-on experience. To further lower barriers to adoption, we are actively working with commercial vendors to streamline procurement and assembly, and Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant that provides resources for hosting workshops, offering real-time community support, and developing supplementary training materials.

      In the revised manuscript, we now expand the Discussion to explicitly acknowledge these implementation considerations and to outline our ongoing efforts to support a broad and diverse user base, ensuring that laboratories with varying levels of technical expertise can successfully adopt and maintain the Altair-LSFM platform.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We thank the reviewer for this insightful comment and agree that our original language regarding adaptability may have overstated the degree to which Altair-LSFM can be modified without prior experience. It was not our intention to imply that the system can be easily redesigned by users with limited technical background. Meaningful adaptations of the optical or mechanical design do require expertise in optical layout, optomechanical design, and alignment.

      That said, for laboratories with such expertise, we aim to facilitate modifications by providing comprehensive resources—including detailed Zemax simulations, complete CAD models, and alignment documentation. These materials are intended to reduce the development burden for expert users seeking to tailor the system to specific experimental requirements, without necessitating a complete re-optimization of the optical path from first principles.

      In the revised manuscript, we clarify this point and temper our language regarding adaptability to better reflect the realistic scope of customization. Specifically, we now state in the Discussion: “For expert users who wish to tailor the instrument, we also provide all Zemax illumination-path simulations and CAD files, along with step-by-step optimization protocols, enabling modification and re-optimization of the optical system as needed.” This revision ensures that readers clearly understand that Altair-LSFM is designed for reproducibility and straightforward assembly in its default configuration, while still offering the flexibility for modification by experienced users.

      Reviewer #3 (Public review):

      Summary: 

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging. The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells. The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting, and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the usual diffraction-limited tradeoffs of light-sheet fluorescence microscopy. Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      We thank the reviewer for their thoughtful and positive assessment of our work. We appreciate their recognition of Altair-LSFM’s design and performance, including its ability to achieve high-resolution imaging throughout a 266-micron field of view. While Altair-LSFM approaches the practical limits of diffraction-limited performance, it does not exceed the fundamental diffraction limit; rather, it achieves near-theoretical resolution through careful optical optimization, beam shaping, and alignment. We are grateful for the reviewer’s acknowledgment of the accessibility and comprehensive documentation that make this system broadly implementable.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity.

      At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see the "Weaknesses" section below). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      We thank the reviewer for their thoughtful and encouraging comments. We are pleased that the technical innovation, optical performance, and accessibility of Altair-LSFM were recognized. Our goal from the outset was to develop a diffraction-limited, high-resolution light-sheet system that balances optical performance with reproducibility and ease of implementation. We are also pleased that the use of precision-machined baseplates was recognized as a practical and effective strategy for achieving performance while maintaining ease of assembly.

      Weaknesses: 

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we have significantly expanded our discussion of different light-sheet systems to provide clearer quantitative and conceptual context for Altair-LSFM. These comparisons are based on values reported in the literature, as we do not have access to many of these instruments (e.g., DaXi, diSPIM, or commercial and open-source variants of LLSM), and a direct experimental comparison is beyond the scope of this work.

      We note that while quantitative parameters such as signal-to-noise ratio are important, they are highly sample-dependent and strongly influenced by imaging conditions, including fluorophore brightness, camera characteristics, and filter bandpass selection. For this reason, we limited our comparison to more general image-quality metrics—such as light-sheet thickness, resolution, and field of view—that can be reliably compared across systems.

      Finally, per the reviewer’s recommendation, we have added additional discussion clarifying the differences between dual-objective and single-objective light-sheet architectures, outlining their respective strengths, limitations, and suitability for different experimental contexts.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We fully agree that environmental control is critical for live-cell imaging applications. In the revised manuscript, we now describe the design and implementation of a temperature-regulated sample chamber in Supplementary Note 2, which maintains stable imaging conditions through the use of integrated heating elements and thermocouples. This approach enables precise temperature control while minimizing thermal gradients and optical drift. For pH stabilization, we recommend the use of 10–25 mM HEPES in place of CO₂ regulation, consistent with established practice for most light-sheet systems, including the initial variant of LLSM. Although full humidity and CO₂ control are not readily implemented in dual-objective configurations, we note that single-objective designs such as OPM are inherently compatible with commercial environmental chambers and avoid these constraints. Together, these additions clarify how environmental control can be achieved within Altair-LSFM and situate its capabilities within the broader LSFM design space.

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system, in case a 30k objective could not be feasible. Will this system work with different optics (with the obvious limitations coming with the lower NA objective)? This could be an interesting point of discussion. Adaptability of the system in case of lower budgets or more cost-effective choices, depending on the needs.

      We agree that cost considerations are critical for adoption in academic environments. We would also like to clarify that the quoted $150k includes the optical table and laser source. In the revised manuscript, Supplementary Note 1 now includes an expanded discussion of cost–performance trade-offs and potential paths for cost reduction.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      In the revised manuscript, we now include Supplementary Note 4, which provides a high-level discussion of data storage needs, approximate costs, and practical strategies for managing large datasets generated by light-sheet microscopy. This section offers general guidance, including file-format recommendations and cost considerations, but we note that actual costs will vary by institution and contractual agreements.
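
      As a rough illustration of the scale involved, the uncompressed size of a time-lapse acquisition can be estimated from the frame size, the number of planes, channels, and time points, and the pixel bit depth. The sketch below is not taken from Supplementary Note 4; the function name and all acquisition values are hypothetical and chosen only to show the arithmetic.

```python
def dataset_size_gb(width_px, height_px, n_planes, n_channels,
                    n_timepoints, bytes_per_pixel=2):
    """Uncompressed size of a volumetric time series in GB (1 GB = 1e9 bytes)."""
    n_voxels = width_px * height_px * n_planes * n_channels * n_timepoints
    return n_voxels * bytes_per_pixel / 1e9


# Hypothetical acquisition (assumed values, not from the manuscript):
# 2048 x 2048 pixel frames, 200 planes per volume, 2 channels,
# 100 time points, 16-bit pixels.
size_gb = dataset_size_gb(2048, 2048, 200, 2, 100)
print(f"{size_gb:.1f} GB uncompressed")
```

      Compression and chunked formats can reduce the stored size, but the achievable ratio depends on the data and is not modeled here.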

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) A picture, or full CAD design of the complete instrument, should be included as a main figure.

      A complete CAD rendering of the microscope is now provided in Supplementary Figure 4.

      (2) There is no quantitative comparison of the effects of the tilting resonant galvo, only a cartoon; a figure should be included.

      The cartoon was intended purely as an educational illustration to conceptually explain the role of the tilting resonant galvo in shaping and homogenizing the light sheet. To clarify this intent, we have revised both the figure legend and corresponding text in the main manuscript. For readers seeking quantitative comparisons, we now reference the original study that provides a detailed analysis of this optical approach, as well as a review on the subject.

      (3) Description of L4 is missing in the Figure 1 caption.

      Thank you for catching this omission. We have corrected it.

      (4) The beam profiles in Figures 1c and 3a, please crop and make the image bigger so the profile can be appreciated. The PSFs in Figure 3c-e should similarly be enlarged and presented using a dynamic range/LUT such that any aberrations can be appreciated.

      In Figure 1c, our goal was to qualitatively illustrate the uniformity of the light-sheet across the full field of view, while Figure 1d provided the corresponding quantitative cross-section. To improve clarity, we have added an additional figure panel offering a higher-magnification, localized view of the light-sheet profile. For Figure 3c–e, we have enlarged the PSF images and adjusted the display range to better convey the underlying signal and allow subtle aberrations to be appreciated.

      (5) It is unclear why LLSM is being used as the gold standard, since in its current commercial form, available from Zeiss, it is a turn-key system designed for core facilities. The original LLSM is also a versatile instrument that provides much more than the square lattice for illumination, including structured illumination, hexagonal lattices, live-cell imaging, wide-field illumination, different scan modes, etc. These additional features are not even mentioned when compared to the Altair-LSFM. If a comparison is to be provided, it should be fair and balanced. Furthermore, as outlined in the public review, anecdotal statements on "most used", "difficult to align", or "unstable" should not be provided without data.

      In the revised manuscript, we have carefully removed anecdotal statements and, where appropriate, replaced them with quantitative or verifiable information. For instance, we now explicitly report that the square lattice was used in 16 of the 20 figure subpanels in the original LLSM publication, and we include a proxy for optical complexity based on the number of optical elements requiring alignment in each system.

      We also now clearly distinguish between the original LLSM design—which supports multiple illumination and scanning modes—and its subsequent commercial variants, including the ZEISS Lattice Lightsheet 7, which prioritizes stability and ease of use over configurational flexibility (see Supplementary Note 3).

      (6) The authors should recognize that implementing custom optics, no matter how well designed, is a big barrier to cross for most cell biology labs.

      We fully understand and now acknowledge in the main text that implementing custom optics can present a significant barrier, particularly for laboratories without prior experience in optical system assembly. However, similar challenges were encountered during the adoption of other open-source microscopy platforms, such as mesoSPIM and OpenSPIM, both of which have nonetheless achieved widespread implementation. Their success has largely been driven by exhaustive documentation, strong community support, and standardized design principles—approaches we have also prioritized in Altair-LSFM. We have therefore made all CAD files, alignment guides, and detailed build documentation publicly available and continue to develop instructional materials and community resources to further reduce the barrier to adoption.

      (7) Statements on "hands-on workshops", though laudable, may not be appropriate to include in a scientific publication without some documentation on the influence they have had on implementing the microscope.

      We understand the concern. Our intention in mentioning hands-on workshops was to convey that the dissemination effort is supported by an NIH Biomedical Technology Development and Dissemination grant, which includes dedicated channels for outreach and community engagement. Nonetheless, we agree that such statements are not appropriate without formal documentation of their impact, and we have therefore removed this text from the revised manuscript.

      (8) It is claimed that the microscope is "reliable" in the discussion, but with no proof, long-term stability should be assessed and included.

      Although our experience has been that Altair-LSFM remains well aligned over time, especially in comparison to other light-sheet systems we have worked on over the last 11 years, we acknowledge that this assessment is anecdotal. As such, we have omitted this claim from the revised manuscript.

      (9) Due to the reliance on anecdotal statements and comparisons without proof to other systems, this paper at times reads like a brochure rather than a scientific publication. The authors should consider editing their manuscript accordingly to focus on the technical and quantifiable aspects of their work.

      We agree with the reviewer’s assessment and have revised the manuscript to remove anecdotal comparisons and subjective language. Where possible, we now provide quantitative metrics or verifiable data to support our statements.

      Reviewer #3 (Recommendations for the authors):

      Other minor points that could improve the manuscript (although some of these points are explained in the huge supplementary manual): 

      (1) The authors explain thoroughly their design, and they chose a sample-scanning method. I think that a brief discussion of the advantages and disadvantages of such a method over, for example, a laser-scanning system (with fixed sample) in the main text will be highly beneficial for the users.

      In the revised manuscript, we now include a brief discussion in the main text outlining the advantages and limitations of a sample-scanning approach relative to a light-sheet–scanning system. Specifically, we note that for thin, adherent specimens, sample scanning minimizes the optical path length through the sample, allowing the use of more tightly focused illumination beams that improve axial resolution. We also include a new supplementary figure illustrating how this configuration reduces the propagation length of the illumination light sheet, thereby enhancing axial resolution.

      (2) The authors justify selecting a 0.6 NA illumination objective over alternatives (e.g., Special Optics), but the manuscript would benefit from a more quantitative trade-off analysis (beam waist, working distance, sample compatibility) with other possibilities. Within the objective context, a comparison of the performance of this system with the new and upcoming single-objective light-sheet methods (and the ones based also on optical refocusing, e.g., DaXi) would strengthen the manuscript.

      In the revised manuscript, we now provide a quantitative trade-off analysis of the illumination objectives in Supplementary Note 1, including comparisons of beam waist, working distance, and sample compatibility. This section also presents calculated point spread functions for both the 0.6 NA and 0.67 NA objectives, outlining the performance trade-offs that informed our design choice. In addition, Supplementary Note 3 now includes a broader comparison of Altair-LSFM with other light-sheet modalities, including diSPIM, ASLM, and OPM, to further contextualize the system’s capabilities within the evolving light-sheet microscopy landscape.

      (3) The modularity of the system is implied in the context of the manuscript, but not fully explained. The authors should specify more clearly, for example, if cameras could be easily changed, objectives could be easily swapped, light-sheet thickness could be tuned by changing the cylindrical lens, how users might adapt the system for different samples (e.g., embryos, cleared tissue, live imaging), etc., and discuss eventual constraints or compatibility issues for these implementations.

      Altair-LSFM was explicitly designed and optimized for imaging live adherent cells, where sample scanning and short light-sheet propagation lengths provide optimal axial resolution (Supplementary Note 3). While the same platform could be used for superficial imaging in embryos, systems implementing multiview illumination and detection schemes are better suited for such specimens. Similarly, cleared tissue imaging typically requires specialized solvent-compatible objectives and approaches such as ASLM that maximize the field of view. We have now added text to the Design Principles section that explicitly states this.

      Altair-LSFM offers varying levels of modularity depending on the user’s level of expertise. For entry-level users, the illumination numerical aperture—and therefore the light-sheet thickness and propagation length—can be readily adjusted by tuning the rectangular aperture conjugate to the back pupil of the illumination objective, as described in the Design Principles section. For mid-level users, alternative configurations of Altair-LSFM, including different detection objectives, stages, filter wheels, or cameras, can be readily implemented (Supplementary Note 1). Importantly, navigate natively supports a broad range of hardware devices, and new components can be easily integrated through its modular interface. For expert users, all Zemax simulations, CAD models, and step-by-step optimization protocols are openly provided, enabling complete re-optimization of the optical design to meet specific experimental requirements.

      (4) Resolution measurements before and after deconvolution are central to the performance claim, but the deconvolution method (PetaKit5D) is only briefly mentioned in the main text, it's not referenced, and has to be clarified in more detail, coherently with the precision of the supplementary information. More specifically, PetaKit5D should be referenced in the main text, the details of the deconvolution parameters discussed in the Methods section, and the computational requirements should also be mentioned. 

      In the revised manuscript, we now provide a dedicated description of the deconvolution process in the Methods section, including the specific parameters and algorithms used. We have also explicitly referenced PetaKit5D in the main text to ensure proper attribution and clarity. Additionally, we note the computational requirements associated with this analysis in the same section for completeness.

      (5) Image post-processing is not fully explained in the main text. Since the system is sample-scanning based, no word in the main text is spent on deskewing, which is an integral part of the post-processing to obtain a "straight" 3D stack. Since other systems implement such a post-processing algorithm (for example, single-objective architectures), it would be beneficial to have some discussion about this, and also a brief comparison to other systems in the main text in the methods section.

      In the revised manuscript, we now explicitly describe both deskewing (shearing) and deconvolution procedures in the Alignment and Characterization section of the main text and direct readers to the Methods section. We also briefly explain why the data must be sheared to correct for the angled sample-scanning geometry for LLSM and Altair-LSFM, as well as for both sample-scanning and laser-scanning variants of OPMs.
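
      To make the shearing step concrete, the following is a minimal sketch of nearest-pixel deskewing, not the procedure used in the manuscript or in PetaKit5D. It assumes a convention in which the angle is measured between the scan direction and the light-sheet plane, so that each scan step shifts the image by step × cos(angle) within the plane and advances it by step × sin(angle) along the detection axis. The function names and the example values are hypothetical, and practical pipelines interpolate rather than rounding to whole pixels.

```python
import math


def deskew_parameters(scan_step_um, angle_deg, pixel_size_um):
    """Return (in-plane shift per plane in pixels, axial plane spacing in um).

    Assumed convention: angle_deg is the angle between the scan direction
    and the light-sheet plane.
    """
    theta = math.radians(angle_deg)
    shift_px = scan_step_um * math.cos(theta) / pixel_size_um
    dz_um = scan_step_um * math.sin(theta)
    return shift_px, dz_um


def deskew(stack, shift_px, fill=0):
    """Shear a stack (list of planes, each a list of rows) along each row.

    Plane k is shifted by round(k * shift_px) pixels (shift_px >= 0), and
    every row is padded with `fill` so all planes share one output width.
    """
    max_shift = round((len(stack) - 1) * shift_px)
    sheared = []
    for k, plane in enumerate(stack):
        offset = round(k * shift_px)
        sheared.append([[fill] * offset + list(row) + [fill] * (max_shift - offset)
                        for row in plane])
    return sheared


# Hypothetical values: 0.5 um scan step, 30 degree angle, 0.1 um pixels.
shift_px, dz_um = deskew_parameters(0.5, 30.0, 0.1)

# Tiny 3-plane stack with one 2-pixel row per plane, sheared by 1 pixel per plane.
toy = deskew([[[1, 2]], [[1, 2]], [[1, 2]]], shift_px=1.0)
```

      The padded output is wider than the input, which is one reason deskewed datasets occupy more storage than the raw data.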

      (6) A brief discussion on comparative costs with other systems (LLSM, diSPIM, etc.) could be helpful for non-imaging expert researchers who could try to implement such an optical architecture in their lab.

      Unfortunately, the exact costs of commercial systems such as LLSM or diSPIM are typically not publicly available, as they depend on institutional agreements and vendor-specific quotations. Nonetheless, we now provide approximate cost estimates in Supplementary Note 1 to help readers and prospective users gauge the expected scale of investment relative to other advanced light-sheet microscopy systems.

      (7) The "navigate" control software is provided, but a brief discussion on its advantages compared to an already open-access system, such as Micro-Manager, could be useful for the users.

      In the revised manuscript, we now include Supplementary Note 5 that discusses the advantages and disadvantages of different open-source microscope control platforms, including navigate and Micro-Manager. In brief, navigate was designed to provide turnkey support for multiple light-sheet architectures, with pre-configured acquisition routines optimized for Altair-LSFM, integrated data management with support for multiple file formats (TIFF, HDF5, N5, and Zarr), and full interoperability with OME-compliant workflows. By contrast, while Micro-Manager offers a broader library of hardware drivers, it typically requires manual configuration and custom scripting for advanced light-sheet imaging workflows.

      (8) The cost and parts are well documented, but the time and expertise required are not crystal clear. Adding a simple time estimate (perhaps in the Supplement Section) of assembly/alignment/installation/validation and first imaging will be very beneficial for users. Also, what level of expertise is assumed (prior optics experience, for example) to be needed to install a system like this? This can help non-optics-expert users to better understand what kind of adventure they are putting themselves through.

      We thank the reviewer for this helpful suggestion. To address this, we have added Supplementary Table S5, which provides approximate time estimates for assembly, alignment, validation, and first imaging based on the user’s prior experience with optical systems. The table distinguishes between novice (no prior experience), moderate (some experience using but not assembling optical systems), and expert (experienced in building and aligning optical systems) users. This addition is intended to give prospective builders a realistic sense of the time commitment and level of expertise required to assemble and validate Altair-LSFM.

      Minor things in the main text:

      (1) Line 109: The cost is considered "excluding the laser source". But then in the table of costs, you mention L4cc as a "multicolor laser source", for 25 K. Can you explain this better? Are the costs correct with or without the laser source? 

      We acknowledge that the statement in line 109 was incorrect—the quoted ~$150k system cost does include the laser source (L4cc, listed at $25k in the cost table). We have corrected this in the revised manuscript.

      (2) Line 113: You say "lateral resolution", but then you state a 3D resolution (230 nm x 230 nm x 370 nm). This needs to be fixed.

      Thank you, we have corrected this.

      (3) Line 138: Is the light-sheet uniformity proven also with a fluorescent dye? This could be beneficial for the main text, showing the performance of the instrument in a fluorescent environment.

      The light-sheet profiles shown in the manuscript were acquired using fluorescein to visualize the beam. We have revised the main text and figure legends to clearly state this.

      (4) Line 149: This is one of the most important features of the system, defying the usual tradeoff between light-sheet thickness and field of view, with a regular Gaussian beam. I would clarify more specifically how you achieve this because this really is the most powerful takeaway of the paper.

      We thank the reviewer for this key observation. The ability of Altair-LSFM to maintain a thin light sheet across a large field of view arises from diffraction effects inherent to high NA illumination. Specifically, diffraction elongates the PSF along the beam’s propagation direction, effectively extending the region over which the light sheet remains sufficiently thin for high-resolution imaging. This phenomenon, which has been the subject of active discussion within the light-sheet microscopy community, allows Altair-LSFM to partially overcome the conventional trade-off between light-sheet thickness and propagation length. We now clarify this point in the main text and provide a more detailed discussion in Supplementary Note 3, which is explicitly referenced in the discussion of the revised manuscript.
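
      For reference, the conventional trade-off mentioned above can be written down with the textbook paraxial Gaussian-beam relations, in which the waist radius scales as 1/NA and the Rayleigh range as 1/NA². The sketch below uses only those standard relations; it does not model the high-NA diffraction effects described in this response, and the wavelength, refractive index, and NA values are assumptions chosen for illustration rather than parameters of Altair-LSFM.

```python
import math


def gaussian_beam(wavelength_um, na, n_medium=1.33):
    """Paraxial Gaussian beam: return (waist radius w0, Rayleigh range z_R) in um.

    wavelength_um is the vacuum wavelength; na is the illumination NA.
    """
    w0 = wavelength_um / (math.pi * na)
    z_r = math.pi * w0 ** 2 * n_medium / wavelength_um
    return w0, z_r


# Assumed 488 nm excitation in water (n = 1.33); NA values are illustrative.
for na in (0.1, 0.2, 0.4):
    w0, z_r = gaussian_beam(0.488, na)
    print(f"NA {na:.1f}: waist radius {w0:.2f} um, confocal length {2 * z_r:.1f} um")
```

      Doubling the NA halves the waist radius but shortens the confocal length fourfold, which is the trade-off against which any measured thickness and field of view should be judged.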

      (5) Line 171: You talk about repeatable assembly...have you tried many different baseplates? Otherwise, this is a complicated statement, since this is a proof-of-concept paper. 

      We thank the reviewer for this comment. We have not yet validated the design across multiple independently assembled baseplates and therefore agree that our previous statement regarding repeatable assembly was premature. To avoid overstating the current level of validation, we have removed this statement from the revised manuscript.

      (6) Line 187: same as above. You mention "long-term stability". For how long did you try this? This should be specified in numbers (days, weeks, months, years?) Otherwise, it is a complicated statement to make, since this is a proof-of-concept paper.

      We also agree that referencing long-term stability without quantitative backing is inappropriate, and have removed this statement from the revised manuscript.

      (7) Line 198: "rapid z-stack acquisition". How rapid? Also, what is the limitation of the galvo-scanning in terms of the imaging speed of the system? This should be noted in the methods section.

      In the revised manuscript, we now clarify these points in the Optoelectronic Design section. Specifically, we explicitly note that the resonant galvo used for shadow reduction operates at 4 kHz, ensuring that it is not rate-limiting for any imaging mode. In the same section, we also evaluate the maximum acquisition speeds achievable using navigate and report the theoretical bandwidth of the sample-scanning piezo, which together define the practical limits of volumetric acquisition speed for Altair-LSFM.

      (8) Line 234: PetaKit5D is discussed in the additional documentation, but should be referenced here, as well.

      We now reference and cite PetaKit5D.

      (9) Line 256: "values are on par with LLSM", but no values are provided. Some details should also be provided in the main text.

      In the revised manuscript, we now provide the lateral and axial resolution values originally reported for LLSM in the main text to facilitate direct comparison with Altair-LSFM. Additionally, Supplementary Note 3 now includes an expanded discussion on the nuances of resolution measurement and reporting in light-sheet microscopy.

      Figures:

      (1) Figure 1 could be implemented with Figure 3. They're both discussing the validation of the system (theoretically and with simulations), and they could be together in different panels of the same figure. The experimental light-sheet seems to be shown in a transmission mode. Showing a pattern in a fluorescent dye could also be beneficial for the paper.

      In Figure 1, our goal was to guide readers through the design process—illustrating how the detection objective’s NA sets the system’s resolution, which defines the required pixel size for Nyquist sampling and, in turn, the field of view. We then use Figure 1b–c to show how the illumination beam was designed and simulated to achieve that field of view. In contrast, Figure 3 presents the experimental validation of the illumination system. To avoid confusion, we now clarify in the text that the light sheet shown in Figure 3 was visualized in a fluorescein solution and imaged in transmission mode. While we agree that Figures 1 and 3 both serve to validate the system, we prefer to keep them as separate figures to maintain focus within each panel. We believe this organization better supports the narrative structure and allows readers to digest the theoretical and experimental validations independently.
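
      The design chain described above (detection NA, then resolution, then Nyquist pixel size, then field of view) can be followed with a short worked calculation. This is only a sketch under stated assumptions: the Rayleigh criterion is one common resolution estimate and may not be the one used in the manuscript, and the wavelength, NA, and sensor width below are illustrative placeholders rather than the system's specifications.

```python
def rayleigh_resolution_um(wavelength_um, numerical_aperture):
    """Lateral resolution estimate from the Rayleigh criterion: 0.61 * lambda / NA."""
    return 0.61 * wavelength_um / numerical_aperture


def nyquist_pixel_size_um(resolution_um):
    """Nyquist sampling asks for at least two pixels per resolvable distance."""
    return resolution_um / 2.0


def field_of_view_um(pixel_size_um, pixels_across):
    """Field of view in sample space for a sensor with the given number of pixels."""
    return pixel_size_um * pixels_across


# Illustrative values only (assumptions, not taken from the manuscript).
wavelength_um = 0.52
numerical_aperture = 1.1
pixels_across = 2048

resolution = rayleigh_resolution_um(wavelength_um, numerical_aperture)
pixel_size = nyquist_pixel_size_um(resolution)
fov = field_of_view_um(pixel_size, pixels_across)
print(f"resolution {resolution:.3f} um, pixel {pixel_size:.3f} um, FOV {fov:.1f} um")
```

      With these placeholder numbers the field of view comes out at roughly 300 µm, which shows how a higher detection NA shrinks the usable field for a fixed sensor.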

      (2) Figure 3: Panels d and e show the same thing. Why would you expect that xz and yz profiles should be different? Is this due to the orientation of the objectives towards the sample?

      In Figure 3, we present the PSF from all three orthogonal views, as this provides the most transparent assessment of PSF quality—certain aberration modes can be obscured when only select perspectives are shown. In principle, the XZ and YZ projections should be equivalent in a well-aligned system. However, as seen in the XZ projection, a small degree of coma is present that is not evident in the YZ view. We now explicitly note this observation in the revised figure caption to clarify the difference between these panels.

      (3) Figure 4's single boxes lack a scale bar, and some of the Supplementary Figures (e.g. Figure 5) lack detailed axis labels or scale bars. Also, in the detailed documentation, some figures are referred to as Figure 5. Figure 7 or, for example, figure 6. Figure 8, and this makes the cross-references very complicated to follow

      In the revised manuscript, we have corrected these issues. All figures and supplementary figures now include appropriate scale bars, axis labels, and consistent formatting. We have also carefully reviewed and standardized all cross-references throughout the main text and supplementary documentation to ensure that figure numbering is accurate and easy to follow.

    1. eLife Assessment

      In this study, the authors investigate the role of ZMAT3, a p53 target gene, in tumor suppression and RNA splicing regulation. Using quantitative proteomics, the authors uncover that ZMAT3 knockout leads to upregulation of HKDC1, a gene linked to mitochondrial respiration, and that ZMAT3 suppresses HKDC1 expression by inhibiting c-JUN-mediated transcription. This set of convincing evidence reveals a fundamental mechanism by which ZMAT3 contributes to p53-driven tumor suppression by regulating mitochondrial respiration.

    2. Reviewer #1 (Public review):

      Summary:

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and showed markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration. The data are novel, compelling and very interesting.

      Comments on revisions:

      The authors have done a thorough job addressing my comments. This manuscript is quite strong and will be highly cited for its novelty and rigor.

    3. Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.

      Comments on revisions:

      The authors have mostly addressed the concerns raised previously by this reviewer. The lack of functional assays made the reported findings mostly mechanistic with no clear biological context.

      The present manuscript is certainly improved compared to the previous version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:  

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.  The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.  

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.  

      We thank the reviewer for the kind words and the thoughtful suggestion. As recommended, to identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.

      Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.  

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.  

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.  

      We thank the reviewer for appreciating our work and for this valuable suggestion. The reviewer rightly pointed out that supporting the regulatory interactions between p53, ZMAT3, JUN and HKDC1 with functional assays such as proliferation, apoptosis and mitochondrial respiration analyses would strengthen our mechanistic data. During the revision of our manuscript, we attempted to address this point by performing simultaneous knockdown of these proteins; however, we observed substantial toxicity under these conditions, making the functional assays technically unfeasible. This outcome was not unexpected, as knockdown of JUN or HKDC1 individually results in growth defects. We therefore focused our efforts on addressing the recommendations for the authors.

      Reviewer #3 (Public review):

      Summary:  

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.  

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.  

      We thank the reviewer for the kind words and the invaluable suggestion. The reviewer has an excellent point regarding how ZMAT3 inhibits JUN binding to the HKDC1 locus. Our new data included in the revised manuscript show that the ZMAT3-JUN interaction is lost in the presence of DNase or RNase, indicating that the interaction requires both DNA and RNA. This result suggests that ZMAT3 and JUN form an RNA-dependent, chromatin-associated complex. Although not directly investigated in our study, this finding is consistent with emerging evidence that RBPs can function as chromatin-associated cofactors in transcription. For example, functional interplay between transcription factor YY1 and the RNA binding protein RBM25 co-regulates a broad set of genes, where RBM25 appears to engage promoters first and then recruit YY1, with RNA proposed to guide target recognition. We have discussed this possibility in the discussion section of the revised manuscript (page 13). We agree that future work using ZMAT3 mutants and targeted ChIP or RIP assays will be valuable to delineate the precise mechanism by which ZMAT3 inhibits JUN binding to its target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. HKDC1 is emerging as an important player in human cancer. Importantly, the authors show both acute (gene silencing) and chronic (CRISPR KO) approaches to silence ZMAT3, and they do this in several cell lines. Notably, they show that ZMAT3 silencing leads to impaired mitochondrial respiration, in a manner that is rescued by silencing of HKDC1. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter (intron 1), and altered mitochondrial respiration. The findings are compelling, and the authors use multiple orthogonal approaches to test most findings. The authors also offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration. As such, enthusiasm is high for this manuscript.

      Addressing the following question would improve the manuscript. 

      It is not clear how many (other) c-JUN target genes might be impacted by ZMAT3; other important c-JUN targets in cancer include GLS1, WEE1, SREBP1, GLUT1, and CD36, so there could be a global impact on metabolism in ZMAT3 KO cells. Can the authors perform qPCR on these targets in ZMAT3 WT and KO cells and see if these target genes are differentially expressed? 

      We thank the reviewer for this thoughtful suggestion. As recommended, we examined the expression of key c-JUN target genes GLS1 (also known as GLS), WEE1, SREBP1, GLUT1, and CD36 in ZMAT3-WT and ZMAT3-KO cells. We first analyzed publicly available JUN ChIP-Seq data from three ENCODE cell lines, which revealed JUN binding peaks near or upstream of exon 1 for GLS1/GLS, SREBP1, and SLC2A1/GLUT1, but not for WEE1 or CD36 (Appendix 1, panels A-E). Based on these results, we performed RT-qPCR for GLS1/GLS, SREBP1 and SLC2A1 in ZMAT3-WT and ZMAT3-KO cells, with or without JUN knockdown. GLS mRNA was significantly reduced upon JUN knockdown in both ZMAT3-WT cells and ZMAT3-KO cells, but it was not upregulated upon loss of ZMAT3, indicating that GLS is a JUN target gene, but it is not regulated by ZMAT3. In contrast, SREBF1 or SLC2A1 expression remained unchanged upon ZMAT3 loss or JUN knockdown (Appendix 1 panels F-H). These data suggest that the ZMAT3/JUN axis does not regulate the expression of these genes.

      To identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.
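
      The intersection step described above can be sketched with plain set operations. This is a minimal illustration, not the authors' pipeline: the gene names are hypothetical placeholders, and requiring a peak in every one of the ENCODE cell lines is an assumption, since the response does not state how the three data sets were combined.

```python
def candidate_targets(upregulated, chip_atlas_bound, encode_peak_sets):
    """Genes that are upregulated on knockout and JUN-bound in every ChIP data set."""
    candidates = set(upregulated) & set(chip_atlas_bound)
    for peaks in encode_peak_sets:
        # Assumption: a candidate must carry a peak in each cell line.
        candidates &= set(peaks)
    return sorted(candidates)


# Hypothetical inputs; none of these sets come from the study.
upregulated = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
chip_atlas_bound = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
encode_peak_sets = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_E"},
    {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
]

shortlist = candidate_targets(upregulated, chip_atlas_bound, encode_peak_sets)
print(shortlist)
```

      Ranking the shortlist to pick a "top 4" would need an additional criterion, such as fold change, that is not shown here.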

      Minor concerns: 

      (1) Line 150: observed a modest. 

      (2) Line 159: Figure 2G appears to be inaccurately cited. 

      (3) Line 191: assays to measure. 

      We thank the reviewer for pointing these out. These minor concerns have been addressed in the text.  

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 1E: Can the authors clarify what the numbers on the left side of the chart represent? Do they refer to the scale?

      The numbers on the Y-axis represent the -log10(p-value), where higher values correspond to more significant changes. For visualization purposes, the significant changes are shown in red.
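
      For readers unfamiliar with this scale, the transformation can be sketched as follows. The function name is hypothetical and the p-values are arbitrary examples, not values from Figure 1E.

```python
import math


def neg_log10(p_value):
    """Map a p-value onto the -log10 scale; smaller p-values give larger numbers."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


# Arbitrary example p-values.
for p in (0.05, 0.01, 0.001):
    print(p, round(neg_log10(p), 2))
```

      On this scale a p-value of 0.05 sits at about 1.3, so points plotted above that height are significant at the 5% level.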

      (2) Page 5, line 123: The sentence "As expected, ZMAT3 mRNA levels were decreased in the ZMAT3-KO cells" is redundant, as this information was already mentioned on page 4, line 103.  

      We thank the reviewer for noticing this redundancy. The repeated sentence has been removed in the revised manuscript.  

      (3) Page 5: The authors state: "Transcriptome-wide, upon loss of ZMAT3, 606 genes were significantly up-regulated (adj. p < 0.05 and 1.5-fold change) and 552 were down-regulated, with a median fold change of 1.76 and 0.55 for the up- and down-regulated genes, respectively." Later, on page 6, they write: "Comparison of the RNA-seq data from ZMAT3-WT vs. ZMAT3-KO and CTRL siRNA vs. ZMAT3 siRNA-transfected HCT116 cells indicated that 1023 genes were commonly up-regulated, and 1042 were commonly down-regulated upon ZMAT3 loss (Figure S2C and D)." Why is the number of deregulated transcripts higher in the ZMAT3-WT vs. ZMAT3-KO comparison than in the CTRL siRNA vs. ZMAT3 siRNA comparison? Are the authors using less stringent criteria in the second analysis? This point should be clarified.

      We thank the reviewer for highlighting this point. The reviewer is correct that less stringent criteria were used in the second analysis. On page 5, we applied stringent thresholds (adjusted p-value < 0.05 and 1.5-fold change) to identify high-confidence transcriptome-wide changes upon ZMAT3 loss. In contrast, for the comparison of both RNA-seq datasets (ZMAT3-WT vs. KO and siCTRL vs. siZMAT3), we included genes that were consistently up- or downregulated, without applying a fold change threshold, focusing instead on significantly altered genes (adjusted p < 0.05) in both datasets. This allowed us to capture broader and more reproducible transcriptomic changes that occur upon ZMAT3 depletion, including modest but significant changes upon transient ZMAT3 knockdown with siRNAs. We have now clarified this distinction on page 6 of the revised manuscript.
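
      The difference between the two selection rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: each gene is represented by a hypothetical (log2 fold change, adjusted p-value) pair, the 1.5-fold threshold is treated as inclusive, and all values are invented.

```python
import math


def strict_hits(rows, padj_cutoff=0.05, fold_cutoff=1.5):
    """Genes passing both the adjusted p-value and the fold-change threshold."""
    log2_cutoff = math.log2(fold_cutoff)
    return {
        gene
        for gene, (log2fc, padj) in rows.items()
        if padj < padj_cutoff and abs(log2fc) >= log2_cutoff
    }


def shared_direction_hits(rows_a, rows_b, padj_cutoff=0.05):
    """Genes significant in both data sets and changing in the same direction.

    No fold-change threshold is applied here.
    """
    shared = set()
    for gene in rows_a.keys() & rows_b.keys():
        log2fc_a, padj_a = rows_a[gene]
        log2fc_b, padj_b = rows_b[gene]
        same_direction = log2fc_a * log2fc_b > 0
        if padj_a < padj_cutoff and padj_b < padj_cutoff and same_direction:
            shared.add(gene)
    return shared


# Invented example values: gene -> (log2 fold change, adjusted p-value).
knockout = {"GENE_A": (1.2, 0.001), "GENE_B": (0.3, 0.01), "GENE_C": (-0.9, 0.2)}
knockdown = {"GENE_A": (0.8, 0.02), "GENE_B": (0.2, 0.04), "GENE_C": (-1.0, 0.001)}

print(sorted(strict_hits(knockout)))
print(sorted(shared_direction_hits(knockout, knockdown)))
```

      In this toy input GENE_B changes modestly but significantly in both data sets, so it is missed by the strict rule and captured by the shared-direction rule, which is how the second analysis can return more genes than the first.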

      (4) Figures 2B and 2E: The authors should provide quantification of HKDC1 protein levels normalized to a loading control. In addition, they should assess HKDC1 protein abundance upon ZMAT3 interference in SW1222 and HCEC-1CT cells, not just in HepG2 and HCT116 cells.

      We thank the reviewer for this suggestion. We have now quantified all immunoblots presented throughout the manuscript, including those shown in Figures 2B and 2E, and all other figures containing protein analyses. Band intensities were quantified using ImageJ densitometry and normalized to GAPDH as the loading control. In addition, as suggested, we examined HKDC1 protein levels following ZMAT3 knockdown in two additional cell lines, SW1222 and HCEC-1CT. Consistent with our observations in HepG2 and HCT116 cells, ZMAT3 depletion led to increased HKDC1 protein levels in both SW1222 and HCEC-1CT cells. These new data are now included in Figure 2-figure supplement 1F and G. We have updated the Results section, figure legends, and figures to reflect these additions.
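
      The normalization described above can be sketched as a two-step calculation: divide each lane's target band by its loading-control band, then express the result relative to a reference lane. The function names and intensity values below are hypothetical; the actual measurements were made in ImageJ and are not reproduced here.

```python
def normalize_to_loading_control(target, control):
    """Divide each lane's target band intensity by its loading-control intensity."""
    if len(target) != len(control):
        raise ValueError("target and control must have one value per lane")
    return [t / c for t, c in zip(target, control)]


def relative_to_reference(normalized, reference_index=0):
    """Express normalized intensities relative to a chosen reference lane."""
    reference = normalized[reference_index]
    return [value / reference for value in normalized]


# Invented densitometry values for two lanes (for example control and knockdown).
hkdc1_intensity = [1000.0, 2400.0]
gapdh_intensity = [5000.0, 6000.0]

normalized = normalize_to_loading_control(hkdc1_intensity, gapdh_intensity)
relative = relative_to_reference(normalized)
print(relative)
```

      Normalizing before comparing lanes prevents unequal loading from being read as a change in protein abundance.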

      (5) Figure 3A: It is unclear which gene was knocked out in the "KO cells." The authors should clearly specify this.

      We thank the reviewer for pointing this out. We have now updated Figure 3A.

      (6) Figure 3D: The result appears counterintuitive in comparison to Figure 3E. Why does HKDC1 knockdown reduce cell confluency more in ZMAT3 KO cells than in control (ZMAT3 wild-type) cells? The authors should explain this discrepancy more clearly.

      We thank the reviewer for this insightful comment. As shown in Figure 3D and 3E, knockdown of HKDC1 resulted in a greater decrease in proliferation in ZMAT3-KO cells than in ZMAT3-WT cells. This observation was indeed unexpected, given that HKDC1 acts downstream of ZMAT3. One possible explanation is that elevated HKDC1 expression in ZMAT3-KO cells increases their reliance on HKDC1 for sustaining proliferation, and that HKDC1 may also participate in additional pathways in ZMAT3-KO cells. Consequently, transient knockdown of HKDC1 in ZMAT3-KO cells would have a more pronounced effect on proliferation due to their increased dependency on HKDC1 activity. In contrast, ZMAT3-WT cells, which express lower levels of HKDC1, are less dependent on its function and therefore less sensitive to its depletion. We have now clarified this point on page 8 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):  

      (1) Why do the authors start their analysis by knocking out the p53 response element in Zmat3? That should be clarified. In addition, since clones were picked after CRISPR KO of Zmat3, were experiments done to confirm that p53 signaling was not disrupted?

      We thank the reviewer for this thoughtful question. We began our study by targeting the p53 response element (p53RE) in the ZMAT3 locus because the basal expression of ZMAT3 is regulated by p53 (Muys, Bruna R. et al., Genes & Development, 2021). Deleting the p53RE therefore allowed us to markedly reduce ZMAT3 expression without disrupting the entire ZMAT3 locus. We have clarified this rationale on page 4 of the revised manuscript. To ensure that p53 signaling was not affected by this modification, we verified that canonical p53 targets such as p21 were equivalently induced in both ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment and that p53 induction was unchanged (Figure 4F and Figure 1-figure supplement 1A).

      (2) Throughout the text, many immunoblots are used to validate the knockouts and knockdowns used, but some clarification is needed. In Figure S1A, the Zmat3-WT sample seems to have significantly more p53 than the Zmat3 KO sample. Does Zmat3 KO compromise p53 levels in other experiments? It would be good to understand if Zmat3 affects p53 function by affecting its levels. Also, the p21 blot is overloaded.

      We thank the reviewer for this helpful observation. To determine whether ZMAT3 knockout affects p53 function by affecting its levels, we repeated the experiment three independent times. Western blots from these biological replicates, together with protein quantification, are now included in Appendix-2 and Figure 1-figure supplement 1A. These data show no significant differences in p53 or p21 induction between ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment. In the revised manuscript, we have replaced the blot in Figure 1-figure supplement 1A with a more representative image from one of these replicate experiments.

      In Figure 2E, HKDC1 protein levels are not shown for the SW1222 and HCEC-1CT cell lines, 

      We thank the reviewer for this suggestion. HKDC1 protein levels in SW1222 and HCEC-1CT cells following ZMAT3 knockdown are now included as Figure 2-figure supplement 1F and 1G, together with the corresponding quantification.

      and Zmat3 does not appear as its characteristic two bands on the blot. What does this signify?

      We thank the reviewer for this observation. Endogenous ZMAT3 typically appears as two closely migrating bands on immunoblots. As shown in Figure 4D and Appendix 2A and 2B, these two bands are observed at the expected molecular weight following Nutlin treatment and are specific to ZMAT3, as they are markedly reduced in ZMAT3-KO cells. In contrast, only a single ZMAT3 band is visible in Figure 2E. This likely reflects limited resolution of the two bands in some blots rather than a biological difference.   

      (3) Why does HKDC1 knockdown only have an effect on metabolic phenotypes when ZMAT3 is gone? In Figure 3A, there does not seem to be a decrease in hexokinase activity in the siCTRL + siHKDC1 condition compared to siCTRL alone. Also, in Figure 3A, does phosphorylation activity of HKDC1 necessarily reflect glucose uptake, as stated? Additionally, in Figure 3C, there is no effect on mitochondrial respiration with siHKDC1, even though recent studies have shown a significant effect of HKDC1 on this.

      We thank the reviewer for raising these important questions. As noted, HKDC1 knockdown alone in wild-type cells (siCTRL + siHKDC1) does not significantly reduce hexokinase activity (Figure 3A). This likely reflects the low basal expression of HKDC1 in these cells. Thus, the metabolic phenotype may only become apparent when HKDC1 expression exceeds a functional threshold, as observed in ZMAT3-KO cells where HKDC1 is upregulated.

      Regarding the glucose uptake assay, HKDC1 itself is not phosphorylated; rather, it phosphorylates a non-catabolizable glucose analog, 2-deoxyglucose (2-DG) upon cellular uptake. According to the manufacturer’s protocol, intracellular 2-DG is phosphorylated by hexokinases to 2-deoxyglucose-6-phosphate (2-DG6P), which cannot be further metabolized and therefore accumulates. The accumulated 2-DG6P is quantified using a luminescence-based readout. This assay is widely used as a surrogate for glucose uptake because it reflects both glucose import and phosphorylation — the first step of glycolytic flux. As for the lack of change in mitochondrial respiration (Figure 3C), we acknowledge that some studies have reported mitochondrial roles for HKDC1 under basal conditions; however, such effects may be cell type-specific.

      (4) The emphasis on glycolysis signatures is confusing, as in the end, glycolysis does not seem to be affected by ZMAT3 status, but mitochondrial respiration is affected. Can the text be clarified to address this? It is also difficult to understand the role of oxygen consumption rate (OCR) in ZMAT3 phenotypes, as it does not fully track with proliferation. For example, ZMAT3 KD has the highest OCR, and the other conditions have similar OCRs but different proliferative rates in Figure 3D. Also, the colors used in Figure 3 to denote different genotypes change between B/C and D, which is confusing.

      We thank the reviewer for pointing out the inconsistency in the colors of the graph in Figure 2, which we have now corrected. Our data indicate that ZMAT3 regulates mitochondrial respiration without significantly affecting glycolysis. It is possible that mitochondria in ZMAT3-KO cells are oxidizing more substrates that are not produced by glycolysis. Additional work will be required to fully determine these mechanisms. We have clarified this on page 8 of the revised manuscript.

      (5) The lack of ZMAT3 binding to RNAs in PAR-CLIP is not proof that it does not do so. A more targeted approach should be used, using individual RIP assays. The authors should also analyze the splicing of HKDC1, which could be affected by ZMAT3.

      As suggested, we performed ZMAT3 RNA IP experiments (RIP) using doxycycline-inducible HCT116-ZMAT3-FLAG cells. However, we did not observe significant enrichment of HKDC1 mRNA in the ZMAT3 IPs (Figure 5-figure supplement 1A), consistent with previously published ZMAT3 RIP-seq data (Bersani et al, Oncotarget, 2016). These findings further support the notion that ZMAT3 does not directly bind to HKDC1 mRNA in these cells. Accordingly, we have modified the text on page 10 of the revised manuscript.

      In addition, as suggested by the reviewer, we analyzed changes in splicing of HKDC1 pre-mRNA using rMATS in HCT116 cells by comparing our previously published RNA-seq data from siCTRL and siZMAT3-transfected HCT116 cells (Muys et al, Genes Dev, 2021). We focused on splicing events with an FDR < 0.05 and |ΔPSI| > 0.1 (representing at least a 10% change in splicing). The splicing analysis (data not shown) did not reveal any significant alterations in HKDC1 pre-mRNA splicing upon ZMAT3 knockdown. The corresponding text has been updated on page 10 of the revised manuscript.
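
      The event filter described above can be sketched as follows. This is a minimal illustration with invented events: the dictionary keys are hypothetical field names rather than rMATS output columns, and the gene labels are placeholders.

```python
def significant_splicing_events(events, fdr_cutoff=0.05, min_abs_delta_psi=0.1):
    """Keep events with FDR below the cutoff and an absolute PSI change above the minimum."""
    return [
        event
        for event in events
        if event["fdr"] < fdr_cutoff and abs(event["delta_psi"]) > min_abs_delta_psi
    ]


# Invented events; delta_psi is the difference in percent-spliced-in between conditions.
events = [
    {"gene": "GENE_A", "fdr": 0.001, "delta_psi": -0.25},  # kept: significant, large change
    {"gene": "GENE_B", "fdr": 0.001, "delta_psi": 0.04},   # dropped: change too small
    {"gene": "GENE_C", "fdr": 0.30, "delta_psi": 0.40},    # dropped: not significant
]

kept = significant_splicing_events(events)
print([event["gene"] for event in kept])
```

      Taking the absolute value means that both increased and decreased inclusion count as changes, which matches the two-sided threshold stated in the response.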

      (6) The authors say that they examine JUN binding at the HKDC1 promoter several times, but they focus on intron 1 in Figure 5. They should revise the text accordingly, and they should also show JUN ChIP data traces for the whole HKDC1 locus in Figure 5C.

      We thank the reviewer for this helpful suggestion. As recommended, we have revised the text throughout the manuscript and replaced HKDC1 promoter with HKDC1 intron 1 DNA to accurately reflect our analysis, and Figure 5 now shows the JUN ChIP-seq signal across the entire HKDC1 locus.

      (7) In the ZMAT3 and JUN interaction assays, were these tested in the presence of DNAse or RNAse to determine if nucleic acids mediate the interaction?

      We thank the reviewer for this valuable suggestion. To test whether nucleic acids mediate the ZMAT3-JUN interaction, we performed ZMAT3 immunoprecipitations (IPs) in the presence or absence of DNase and RNase from doxycycline-inducible ZMAT3-FLAG expressing HCT116 cells. The ZMAT3-JUN interaction was lost upon treatment with either DNase or RNase, indicating that the interaction is mediated by nucleic acids. These data have been added to the revised manuscript (Figure 5-figure supplement 1D and page 11).

    1. eLife Assessment

      This important study provides the first putative evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient to induce ectopic development of forelimb buds at neck level. The authors use both gain-of-function (GOF) and loss-of-function (LOF) approaches in chick embryos to test the roles of Hox paralogy group (PG) 4-7 genes in limb development. The GOF data provide strong evidence that overexpression of Hox PG6/7 genes is sufficient to induce forelimb buds at neck level. However, the experiments using dominant negative constructs are lacking some key controls that are needed to demonstrate the specificity of the LOF effect, rendering the work as a whole incomplete.

    2. Reviewer #2 (Public review):

      In the original review of this manuscript, I noted that this study provides the first evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient for ectopic forelimb budding. Their finding that ectopic expression of Hoxa6 or Hoxa7 induces wing budding at neck level, a demonstration of sufficiency, is of major significance. The experiments used to test the necessity of specific Hox genes for limb budding involved overexpression of dominant negative constructs, and there were questions about whether the controls were well designed. The reviewers made several suggestions for additional experiments that would address their concerns. In their responses to those comments, the authors indicated that they would conduct those experiments, and they acknowledged the requests for further discussion of a few points.

      In the revised version of the manuscript, the authors have provided additional RNA-seq data in Table 3, which lists 221 genes that are shared between the Hoxa6-induced limb bud and normal wing bud but not the neck. This shows that the ectopic limb bud has a limb-like character. The authors also expanded the discussion of their results in the context of previous work on the mouse. These changes have improved the paper.

      The authors elected not to conduct the co-transfection experiments that were suggested to test the ability of Hoxa4/a5 to block the limb-inducing ability of Hoxa6/a7. They also chose not to conduct the additional control experiments that were suggested for the dominant negative studies. The authors' justification for not conducting these experiments is provided in the responses to reviewers.

      The paper is improved over the previous version, but the conclusions, particularly regarding the dominant negative experiments, would have been strengthened by the additional experiments that were recommended by the reviewers. Under the current publishing model for eLife, it is the authors' prerogative to decide whether to revise in accordance with the reviewers' suggestions. Therefore, it seems to me that this version of the manuscript is the definitive version that the authors want to publish, and that eLife should publish it together with the reviewers' comments and the authors' responses.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      We thank the reviewer for emphasizing the importance of appropriate controls for the dominant-negative experiments. Dominant-negative Hox constructs have been successfully and widely used in previous studies, supporting the reliability of this approach. In our experiments, electroporation of the dominant-negative constructs into the limb field produced clear and reproducible effects when compared with both unoperated embryos and embryos electroporated with a GFP control construct. The GFP construct serves as an appropriate control, as it accounts for any effects of electroporation or exogenous protein expression without altering Hox gene function. We therefore conclude that the observed phenotypes specifically reflect dominant-negative Hox activity rather than procedural artifacts.

      The absence of overt limb phenotypes in PG4–PG7 mouse mutants likely reflects both functional redundancy among Hox paralogs and the difficulty of detecting subtle limb-specific effects in bilateral, systemically affected embryos. In contrast, the chick embryo system allows unilateral gene manipulation, providing an internal control and greater sensitivity for detecting weak or localized effects that may be masked in whole-animal mouse mutants.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      We thank the reviewer for this thoughtful suggestion. We fully agree that functional redundancy among Hox paralogs is an important consideration. However, Hox gene interactions are highly context-dependent and not strictly additive. Simultaneous interference with multiple Hox groups often leads to complex or compensatory effects that are difficult to interpret mechanistically, particularly when using dominant-negative constructs that may affect overlapping transcriptional networks.

      Our current experimental design, which targets individual paralog groups, allows us to attribute observed phenotypes to specific Hox activities and to interpret the results more precisely. Moreover, as shown in previous studies, simultaneous knockdown of multiple Hox genes does not necessarily produce stronger phenotypes. For these reasons, we believe that the present single dominant-negative experiments are the most informative and sufficient for addressing the specific questions in this study.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We thank the reviewer for this insightful suggestion. However, because of the extensive functional redundancy and regulatory interdependence within the Hox network, simultaneous inhibition of Hox4 and Hox5 is unlikely to produce a simple or interpretable outcome. Previous studies have shown that combinatorial Hox manipulations can trigger compensatory changes in other Hox genes, often obscuring rather than clarifying specific relationships.

      In our study, the proposed permissive role of Hox4/5 is supported by the spatial and temporal patterns of Hox expression and by the phenotypic effects observed upon individual dominant-negative perturbations. These data together suggest that Hox4/5 establish a forelimb-competent domain, on which Hox6/7 subsequently act to promote limb outgrowth. We therefore believe that the current evidence sufficiently supports this model without necessitating the additional combined experiment, which may not provide clear mechanistic insight due to redundancy effects.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      We thank the reviewer for this important comment. We agree that Tbx5 expression alone may not be sufficient to define forelimb identity. However, in our experiments, the induced bulge displays several additional characteristics consistent with early limb identity (at the pre-AER stage). First, the Tbx5 expression we observe corresponds to the stage when the limb field is already specified, not the earlier broad mesodermal phase described in other systems. Second, the induced domain also expresses Lmx1, a marker of dorsal limb mesenchyme, further supporting its limb-specific nature. Third, our RNA sequencing analysis reveals upregulation of multiple genes associated with early limb development pathways, providing molecular evidence for limb-type identity rather than non-specific mesodermal expansion. Taken together, these results strongly indicate that the induced bulge represents a forelimb-like structure rather than a generic mesodermal thickening.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We thank the reviewer for this helpful suggestion. We have analyzed the cartilage structures of the operated embryos. No skeletal elements were detected within the ectopic wing bud in the neck region. Furthermore, we did not observe any significant structural changes in the wing skeleton following loss-of-function (dnHox) experiments. These observations indicate that the ectopic bulges do not progress to form skeletal elements, consistent with their identity as early limb-like outgrowths rather than fully developed limbs.

      Reviewer #2 (Public review):

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We thank the reviewer for raising this important point regarding the specificity of the dominant-negative constructs. The dnHox constructs used in this study were generated by truncating the C-terminal region of each Hox protein, a strategy that removes the homeodomain and has been demonstrated to act as a specific dominant-negative by interfering with the corresponding Hox function without broadly affecting unrelated Hox genes. This approach has been successfully validated and used in previous work (Moreau et al., Curr. Biol. 2019), where similar constructs effectively and specifically inhibited Hox activity in the chick embryo.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here)

      We thank the reviewer for this insightful suggestion. We agree that, in principle, co-electroporation of dnHox4/5 with Hox6/7 could test the hierarchical relationship between these genes. However, due to the extensive redundancy and regulatory interdependence among Hox genes, simultaneous manipulation of multiple genes often leads to compensatory effects or complex outcomes that are difficult to interpret mechanistically. As discussed in our response to Point 3 of Reviewer 1, inhibition of only one or two Hox4/5 paralogs is unlikely to completely abolish the permissive function of this group.

      Our current data — showing that Hox6/7 gain-of-function can induce ectopic limb-like outgrowths, while dnHox4/5 and dnHox6/7 lead to reduced limb formation — already provide strong evidence for both the necessity and sufficiency of these Hox activities in forelimb positioning. We therefore believe that the existing experiments adequately support our proposed model without the need for additional combinatorial manipulations.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We thank the reviewer for this constructive suggestion. In response, we have added a table (Table 3) listing the genes expressed in both the native limb/wing bud and the Hoxa6-induced wing bud, as identified from our RNA-Seq dataset. This table provides the underlying data for the Venn diagram, heatmap, and GO analysis presented in Figure 3. We agree that including these data improves transparency and helps readers better appreciate the molecular similarity between the induced and native limb buds.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      We thank the reviewer for this important point. We have addressed this issue in our response to Reviewer 1, Point 1, and have expanded the relevant discussion in the manuscript. Briefly, we believe that the apparent discrepancy between chick and mouse results arises from both the high degree of functional redundancy among Hox paralogs and the limitations of detecting subtle limb-specific effects in systemic mouse mutants, where both sides of the embryo are equally affected. In contrast, the chick system allows unilateral gene manipulation, providing an internal control and greatly enhancing sensitivity to detect weak or localized effects. Thus, the chick embryo model can reveal subtle Hox-dependent limb-induction activities that are masked in conventional mouse knockout approaches.

    1. eLife Assessment

      This study reports useful information on the mechanisms by which a high-fat diet induces arrhythmias in the model organism Drosophila. Specifically, the authors propose that adipokinetic hormone (Akh) secretion is increased with this diet, and through binding of Akh to its receptor on cardiac neurons, arrhythmia is induced. The authors have revised their manuscript, but in some areas the evidence remains incomplete; the authors state that future studies will be directed at closing these gaps. Nonetheless, the data presented will be helpful to those who wish to extend the research to a more complex model system, such as the mouse.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on high fat diet. Elimination of one of two AkhR expressing cardiac neurons results in arrhythmia similar to high fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

    3. Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave the normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on high fat diet. Elimination of one of two AkhR expressing cardiac neurons results in arrhythmia similar to high fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Comments on revisions:

      The authors have addressed my other concerns. The only outstanding issue is in regard to the following comment:

      The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high fat diet compared to normal fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      The authors state that 8 sec time windows were selected at the discretion of the imager for analysis. I don't know how to avoid bias unless the person acquiring the imaging is blinded to the condition and the analysis is also done blind. Can you comment whether data acquisition and analysis was done in a blinded fashion? If not, this should be stated as a limitation of the study.

      Drosophila heart rate is highly variable. During the recording, we were biased toward choosing a time window in which the heartbeat was fairly stable. This is a limitation of the study, which we now mention in the revised version. We chose intact rather than “semi-intact” flies to avoid damaging the cardiac neurons.

      Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      We thank the reviewer for the positive comments. We believe that more signaling pathways are active in the AkhR neurons and regulate rhythmic heartbeat. We are currently searching for the molecules and pathways that act on the AkhR cardiac neurons to regulate the heartbeat. Thus, AkhR neuron ablation should have a more profound effect. Loss of AkhR is not equivalent to AkhR neuron ablation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave the normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      We thank the reviewer for suggesting the detailed experiments, and we believe that addressing these points would consolidate the results. As AkhR-Gal4 is also expressed in the fat body, we set out to build a more specific driver. We planned to use the split-Gal4 system (Luan et al. 2006. PMID: 17088209). The combination of pan-neuronal Elav-Gal4.DBD and AkhR-p65.AD should yield an AkhR neuron-specific driver. We selected 2580 bp of AkhR upstream DNA and cloned it into the pBPp65ADZpUw plasmid (Addgene plasmid: #26234). After two rounds of injection, however, we were not able to recover a transgenic line.

      We used GCaMP to record the calcium signal in the AkhR neurons. AkhR-Gal4>GCaMP has extremely high levels of fluorescence in the cardiac neurons under normal conditions.

      We are screening Gal4 drivers to find a line that is specific to the cardiac neurons and has a lower level of driver activity.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We quantified the AkhR neuron ablation and found that about 69% (n=28) of AkhR-Gal4>rpr flies showed a single ACN. It is more challenging to quantify other AkhR-expressing cells, as they are widely distributed. We tried adding more copies of UAS-rpr or AkhR-Gal4, which caused developmental defects (pupal lethality). Thus, as mentioned above, we are trying to find a more specific driver for targeting the cardiac neurons.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors refer to the 'crop' as the functional equivalent of the human stomach. Considering the difference in their primary functions, this cannot be justified.

      In Drosophila, the crop functions analogously to the stomach in vertebrates. It is a foregut storage and preliminary processing organ that regulates food passage into the midgut. It is more than a simple reservoir: the crop engages in enzymatic mixing, neural control, and active motility.

      Line 163 and 166, APCs are not neurons.

      Akh-producing cells (APCs) in Drosophila are neuroendocrine cells, residing in the corpora cardiaca (CC). While they produce and secrete the hormone AKH (akin to glucagon), they are not brain interneurons per se. APCs share many neuronal features (vesicular release, axon-like projections) and receive neural inputs, effectively functioning as a peripheral endocrine center.

    1. eLife Assessment

      This fundamental study is part of an impressive, large-scale effort to assess the reproducibility of published findings in the field of Drosophila immunity. In a companion article, the authors analyze 400 papers published between 1959 and 2011, and assess how many of the claims in these papers have been tested in subsequent publications. In this article, the authors report the results of validation experiments to assess a subset of the claims that, according to the literature, have not been corroborated. While the evidence reported for some of these validation studies is convincing, it remains incomplete for others.

    2. Reviewer #1 (Public review):

      Summary:

      This work revisits a substantial part of the published literature in the field of Drosophila innate immunity from 1959 to 2011. The strategy has been to restrain the analysis to some 400 articles and then to extract a main claim, two to four major claims and up to four minor claims totaling some 2000 claims overall. The consistency of these claims with the current state-of-the-art has been evaluated and reported on a dedicated Web site known as ReproSci and also in the text as well as in the 28 Supplements that report experimental verification, direct or indirect, e.g., using novel null mutants unavailable at the time, of a selected set of claims made in several articles. Of note, this review is mostly limited to the manuscript and its associated supplements and does not integrally cover the ReproSci website.

      Strengths:

      One major strength of this article is that it tackles the issue of reproducibility/consistency on a large scale. Indeed, while many investigators have some serious doubts about some results found in the literature, few have the courage, or the means and time, to seriously challenge studies, especially if published by leaders in the field. The Discussion adequately states the major limitations of the ReproSci approach, which should be kept in mind by the reader to form their own opinion.

      This study also allows investigators not familiar with the field to have a clearer understanding of the questions at stake and to derive a more coherent global picture that allows them to better frame their own scientific questions. Besides a thorough and up-to-date knowledge of the literature used to assess the consistency of the claims with our current knowledge, a merit of this study is the undertaking of independent experiments to address some puzzling findings and the evidence presented is often convincing, albeit one should keep in mind the inherent limitations as several parameters are difficult to control, especially in the field of infections, as underlined by the authors themselves. Importantly, some work of the lead author has also been re-evaluated (Supplements S2-S4). Thus, while utmost caution should be exerted, and often is, in challenging claims, even if the challenge eventually proves to be not grounded, it is valuable to point out potential controversial issues to the scientific community.

      While this is not a point of this review, it should be acknowledged that the possibility to post comments on the ReproSci website will allow further readjustments by the community in the appreciation of the literature and also of the ReproSci assessments themselves and of its complementary additional experiments.

      Weaknesses:

      Challenging the results from articles is, by its very nature, a highly sensitive issue, and utmost care should be taken when challenging claims. While the authors generally acknowledge the limitations of their approach in the main text and Supplements, there are a few instances where their challenges remain questionable and should be reassessed. This is certainly the case for Supplement S18, for which the ReproSci authors make a claim for a point that was not made in the publication under scrutiny. The authors of that study (Ramet et al., Immunity, 2001) never claimed that scavenger receptor SR-CI is a phagocytosis receptor, but that it is required for optimal binding of S2 cells to bacteria. Westlake et al. here have tested for a role of this scavenger receptor in phagocytosis, which had not been tested by Ramet et al. Thus, even though the ReproSci study brings additional knowledge to our understanding of the function of SR-CI by directly testing its involvement in phagocytosis by larval hemocytes, it did not address the major point of the Ramet et al. study, SR-CI binding to bacteria, and thus inappropriately concludes in Supplement S18 that "Contrary to (Ramet et al., 2001, Saleh et al., 2006), we find that SR-CI is unlikely to be a major Drosophila phagocytic receptor for bacteria in vivo." It follows that the results of Ramet et al. cannot be challenged by ReproSci as it did not address this program. Of note, Saleh et al. (2006) also mistakenly stated that SR-CI impaired phagocytosis in S2 cells and could be used as a positive control to monitor phagocytosis in S2 cells. Their assay appears to have actually not monitored phagocytosis but the association of FITC-labeled bacteria to S2 cells by FACS, as they did not mention quenching the fluorescence of bacteria associated with the surface with Trypan blue.

      The inference method to assess the consistency of results with current knowledge also has limitations that should be better acknowledged. At times, the argument is made that the gene under scrutiny may not be expressed at the right time according to large-scale data or that the gene product was not detected in the hemolymph by a mass-spectrometry approach. While these are in theory strong arguments, some genes, for instance those encoding proteases at the apex of proteolytic activation cascades, need not be strongly expressed, and their products might be released by only a few cells. In addition, we often lack relevant information on the expression of genes of interest upon specific immune challenges, such as infections with particular pathogens.

      As regards mass spectrometry, there is always the issue of sensitivity that limits the force of the argument. Our understanding of melanization remains currently limited, and methods are lacking to accurately measure the killing activity associated with the triggering of the proPO activation cascade. In this study, the authors monitor only the blackening reaction of the wound site based on a semi-quantitative measurement. They are not attempting to use other assays, such as monitoring the cleavage of proPOs into active POs or measuring PO enzymatic activity. These techniques are sometimes difficult to implement, and they suffer at times from variability. Thus, caution should be exerted when drawing conclusions from just monitoring the melanization of wounds.

      Likewise, the study of phagocytosis is limited by several factors. As most studies in the field focus on adults, the potential role of phagocytosis in controlling Gram-negative bacterial infections is often masked by the efficiency of the strong IMD-mediated systemic immune response mediated by AMPs (Hanson et al, eLife, 2019). This problem can be bypassed in rare instances of intestinal infections by Gram-negative bacteria such as Serratia marcescens (Nehme et al., PLoS Pathogens, 2007) or Pseudomonas aeruginosa (Limmer et al. PNAS, 2011), which escape from the digestive tract into the hemocoel without triggering, at least initially, the systemic immune response. It is technically feasible to monitor bacterial uptake in adults by injecting fluorescently labeled bacteria and subsequently quenching the signal from non-ingested bacteria. Nonetheless, many investigators prefer to resort to ex vivo assays starting from hemocytes collected from third-instar wandering larvae as they are easier to collect and then to analyze, e.g., by FACS. However, it should be pointed out that these hemocytes have been strongly exposed to a peak of ecdysone, which may alter their properties. As for S2 cells, it is thus not clear whether third-instar larval hemocytes faithfully reproduce the situation in adults. The phagocytic assays are often performed with killed bacteria. Evidence with live microorganisms is better, especially with pathogens. Assays with live bacteria require, however, an antibody used in a differential permeabilization protocol. Furthermore, the killing method alters the surface of the microorganisms, a key property for phagocytic uptake. Bacterial surface changes are minimal when microorganisms are killed by X-ray or UV light. These limitations should be kept in mind when proceeding to inference analysis of the consistency of claims. Eater illustrates this point well. Westlake et al. state that: "[...] subsequent studies showed that a null mutation of eater does not impact phagocytosis". The authors refer here to Bretscher et al., Biology Open, 2015, in which binding to heat-killed E. coli was assessed in an ex vivo assay in third instar larvae. In contrast, Chung and Kocks (JBC, 2011) tested whether the recombinant extracellular N-terminal ligand-binding domain was able to bind to bacteria. They found that this domain binds to live Gram-positive bacteria but not to live Gram-negative bacteria. For the latter, killing bacteria with ethanol or heating, but not by formaldehyde treatment, allowed binding. More importantly, Chung and Kocks documented a complex picture in which AMPs may be needed to permeabilize the Gram-negative bacterial cell wall that would then allow access of at least the recombinant secreted Eater extracellular domain to peptidoglycan or peptidoglycan-associated molecules. Thus, the systemic Imd-dependent immune response would be required in vivo to allow Eater-dependent uptake of Gram-negative bacteria by adult hemocytes. In ex vivo assays, any AMPs may be diluted too much to effectively attack the bacterial membrane. A prediction is then that there should be an altered phagocytosis of Gram-negative bacteria in IMD-pathway mutants, e.g., an imd null mutant but not the hypomorphic imd[1] allele. This could easily be tested by ReproSci using the adult phagocytosis assay used by Kocks et al, Cell, 2005. At the very least, the part on the role of Eater in phagocytosis should take the Chung & Kocks study into account, and the conclusions modulated.

      Another point is that some mutant phenotypes may be highly sensitive to the genetic background, for instance, even after isogenization in two different backgrounds. In the framework of a Reproducibility project, there might be no other option for such cases than direct reproduction of the experiment as relying solely on inference may not be reliable enough.

      With respect to the experimental part, some minor weaknesses have been noted. The authors rely on survival to infection experiments, but often do not show any control experiments with mock-challenged or noninfected mutant fly lines. In some cases, monitoring the microbial burden would have strengthened the evidence. For long survival experiments, a check on the health status of the lines (viral microbiota, Wolbachia) would have been welcome. Also, the experimental validation of reagents, RNAi lines, or KO lines is not documented in all cases.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an ambitious and large-scale reproducibility analysis of 400 articles on Drosophila immunity published before 2011. They extract major and minor claims from each article, assess their verifiability through literature comparison and, when possible, through targeted experimental re-testing, and synthesize their findings in an openly accessible online database. The goal is to provide clarity to the community regarding claims that have been contradicted, incompletely supported, or insufficiently followed up in the literature, and to foster broader community participation in evaluating historical findings. The manuscript summarizes the major insights emerging from this systematic effort.

      Strengths:

      (1) Novelty and community value: This work represents a rare example of a systematic, transparent, and community-facing reproducibility project in a specific research domain. The creation of a dedicated public platform for disseminating and discussing these assessments is particularly innovative.

      (2) Breadth and depth: The authors analyze an impressive number of publications spanning multiple decades, and they couple literature-based assessments with new experimental data where follow-up is missing.

      (3) Clarity of purpose: The manuscript carefully distinguishes between assessing evidential support for claims and judging the scientific merit of historical work. This helps frame the project as constructive rather than punitive.

      (4) Metascientific relevance: The analysis identifies methodological and contextual factors that commonly underlie irreproducible claims, providing a useful guide for future study design and interpretation.

      (5) Transparency: Supplementary datasets and the public website provide an exceptional degree of openness, which should facilitate community engagement and further refinement.

      Weaknesses:

      (1) Subjectivity in selection: Despite the authors' efforts, the choice of which papers and claims to highlight cannot be entirely objective. This is an inherent limitation of any retrospective curation effort, but it remains important to acknowledge explicitly.

      (2) Emphasis on irreproducible claims: The manuscript focuses primarily on claims that are challenged or found to be weakly supported. While understandable from the perspective of novelty, this emphasis may risk overshadowing the value of claims that are well supported and reproducible.

      (3) Framing and language: Certain passages could benefit from more neutral phrasing and avoidance of binary terms such as "correct" or "incorrect," in keeping with the open-ended and iterative nature of scientific progress.

      (4) Community interaction with the dataset: While the website is an excellent resource, the manuscript could further clarify how the community is expected to contribute, challenge, or refine the annotations, especially given the large volume of supplementary data.

      (5) Minor inconsistency: The manuscript states that papers from 1959-2011 were included, but the Methods section mentions a range beginning in 1940. This should be aligned for clarity.

      Impact and significance:

      This contribution is likely to have a meaningful impact on both the Drosophila immunity community and the broader scientific ecosystem. It highlights methodological pitfalls, encourages transparent post-publication evaluation, and offers a reusable framework that other fields could adopt. The work also has pedagogical value for early-career researchers entering the field, who often struggle to navigate contradictory or outdated claims. By centralizing and contextualizing these discussions, the manuscript should help accelerate more robust and reproducible research.

    4. Reviewer #3 (Public review):

      Summary:

      In this ambitious study, the authors set out to analyse the validity of a number of claims, both minor and major, from 400 published articles within the field of Drosophila immunity that were published before 2011. The authors were able to determine initially if claims were supported by comparing them to other published literature in the field and, if required, by experimentally testing 'unchallenged' claims that had not been followed up in subsequent published literature. Using this approach, the authors identified a number of claims that had contradictory evidence using new methods or taking into account developments within the field post-initial publication. They put their findings on a publicly available website designed to enable the research community to assess published work within the field with greater clarity.

      Strengths:

      The work presented is rigorous and methodical, the data presentation is high quality, and importantly, the data presented support the conclusions. The discussion is balanced, and the study is written considerately and respectfully, highlighting that the aim of the study is not to assign merit to individual scientists or publications but rather to improve clarity for scientists across the field. The approach carried out by the researchers focuses on testing the validity of the claims made in the original papers rather than testing whether the original experimental methods produced reproducible results. This is an important point since there are many reasons why the original interpretation of data may have understandably led to the claims made. These potential explanations for irreproducible data or conclusions are discussed in detail by the authors for each claim investigated.

      The authors have generated an accompanying website, which provides a valuable tool for the Drosophila Immunity research community that can be used to fact-check key claims and encourages community engagement. This will achieve one important goal of this study - to prevent time loss for scientists who base their research on claims that are irreproducible. The authors rightly point out that it is impossible (and indeed undesirable) to avoid publication of irreproducible results within a field since science is 'an exploratory process where progress is made by constant course correction'. This study is, however, an important piece of work that will make that course correction more efficient.

      Weaknesses:

      I have little to recommend for the improvement of this manuscript. As outlined in my comments above, I am very supportive of this manuscript and think it is a bold and ambitious body of work that is important for the Drosophila immunity field and beyond.

    5. Reviewer #4 (Public review):

      This is an important paper that can do much to set an example for thoughtful and rigorous evaluation of a discipline-wide body of literature. The compiled website of publications in Drosophila immunity is by itself a valuable contribution to the field. There is much to praise in this work, especially including the extensive and careful evaluation of the published literature. However, there are also cautions.

      One notable concern is that the validation experiments are generally done at low sample sizes and low replication rates, and often lack statistical analysis. This is slippery ground for declaring a published study to be untrue. Since the conclusions reported here are nearly all negative, it is essential that the experiments be performed with adequate power to detect the originally described effects. At a minimum, they should be performed with the same sample size and replication structure as the originally reported studies.

      The first section of Results should be an overview of the general accuracy of the literature. Of all claims made in the 400 evaluated papers, what proportion fell into each category of "verified", "unchallenged", "challenged", "mixed", or "partially verified"? This summary overview would provide a valuable assessment of the field as a whole. A detailed dispute of individual highlighted claims could follow the summary overview.

      Section headings are phrased as declarative statements, "Gene X is not involved in process Y", which is more definitive phrasing than we typically use in scientific research. It implies proving a negative, which is difficult and rare, and the evidence provided in the present manuscript generally does not reach that threshold. A more common phrasing would be "We find no evidence that gene X contributes to process Y". A good model for this more qualified phrasing is the "We conclude that while Caspar might affect the Imd pathway in certain tissue-specific contexts, it is unlikely to act as a generic negative regulator of the Imd pathway," concluding the section on the role of Caspar. I am sure the authors feel that the softer, more qualified phrasing would undermine their article's goal of cleansing the literature of inaccuracies, but the hard declarative 'never' statements are difficult to justify unless every validation experiment is done with a high degree of rigor under a variety of experimental conditions. This caveat is acknowledged in the 3rd paragraph of the Discussion, but it is not reflected in the writing of the Results. The caveat should also appear in the Introduction.

      The article is clear that "Claims were assessed as verified, unchallenged, challenged, mixed, or partially verified," but the project is called "reproducibility project" in the 7th line of the abstract, and the website is "ReproSci". The fourth line of the abstract and the introduction call some published research "irreproducible". Most of the present manuscript does not describe reproduction or replication. It describes validation, or independent experimental tests for consistency. Published work is considered validated if subsequent studies using distinct approaches yielded consistent results. For work that the authors consider suspicious, or that has not been subsequently tested, the new experiments provided here do not necessarily recreate the published experiment. Instead, the published result is evaluated with experiments that use different tools or methods, again testing for consistency of results. This is an important form of validation, but it is not reproduction, and it should not be referred to as such. I strongly suggest that variations of the words "reproducible" or "replication" be removed from the manuscript and replaced with "validation". This will be more scientifically accurate and will have the additional benefit of reducing the emotional charge that can be associated with declaring published research to be irreproducible.

      The manuscript includes an explanatory passage in the Results section, "Our project focuses on assessing the strength of the claims themselves (inferential/indirect reproducibility) rather than testing whether the original methods produce repeatable results (results/direct reproducibility). Thus, our conclusions do not directly challenge the initial results leading to a claim, but rather the general applicability of the claim itself." Rather than first appearing in Results, this statement should appear prominently in the abstract and introduction because it is a core element of the premise of the study. This can be combined with the content of the present Disclaimer section into a single paragraph in the Introduction instead of appearing in two redundant passages. I would again encourage the authors to substitute the word validation for reproduction, which would eliminate the need for the invented distinction between indirect versus direct reproduction. It is notable that the authors have chosen to title the relevant Methods section "Experimental Validation" and not "Replication".

      Experimental data "from various laboratories" in the last paragraph of the Introduction and the first paragraph of the Results are ambiguous. Since these new experiments are part of the central core of the manuscript, the specific laboratories contributing them should be named in the two paragraphs. If experiments are being contributed by all authors on the manuscript, it would suffice to say "the authors' laboratories". The attribution to "various labs" appears to be contradicted by the Discussion paragraph 2, which states "the host laboratory has expertise in" antibacterial and antifungal defense, implying a single lab. The claim of expertise by the lead author's laboratory is unnecessary and can be deleted if the Lemaitre lab is the ultimate source of all validation experiments.

      The passage on the controversial role of Duox in the gut is balanced and scholarly, and stands out for its discussion of multiple alternative lines of evidence in the published literature and supplement. This passage may benefit from follow-up research on the original claims by multiple groups, which is not available for other claims, but the tone of the Duox section can be a model for the other sections.

      Comments on other sections and supplements:

      I understand the desire to explain how original results may have been obtained when they are not substantiated by subsequent experiments. However, statements such as "The initial results may have been obtained due to residual impurities in preparations of recombinant GNBP1" and "Non-replicable results on the roles of Spirit, Sphinx and Spheroide in Toll pathway activation may be due to off-target effects common to first-generation RNAi tools" are speculation. No experimental data are presented to support these assertions, so these statements and others like them (currently at the end of most "insights" sections) should not appear in Results. I recognize that the authors are trying to soften their criticism of prior studies by providing explanations for how errors may have occurred innocently. If they wish to do so, the speculative hypotheses should appear in the Discussion.

      The statement in Results that "The initial claim concerning wntD may be explained by a genetic background effect independent of wntD" similarly appears to be a speculation based on the reading of the main text Results. However, the Discussion clarifies that "Here, we obtained the same results as the authors of the claim when using the same mutant lines, but the result does not stand when using an independent mutant of the same gene, indicating the result was likely due to genetic background." That additional explanation in the Discussion greatly increases reader confidence in the Result and should be explained with reference to S5 in the Results. Such complete explanations should be provided everywhere possible without requiring the reader to check the Supplement in each instance.

      In some cases, such as "The results of the initial papers are likely due to the use of ubiquitous overexpression of PGRP-LE, resulting in melanization due to overactivation of the Imd pathway and resulting tissue damage", the claim to explain the original finding would be easy to test. The authors should perform those tests where they can, if they wish to retain the statements in the manuscript. Similarly, the claim "The published data are most consistent with a scenario in which RNAi generated off-target knockdown of a protein related to retinophilin/undertaker, while Undertaker itself is unlikely to have a role in phagocytosis" would be stronger if the authors searched the Drosophila genome for a plausible homolog that might have been impacted by the RNAi construct, and then put forth an argument as to why the off-target gene is more likely to have generated the original phenotype than the nominally targeted gene. There is a brief mention in S19 that junctophilin is the authors' preferred off-target candidate, but no evidence or rationale is presented to support that assertion. If the original RNAi line is still available, it would be easy enough to test whether junctophilin is knocked down as an off-target, and ideally then to use an independent knockdown of junctophilin to recapitulate the original phenotype. Otherwise, the off-target knockdown hypothesis is idle speculation.

      A good model is the passage on extracellular DNA, which states, "experiments performed for ReproSci using the original DNAse IIlo hypomorph show that elevated Diptericin expression in the hypomorph is eliminated by outcrossing of chromosome II, and does not occur in an independent DNAse II null mutant, indicating that this effect is due to genetic background (Supplementary S11)." In this case, the authors have performed a clear experiment that explains the original finding, and inclusion of that explanation is warranted. Similar background replacement experiments in other validations are equally compelling.

      The statement "Analysis of several fly stocks expected to carry the PGRP-SDdS3 mutation used in the initial study revealed the presence of a wild-type copy PGRP-SD, suggesting that either the stock used in this study did not carry the expected mutation, or that the mutation was lost by contamination prior to sharing the stock with other labs" provides a documentable explanation of a potential error in the original two manuscripts, but the subsequent "analysis of several fly stocks" needs citations to published literature or explanation in the supplement. It is unclear from this passage how the wildtype allele in the purportedly mutant stocks could have led to the misattribution of function to PGRP-SD, so that should be explained more clearly in the manuscript.

      The originally claimed anorexia of the Gr28b mutation is explained as having been "likely obtained due to comparison to a wild-type line with unusually high feeding rates". This claim would be stronger if the wildtype line in question were named and data showing a high rate of feeding were presented in the supplement or cited from published literature. Otherwise, this appears to be speculation.

      In the section "The Toll immune pathway is not negatively regulated by wntD", FlyAtlas is cited as evidence that wntD is not expressed in adult flies. However, the FlyAtlas data is not adequately sensitive to make this claim conclusively. If the present authors wish to state that wntD is not expressed in adults, they should do a thorough test themselves and report it in the Supplement.

      Alternatively, the statement "data from FlyAtlas show that wntD is only expressed at the embryonic stage and not at the adult stage at which the experiments were performed by (Gordon et al., 2005a)" could be rephrased to something like "data from FlyAtlas show strong expression of wntD in the embryo but not the adult" and it should be followed by a direct statement that adult expression was also found to be near-undetectable by qPCR in supplement S5. That data is currently "not shown" in the supplement, but it should be shown because this is a central result that is being used to refute the original claim. This manuscript passage should also describe the expression data described in Gordon et al. (2005), for contrast, which was an experimental demonstration of expression in the embryo and a claim "RT-PCR was used to confirm expression of endogenous wntD RNA in adults (data not shown)."

      Inclusion of the section on croquemort is curious because it seems to be focused exclusively on clearance of apoptotic cells in the embryo, not on anything related to immunity. The subsection is titled "Croquemort is not a phagocytic engulfment receptor for apoptotic cells or bacteria", but the text passage contains no mention of phagocytosis of bacteria, and phagocytosis of bacteria is not tested in the S17 supplement. I would suggest deleting this passage entirely if there is not going to be any discussion of the immune-related phenotypes.

      The claim "Toll is not activated by overexpression of GNBP3 or Grass: Experiments performed for ReproSci find that contrary to previous reports, overexpression of GNBP3 (Gottar et al., 2006) or Grass (El Chamy et al., 2008) in the absence of immune challenge does not effectively activate Toll signaling (Supplementaries S6, S7)" is overly strongly stated unless the authors can directly repeat the original published studies with identical experimental conditions. In the absence of that, the claim in the present manuscript needs to be softened to "we find no evidence that..." or something similar. The definitive claim "does not" presumes that the current experiments are more accurate or correct than the published ones, but no explanation is provided as to why that should be the case. In the absence of a clear and compelling argument as to why the current experiment is more accurate, it appears that there is one study (the original) that obtained a certain result and a second study (the present one) that did not. This can be reported as an inconsistency, but the second experiment does not prove that the first was an error. The same comment applies to the refutation of the roles for Edin and IRC. Even though the current experiments are done in the context of a broader validation study, this does not automatically make them more correct. The present work should adhere to the same standards of reporting that we expect in any other piece of science.

      The statement "Furthermore, evidence from multiple papers suggests that this result, and other instances where mutations have been found to specifically eliminate Defensin expression, is likely due to segregating polymorphisms within Defensin that disrupt primer binding in some genetic backgrounds and lead to a false negative result (Supplementary S20)" should include citations to the multiple papers being referenced. This passage would benefit from a brief summary of the logic presented in S20 regarding the various means of quantifying Defensin expression.

      In S22 Results, the statement "For general characterization of the IrcMB11278 mutant, including developmental and motor defects and survival to septic injury, see additional information on the ReproSci website" is not acceptable. All necessary information associated with the paper needs to be included in the Supplement. There cannot be supporting data relegated to an independent website with no guaranteed stability or version control. The same comment applies to "Our results show that eiger flies do not have reduced feeding compared to appropriate controls (See ReproSci website)" in S25.

      Supplement S21 appears to show a difference between the wildtype and hemese mutants in parasitoid encapsulation, which would support the original finding. However, the validation experiment is performed at a small sample size and is not replicated, so there can be no statistical analysis. There is no reported quantification of lamellocytes or total hemocytes. The validation experiment does not support the conclusion that the original study should be refuted. The S21 evaluation of hemese must either be performed rigorously or removed from the Supplement and the main text.

      In S22, the second sentence of the passage "Due to the fact that IrcMB11278 flies always survived at least 24h prior to death after becoming stuck to the substrate by their wings, we do not attribute the increased mortality in Ecc15-fed IrcMB11278 flies primarily to pathogen ingestion, but rather to locomotor defects. The difference in survival between sucrose-fed and Ecc15-fed IrcMB11278 flies may be explained by the increased viscosity of the Ecc15-containing substrate compared to the sucrose-containing substrate" is quite strange. The first sentence is plausible and a reasonable interpretation of the observations. But to then conclude that the difference between the bacterial treatment versus the control is more plausibly due to substrate viscosity than direct action of the bacteria on the fly is surprising. If the authors wish to put forward that interpretation, they need to test substrate viscosity and demonstrate that fly mortality correlates with viscosity. Otherwise, they must conclude that the validation experiment is consistent with the original study.

      In S27, the visualization of eiger expression using a GFP reporter is very non-standard as a quantitative assay. The correct assay is qPCR, as is performed in other validation experiments, and which can easily be done on dissected fat body for a tissue-specific analysis. S27 Figure 1 should be replaced with a proper experiment and quantitative analysis. In S27 Figure 2, the authors should add a panel showing that eiger is successfully knocked down with each driver>construct combination. This is important because the data being reported show no effect of knockdown; it is therefore imperative to show that the knockdown is actually occurring. The same comment applies everywhere there is an RNAi to demonstrate a lack of effect.

      The Drosomycin expression data in S3 Figure 2A look extremely noisy and are presented without error bars or statistical analysis. The S4 claim that sphinx and spheroide are not regulators of the Toll pathway because quantitative expression levels of these genes do not correlate with Toll target expression levels is an extremely weak inference. The RNAi did not work in S4, so no conclusion should be inferred from those experiments. Although the original claims in dispute may be errors in both cases, the validation data used to refute the original claims must be rigorous and of an acceptable scientific standard.

      In S6 Figure 1, it is inappropriate to plot n=2 data points as a histogram with mean and standard errors. If there are fewer than four independent points, all points should be plotted as a dot plot. This comment applies to many qPCR figures throughout the supplement. In S7 Figure 1, "one representative experiment" out of two performed is shown. This strongly suggests that the two replicates are noisy, and a cynical reader might suspect that the authors are trying to hide the variance. This also applies to S5 Fig 3. Particularly in the context of a validation study, it is imperative to present all data clearly and objectively, especially when these are the specific data that are being used to refute the claim.

      Other comments:

      In S26, the authors suggest that much of the observed melanization arises from excessive tissue damage associated with abdominal injection contrasted to the lesser damage associated with thoracic injection. I believe there may be a methodological difference here. The Methods of S27 are not entirely clear, but it appears that the validation experiment was done with a pinprick, whereas the original Mabary and Schneider study was done with injection via a pulled capillary. My lab group (and I personally) have extensive experience with both techniques. In our hands, pinpricks to the abdomen do indeed cause substantial injury, and the physically less pliable thorax is more robust to pinpricks. However, capillary injections to the abdomen do virtually no tissue damage - very probably less than thoracic injections - and result in substantially higher survivals of infection even than thoracic injections. Thus, the present manuscript may infer substantial tissue damage in the original study because they are employing a different technique.

    1. eLife Assessment

      This important study builds on previous work from the same authors to present a conceptually distinct workflow for cryo-EM reconstruction that uses 2D template matching to enable high-resolution structure determination of small (sub-50 kDa) protein targets. The paper describes how density for small-molecule ligands bound to such targets can be reconstructed without these ligands being present in the template. However, the evidence described for the claim that this technique "significantly" improves the alignment of the reconstruction of small complexes is incomplete. The authors could better evaluate the effects of model bias on the reconstructed densities.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes an application of the high-resolution cryo-EM 2D template matching technique to sub-50kDa complexes. The paper describes how density for ligands can be reconstructed without having to process cryo-EM data through the conventional single particle analysis pipelines.

      Strengths:

      This paper contributes additional data (alongside other papers by the same authors) to convey the message that high-resolution 2D template matching is a powerful alternative for cryo-EM structure determination. The described application to ligand density reconstruction, without the need for extensive refinements, will be of interest to the pharmaceutical industry, where often multiple structures of the same protein in complex with different ligands are solved as part of their drug development pipelines. Improved insights into which particles contribute to the best ligand density are also highly valuable and transferable to other applications of the same technique.

      Weaknesses:

      Although the convenient visualisation of small molecules bound to protein targets of a known structure would be relevant for the pharmaceutical industry, the evidence described for the claim that this technique "significantly" improves alignment of reconstruction of small complexes is incomplete. The authors are encouraged to better evaluate the effects of model bias on the reconstructed densities in a revised paper.

    3. Reviewer #2 (Public review):

      In this manuscript, Zhang et al describe a method for cryo-EM reconstruction of small (sub-50kDa) complexes using 2D template matching. This presents an alternative, complementary path for high-resolution structure determination when there is a prior atomic model for alignment. Importantly, regions of the atomic model can be deleted to avoid bias in reconstructing the structure of these regions, serving as an important mechanism of validation.

      The manuscript focuses its analysis on a recently published dataset of the 40kDa kinase complex deposited to EMPIAR. The original processing workflow produced a medium resolution structure of the kinase (GSFSC ~4.3A, though features of the map indicate ~6-7A resolution); at this resolution, the binding pocket and ligand were not resolved in the original published map. With 2DTM, the authors produce a much higher resolution structure, showing clear density for the ATP binding pocket and the bound ATP molecule. With careful curation of the particle images using statistically derived 2DTM p-values, a high-resolution 2DTM structure was reconstructed from just 8k particles (2.6A non-gold standard FSC; ligand Q-score of 0.6), in contrast to the 74k particles from the original publication. This aligns with recent trends that fewer, higher-quality particles can produce a higher-quality structure. The authors perform a detailed analysis of some of the design choices of the method (e.g., p-value cutoff for particle filtering; how large a region of the template to delete).

      Overall, the workflow is a conceptually elegant alternative to the traditional bottom-up reconstruction pipeline. The authors demonstrate that the p-values from 2DTM correlations provide a principled way to filter/curate which particle images to extract, and the results are impressive. There are only a few minor recommendations that I could make for improvement.

    4. Reviewer #3 (Public review):

      Summary:

      Due to the low SNR of cryo-EM micrographs necessitated by radiation damage, determining the structure of proteins smaller than 50 kDa is exceedingly challenging, such that only a handful have been solved to date. This work aims to improve the reconstruction of small proteins in single-particle cryo-EM by using high-resolution 2D template matching, an algorithm previously used to locate and align macromolecules in situ, to align and reconstruct small proteins. This approach uses an existing macromolecular structure, either experimentally determined or predicted by AlphaFold, to simulate a noise-free 3D reference and generates whitened projections, crucially including high-spatial-frequency information, to align particles by the orientation with maximal cross-correlation. They demonstrate the success of this approach by generating a 3D reconstruction from an existing dataset of a 41.3 kDa protein kinase that had previously evaded attempts at high-resolution structure determination. To alleviate concerns that this is purely from template bias, they demonstrate clear density at two regions that were not present in the template: 6 residues in an alpha helix and an ATP in the ligand binding pocket. The latter is particularly important for its implications in determining structures of ligand-bound proteins for drug discovery. Additionally, the authors provide an update to the classic calculation in Henderson 1995 to predict the minimum molecular mass of a protein that can be solved by single-particle cryo-EM.

      Strengths:

      I am in no doubt that this technique can be used to gain valuable insights into the structures of small proteins, and this is an important advancement for the field. The ability to determine the structure of ligands in a binding site is particularly important, and this paper provides a method of doing that which outperforms traditional single-particle cryo-EM processing workflows.

      The claim that using high-spatial frequency information is essential for aligning small proteins is a valuable insight. A recent pre-print published at a similar time to this manuscript used high-resolution information in standard ab-initio reconstruction to generate a high-resolution reconstruction from the same dataset, supporting the claims made in the manuscript.

      The theoretical section outlined in the appendix is also sound. It uses the same logic as Henderson, but applies more up-to-date knowledge, such as incorporating dose-weighting and altering the cross-correlation-based noise estimation. This update is valuable for understanding factors preventing us from reaching the theoretical limit.

      Weaknesses:

      Given that this technique creates template bias, only parts of the reconstruction not in the template can be trusted, unlike standard single-particle processing, where the independent half-maps from separate, ab initio templates are used to generate a 3D reconstruction. Although, in principle, one could perform the search many times such that every residue has been omitted in at least one search, this will be extremely computationally intensive and was not demonstrated in this manuscript. It is therefore currently only realistically applicable when only a small portion of the sub-50 kDa protein is of interest.
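      A rough sense of that computational cost can be sketched as follows. This is only an illustration under stated assumptions: the residue count is a hypothetical placeholder, the window size borrows the 6-residue omission mentioned in the summary, and omitted regions are assumed to be non-overlapping windows tiled along the sequence rather than spatially defined regions.

```python
def omit_windows(n_residues: int, window: int) -> list[tuple[int, int]]:
    """Tile a sequence with non-overlapping omitted windows.

    Returns (first, last) residue numbers, 1-based and inclusive.
    Each window corresponds to one separate template-matching search
    in which those residues are deleted from the template.
    """
    if n_residues <= 0 or window <= 0:
        raise ValueError("n_residues and window must be positive")
    return [
        (start, min(start + window - 1, n_residues))
        for start in range(1, n_residues + 1, window)
    ]


# Hypothetical numbers: a 360-residue protein, 6 residues omitted per search.
windows = omit_windows(n_residues=360, window=6)
print(f"{len(windows)} separate searches needed")
```

      Under these assumptions, the number of full searches grows linearly with protein length and inversely with the size of the omitted window, which is why full omit coverage is costly compared with a single search.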

      The applicability of this technique to more than a single target was also not demonstrated, and there are concerns that it may not work effectively in many cases. The authors note in the results that "the ATP density was consistently recovered more robustly than nearby residues" and speculate that this may be because misalignments disproportionately blur peripheral residues. Since the region of interest in a structure is not necessarily in the center, this may need further investigation. The implications of this statement may also be unclear to the reader. For example, can this issue be minimized by having the region of interest centered in the simulated volume?

      In Figure 3, the authors demonstrate that it is not solely improved particle filtering and a noise-free reference that improves alignment, but that the high spatial frequency information is important. This information is very valuable since it can be applied to other, more standard methods. However, this key figure is not as clear or convincing as it could be. The FSC curves are possibly misleading, since the reduced resolution could be explained by reduced template bias when auto-refining with a map initially low-pass filtered to 10 Å. Moreover, although the helix reconstruction does look slightly better using the 2DTM angles, the improvement in density for ATP in the binding pocket is not clear. A qualitative argument only clear in one out of two cases is not as convincing as a quantitative metric across more examples.

    1. eLife Assessment

      This work identifies a novel, conserved link between glycolysis and sulfur metabolism that governs fungal morphogenesis and virulence. The compelling evidence, integrating multiple approaches, provides an important conceptual advance. A future mechanistic dissection of how sulfur metabolites interface with known pathways is encouraged.

    2. Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      Importantly, in the revised version the authors now substantiate the transcriptomic findings by RT-qPCR analyses in the pfk1ΔΔ and adh1ΔΔ strains, demonstrating that genetic disruption of glycolytic flux generally mirrors the effects of 2-deoxyglucose treatment. The manuscript's discussion has also been strengthened by explicitly addressing why cysteine and methionine differ in their ability to rescue filamentation in S. cerevisiae versus C. albicans, highlighting species-specific differences in sulfur uptake and transsulfuration pathways.

      Overall, this revised manuscript provides compelling evidence for a previously unrecognized coupling between glycolysis and sulfur metabolism that shapes fungal morphogenesis and virulence. It opens new perspectives on metabolic control of fungal development and raises interesting mechanistic questions for future work.

      Comments on revisions:

      The authors have incorporated all of my suggested changes and addressed all raised concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation, which is an important conceptual advance for understanding fungal pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in an animal model connects basic metabolism to infection outcomes, which adds biomedical importance.

      Comments on revisions:

      The authors have sufficiently addressed my concern and provided a clear justification for their proposed model, including the limitations of performing the mechanistic assays at this stage. I am satisfied with the response and have no further comments.

    4. Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms.

      Strengths:

      The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is an advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      The ability to rescue the morphogenesis defect through supplementation of sulfur-containing amino acids provides a functional validation.

      Weaknesses:

      cAMP addition rescued the pseudohyphal differentiation defect exhibited by the ΔΔgpa2 strain. More clarity is needed on how this mechanism is distinct from the metabolic control: whether cAMP acts in parallel with or downstream of sulfur-containing amino acid biosynthesis has to be characterized. Supplementation of cysteine and methionine bypasses glycolytic regulation; the link between these amino acids and their role in fungal morphogenesis is not completely characterized.

      The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, the downstream effects of Met4 activation were not fully characterized. How do cysteine and methionine rescue morphogenesis? The authors' response Figure 1 shows that there are no significant transcriptional changes in the expression of cAMP-PKA pathway-associated genes, which alone could not completely explain the role of gpa2 in morphogenesis, because exogenous cAMP can restore pseudohyphal differentiation in the ΔΔgpa2 background (Revised Fig. 1L). This implies that gpa2's function in morphogenesis involves an additional, possibly metabolic or post-transcriptional, layer of regulation, and its connection to sulfur-containing amino acids remains to be elucidated.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      We thank the reviewer for finding this study to be of importance and for appreciating our multipronged approach to substantiate our finding that perturbations in glycolysis alter sulfur metabolism and thus impact pseudohyphal and hyphal differentiation in fungi.

      Weaknesses:

      A few aspects could be improved to strengthen the conclusions. Firstly, the striking transcriptomic changes observed upon 2DG treatment should be analyzed in S. cerevisiae adh1 and pfk1 deletion strains, for instance, through qPCR or western blot analyses of sulfur metabolism genes, to confirm that observed changes in 2DG conditions mirror those seen in genetic mutants. Secondly, differences between methionine and cysteine in their ability to rescue the mutant phenotype in both species are not mentioned, nor discussed in more detail. This is especially important as there seem to be differences between S. cerevisiae and C. albicans, which might point to subtle but specific metabolic adaptations.

      The authors are also encouraged to refine several figure elements for clarity and comparability (e.g., harmonized axes in bar plots), condense the discussion to emphasize the conceptual advances over a summary of the results, and shorten figure legends.

      We are grateful for this valuable and constructive feedback, and we agree with the reviewer on the necessity of performing RT-qPCR analysis of sulfur metabolism genes in ∆∆pfk1 and ∆∆adh1 strains of S. cerevisiae to validate our RNA-Seq results using 2DG. We have performed this experiment, and our results show that several genes involved in the de novo biosynthesis of sulfur-containing amino acids are downregulated in both the ∆∆pfk1 and ∆∆adh1 strains, corroborating the downregulation of sulfur metabolism genes in the 2DG-treated samples. These new data are now included in the revised manuscript as Supplementary Figure 2C.

      Furthermore, we acknowledge the reviewer’s point regarding the significance of comparing the differences in the ability of methionine and cysteine to rescue filamentation defects exhibited by the mutants, between S. cerevisiae and C. albicans. The observed differences between S. cerevisiae and C. albicans likely highlight species-specific metabolic adaptations within the sulfur assimilation pathway. While both yeasts employ the transsulfuration pathway to interconvert these sulfur-containing amino acids, the precise regulatory points, including the specific enzymes, their compartmentalization, and transcriptional control, are not identical. For instance, differences in the feedback inhibition mechanisms or the expression levels of key transsulfuration enzymes between S. cerevisiae and C. albicans could explain the variations in the phenotypic rescue experiments (Chebaro et al., 2017; Lombardi et al., 2024; Rouillon et al., 2000; Shrivastava et al., 2021; Thomas and Surdin-Kerjan, 1997). Furthermore, the species-specific differences in amino acid transport systems (permeases) add another layer of complexity. S. cerevisiae primarily uses multiple, low-affinity permeases for cysteine transport (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1), while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). In contrast, C. albicans utilizes high-affinity transporters for the uptake of both amino acids, employing Cyn1 specifically for cysteine and Mup1 for methionine, indicating a greater reliance on dedicated transport mechanisms for these sulfur-containing molecules in the pathogenic yeast (Schrevens et al., 2018; Yadav and Bachhawat, 2011). A combination of the aforesaid factors could be the potential reason for the differences in the ability of cysteine and methionine to rescue filamentation in S. cerevisiae and C. albicans.

      Finally, we have enhanced the quantitative rigor and clarity of the data presentation in the revised manuscript by implementing Y-axis uniformity across all relevant bar graphs to facilitate a more robust and direct comparative analysis. We have also condensed the discussion to emphasize the conceptual advances and have shortened the figure legends as per the reviewer's suggestions.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation, which is an important conceptual advance for understanding fungal pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in animal models connects basic metabolism to infection outcomes, which adds to biomedical importance.

      We would like to thank the reviewer for the positive comments on our work. We are pleased that they recognize the novel metabolic link between glycolysis and sulfur metabolism as a key conceptual advance in fungal morphogenesis. 

      Weaknesses:

      The proposed model that glycolytic flux modulates Met30 activity post-translationally remains speculative. While data support Met4 stabilization in met30 deletion strains, the mechanism of Met30 modulation by glycolysis is not demonstrated.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F-box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. These experiments require isolating cells from colonies undergoing pseudohyphal differentiation on solid media, because pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992). Current cell yields (OD<sub>600</sub>≈1 from ≈80-100 colonies) are far below what is needed for standard pull-down assays: OD<sub>600</sub>≈600-800 is required to achieve the 1-2 mg/ml of total protein that is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024).
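      The scale of this shortfall can be illustrated with a back-of-the-envelope calculation. The sketch below uses only the figures quoted above (OD600≈1 from ≈80-100 colonies; OD600≈600-800 required) and assumes, purely for illustration, that cell yield scales linearly with the number of colonies harvested; the function name is hypothetical.

```python
def colonies_needed(target_od: float, od_per_batch: float, colonies_per_batch: float) -> float:
    """Estimate colonies required to reach a target OD600 yield.

    Assumes yield scales linearly with colony number (an illustrative
    simplification, not a measured relationship).
    """
    return target_od / od_per_batch * colonies_per_batch


# Lower bound: lowest target, fewest colonies per OD600 unit.
low = colonies_needed(target_od=600, od_per_batch=1.0, colonies_per_batch=80)
# Upper bound: highest target, most colonies per OD600 unit.
high = colonies_needed(target_od=800, od_per_batch=1.0, colonies_per_batch=100)

print(f"Roughly {low:,.0f} to {high:,.0f} colonies per pull-down")
```

      Under this linear assumption, a single pull-down would need on the order of tens of thousands of colonies, compared with the ≈80-100 colonies that currently yield OD600≈1.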

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms. However, despite the novel conceptual framework, the study provides limited mechanistic characterization of how the sulfur metabolism and glycolysis blockade directly drive morphological outcomes. In particular, the rationale for selecting specific gene deletions, such as Met32 (and not Met4), or the Met30 deletion used to probe this pathway, is not clearly explained, making it difficult to assess whether these targets comprehensively represent the metabolic nodes proposed to be critical. Further supportive data and experimental validation would strengthen the claims on connections between glycolysis, sulfur amino acid metabolism, and virulence.

      Strengths:

      (1) The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is a significant advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      (2) Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      (3) The ability to rescue the morphogenesis defect through exogenous supplementation of sulfur-containing amino acids provides functional validation.

      (4) The findings from the murine Pfk1-deficient model underscore the clinical significance of metabolic pathways in fungal infections.

      We are grateful for this comprehensive and insightful summary of our work. We deeply appreciate the reviewer's recognition of the key conceptual breakthroughs regarding the metabolic regulation of fungal morphogenesis and the clinical relevance of our findings.

      Weaknesses:

      (1) While the link between glycolysis and sulfur amino acid biosynthesis is established via transcriptomic and proteomic analysis, the specific regulation connecting these pathways via Met30 remains to be elucidated. For example, what are the expression and protein levels of Met30 in the initial analysis from Figure 2? How specific is this effect on Met30 in anaerobic versus aerobic glycolysis, especially when the pentose phosphate pathway is involved in the growth of the cells when glycolysis is perturbed?

      We are grateful for the insightful feedback provided by the reviewer. S. cerevisiae is a Crabtree-positive organism that primarily uses anaerobic glycolysis to metabolize glucose under glucose-replete conditions (Barford and Hall, 1979; De Deken, 1966), and our pseudohyphal differentiation assays are performed in glucose-rich conditions (Gimeno et al., 1992). Furthermore, perturbation of glycolysis is known to induce compensatory upregulation of the Pentose Phosphate Pathway (PPP) (Ralser et al., 2007), and we have also observed the upregulation of the gene that encodes transketolase-1 (Tkl1), a key enzyme in the PPP, in our RNA-seq data. Importantly, our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This aligns with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via Met30 dissociation from the Skp1 subunit of the complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Further experiments are required to delineate the specific role of the pentose phosphate pathway in the proposed regulation of Met30 activity under glycolysis perturbation, and this will be explored in our subsequent study.

      (2) Including detailed metabolite profiling could have strengthened the metabolic connection and provided additional insights into intermediate flux changes, i.e., measuring levels of metabolites to check if cysteine or methionine levels are influenced intracellularly. Also, it is expected to see how Met30 deletion could affect cell growth. Data on Met30 deletion and its effect on growth are not included, especially given that a viable heterozygous Met30 strain has been established. Measuring the cysteine or methionine levels using metabolomic analysis would further strengthen the claims in every section.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall cell growth under nitrogen-limiting conditions, as the ∆met30 strain grows similarly to the wild-type strain.

      Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of the sulfur-containing amino acids cysteine or methionine. Since these data conclusively demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm these results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (3) In comparison with the previous bioRxiv (doi: https://doi.org/10.1101/2025.05.14.654021) of this article in May 2025 to the recent bioRxiv of this article (doi: https://doi.org/10.1101/2025.05.14.654021), there have been some changes, and Met30 deletion has been recently included, and the chemical perturbation of glycolysis has been added as new data. Although the changes incorporated in the recent version of the article improved the illustration of the hypothesis in Figure 6, which connects glycolysis to Sulfur metabolism, the gene expression and protein levels of all genes involved in the illustrated hypothesis are not consistently shown. For example, in some cases, the Met4 expression is not shown (Figure 4), and the Met30 expression is not shown during profiling (gene expression or protein levels) throughout the manuscript. Lack of consistency in profiling the same set of key genes makes understanding more complicated.

      We thank the reviewer for this feedback, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding met4 and met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCF<sup>Met30</sup> complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCF<sup>Met30</sup> complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirms that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism.

      (4) The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, the Met4 activation was not fully characterized, nor were the data presented when virulence was assessed in Figure 4. Why is Met4 not included in Figure 4D and I? Especially, according to Figure 6, Met4 activation is crucial and guides the differences between glycolysis-active and inactive conditions.

      We thank the reviewer for their input. As the Met4 transcription factor in C. albicans is primarily regulated post-translationally through its degradation and inactivation by the SCF<sup>Met30</sup> E3 ubiquitin ligase complex (Shrivastava et al., 2021), we opted to monitor the transcriptional status of downstream targets of Met4 (i.e., genes directly regulated by Met4), as these are the genes that exhibit the most direct and functionally relevant transcriptional changes in response to altered Met4 levels.

      (5) Similarly, the rationale behind selecting Met32 for characterizing sulfur metabolism is unclear. Deletion of Met32 resulted in a significant reduction in pseudohyphal differentiation; why is this attributed only to Met32? What happens if Met4 is deleted? It is not justified why Met32, rather than Met4, was chosen. Figure 6 clearly hypothesizes that Met4 activation is the key to the mechanism.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (6) The comparative RT-qPCR in Figure 5 did not account for sulfur metabolism genes, whereas it was focused only on virulence and hyphal differentiation. Is there data to support the levels of sulfur metabolism genes?

      We thank the reviewer for this feedback. We wish to respectfully clarify that the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans are already included and discussed within the manuscript. These results can be found in Figure 4, panels D and I, respectively.

      (7) To validate the proposed interlink between sulfur metabolism and virulence, it is recommended that the gene sets (illustrated in Figure 6) be consistently included across all comparative data included throughout the comparisons. Excluding sulfur metabolism genes in Figure 5 prevents the experiment from demonstrating the coordinated role of glycolysis perturbation → sulfur metabolism → virulence. The same is true for other comparisons, where the lack of data on Met30, Met4, etc., makes it hard to connect the hypothesis. It is also recommended to check the gene expression of other genes related to the cAMP pathway and report them to confirm the cAMP-independent mechanism. For example, gap2 deletion was used to confirm the effects of cAMP supplementation, but the expression of this gene was not assessed in the RNA-seq analysis in Figure 2. It would be beneficial to show the expression of cAMP-related genes to completely confirm that they do not play a role in the claims in Figure 2.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I.

      Our RNA-seq analysis (Author response image 1) confirms that there is no significant transcriptional change in the expression of cAMP-PKA pathway associated genes (Log2 fold change ≥ 1 for upregulated genes and Log2 fold change ≤ -1 for downregulated genes) in 2DG treated cells compared to the untreated control cells, reinforcing our conclusion that the glycolytic regulation of fungal morphogenesis is mediated through a cAMP-PKA pathway independent mechanism.
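
      The fold-change cutoffs quoted above (Log2 fold change ≥ 1 for upregulated and ≤ -1 for downregulated genes) can be illustrated with a minimal sketch. The gene names and values below are made-up placeholders, not data from the study, and the function name is hypothetical. A real differential expression call would typically also apply a significance filter, which the response does not specify.

```python
# Illustrative only: gene names and log2 fold-change values are placeholders,
# not results from the manuscript's RNA-seq analysis.
def classify_by_log2fc(log2fc_by_gene, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged' using a symmetric log2FC cutoff."""
    calls = {}
    for gene, log2fc in log2fc_by_gene.items():
        if log2fc >= threshold:
            calls[gene] = "up"          # log2FC >= 1, i.e. at least 2-fold higher
        elif log2fc <= -threshold:
            calls[gene] = "down"        # log2FC <= -1, i.e. at least 2-fold lower
        else:
            calls[gene] = "unchanged"   # inside the (-1, 1) window
    return calls


example = {"geneA": 0.3, "geneB": -0.6, "geneC": 1.4, "geneD": -1.0}
calls = classify_by_log2fc(example)
```

      Under these cutoffs, a pathway is reported as transcriptionally unchanged when all of its genes fall in the "unchanged" class.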

      Author response image 1.

      (8) Although the NAC supplementation study is included in the new version of the article compared to the previous version in BioRxiv (May 2025), the link to sulfur metabolism is not well characterized in Figure 5 and their related datasets. The main focus of the manuscript is to delineate the role of sulfur metabolism; hence, it is anticipated that Figure 5 will include sulfur-related metabolic genes and their links to pfk1 deletion, using RT-PCR measurements as shown for the virulence genes.

      We thank the reviewer for this question. The relevant data are indeed present within the current submission. We respectfully direct the reviewer's attention to Figure 4, panels D and I, where the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans can be found.

      (9) The manuscript would benefit from more information added to the introduction section and literature supports for some of the findings reported earlier, including the role of (i) cAMP-PKA and MAPK pathways, (ii) what is known in the literature that reports about the treatment with 2DG (role of Snf1, HXT1, and HXT3), as well as how gpa2 is involved. Some sentences in the manuscripts are repetitive; it would be beneficial to add more relevant sections to the introduction and discussion to clarify the rationale for gene choices.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 107: As morphological transitions are indeed a conserved phenomenon across fungal species, hosts & environmental niches, the authors could refer to a few more here (infection structures like appressoria; fruiting bodies, etc.).

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Line 119/120: That's a bit misleading in my opinion. Gpr1 acts as a key sensor of external carbon, while Ras proteins control the cAMP pathway as intracellular sensory proteins. That should be stated more clearly. cAMP is the output and not the sensor.

      We appreciate the reviewer's detailed attention to this signaling network. We have revised the manuscript to precisely reflect this established signaling hierarchy for maximum clarity.

      (2) Line 180: ..differentiation

      We thank the reviewer for this valuable feedback. We have incorporated this change in our revised manuscript.

      (3) Figure 1 panels C & F. The authors should provide the same scale for all experiments. Otherwise, the interpretation can be difficult. The same applies to the different bar plots in Figure 4. Have the authors quantified pseudohyphal differentiation in the cAMP add-back assays? I agree that the chosen images look convincing, but they don't reflect quantitative analyses.

      We thank the reviewer for their detailed and constructive feedback. We have made the Y-axis scale uniform across these panels to improve the clarity of our data presentation in the revised manuscript.

      We have also incorporated the quantitative analysis of the cAMP add-back assays in S. cerevisiae, in Figure 2 Panel L.

      (4) Line 367/68: "cysteine or methionine was able to completely rescue". Here, the authors should phrase their wording more carefully. Figure 3C shows the complete rescue of the phenotype qualitatively, but Figure 3D clearly shows that there are differences between the supplementation of cysteine and methionine, with the latter not fully restoring the phenotype.

      We sincerely appreciate the reviewer's meticulous attention to the data interpretation. We fully agree that the initial phrasing in lines 367/368 requires adjustment, as Figure 3D establishes a quantitative difference in the efficiency of phenotypic rescue between cysteine and methionine supplementation. We have revised the text to articulate this difference.

      (5) Line 568: Here, apparently, the ability to rescue the differentiation phenotype is reversed compared to the experiment with S. cerevisiae. Cysteine only results in ~20% hyphal cells, while methionine restores to wild-type-like hyphal formation. Can the authors comment on where these differences might originate from? Is there a difference in the uptake of cysteine vs. methionine in the two species or consumption rates?

      We thank the reviewer for their detailed and constructive feedback. We believe this phenotypic difference may be due to the distinct metabolic prioritization of sulfur amino acids in C. albicans. Methionine is a known trigger for hyphal differentiation in C. albicans and serves as the immediate precursor for the universal methyl donor, S-adenosylmethionine (SAM) (Kraidlova et al., 2016; Schrevens et al., 2018). The morphological transition to hyphae involves a complex regulatory cascade that requires high rates of methylation, which in turn requires rapid and direct conversion of methionine into SAM (Kraidlova et al., 2016; Schrevens et al., 2018). Cysteine, however, must first be converted into methionine via the transsulfuration pathway to produce SAM, making it metabolically less efficient for these processes.

      Reviewer #2 (Recommendations for the authors):

      The study's comprehensive experimental approach, integrating pharmacological inhibition, genetic manipulation, transcriptomics, and an animal infection model, provides strong evidence for a conserved mechanism, though some aspects need further clarification.

      Major Comments:

      (1) While the data suggest that glycolysis affects Met30 activity post-translationally, the underlying mechanism remains speculative. The authors should perform co-immunoprecipitation or ubiquitination assays to confirm whether glycolytic perturbation alters Met30-SCF complex interactions or Met4 ubiquitination levels.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F-box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. Since these experiments would involve isolating cells from colonies undergoing pseudohyphal differentiation on solid media (given that pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992)), current cell yields (OD<sub>600</sub> ≈ 1 from ≈ 80-100 colonies) are far below the number of cells needed to obtain enough total protein for standard pull-down assays (OD<sub>600</sub> ≈ 600-800 is required to achieve 1-2 mg/ml of total protein, which is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024)).
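
      To give a sense of scale for the yield gap described above, the figures quoted in the response can be turned into a rough colony count. This is a back-of-the-envelope sketch only: it assumes that cell yield scales linearly with the number of colonies harvested, which the response does not state, and the function name is hypothetical.

```python
def colonies_needed(required_od, yield_od, colonies_per_yield):
    """Colonies required to reach `required_od`, assuming yield is linear in colony number."""
    return required_od / yield_od * colonies_per_yield


# Figures quoted in the response: OD600 of about 1 from about 80-100 colonies,
# versus OD600 of about 600-800 needed for a standard pull-down.
low_estimate = colonies_needed(required_od=600, yield_od=1, colonies_per_yield=80)
high_estimate = colonies_needed(required_od=800, yield_od=1, colonies_per_yield=100)
```

      Under that linearity assumption, the estimate runs to tens of thousands of colonies per sample, which is consistent with the authors' description of the experiment as technically challenging.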

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      (2) 2DG can exert pleiotropic effects unrelated to glycolytic inhibition (e.g., ER stress, autophagy induction). The authors are encouraged to perform complementary metabolic flux analyses, such as quantification of glycolytic intermediates or ATP levels, to confirm specific glycolytic inhibition.

      We appreciate the reviewer's concern regarding the potential pleiotropic effects of 2DG. While we acknowledge that 2DG may induce secondary cellular stress, we are confident that the observed phenotypes are robustly attributed to glycolytic inhibition based on our complementary genetic evidence. Specifically, the deletion strains ∆∆pfk1 and ∆∆adh1, which genetically perturb distinct steps in glycolysis, recapitulate the phenotypic results observed with 2DG treatment. Given this strong congruence between chemical inhibition and specific genetic deletions of key glycolytic enzymes, we are confident that our observed phenotypes are predominantly driven by the perturbation of the glycolytic pathway by 2DG.

      (3) The differential rescue effects (cysteine-only in inhibitor assays vs. both cysteine and methionine in genetic mutants) require further explanation. The authors should discuss potential differences in metabolic interconversion or amino acid transport that may account for this observation.

      We thank the reviewer for their valuable feedback. One explanation for the observed differential rescue effects of cysteine and methionine may be the distinct amino acid transport systems that S. cerevisiae uses for these amino acids. S. cerevisiae primarily uses multiple low-affinity permeases (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1) for cysteine transport, while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). Hence, cysteine uptake is likely to be more efficient than methionine uptake in S. cerevisiae, and a comparable functional rescue by exogenous methionine would require a higher methionine concentration. When we performed our rescue experiments using higher concentrations of methionine, we did not see any rescue of pseudohyphal differentiation in the presence of 2DG; in fact, at higher concentrations of methionine, the wild-type strain failed to undergo pseudohyphal differentiation even in the absence of 2DG. This is likely because increasing the methionine concentration raises the overall nitrogen content of the medium, thereby making the medium less nitrogen-starved. This presents a major experimental constraint, as pseudohyphal differentiation is strictly dependent on nitrogen limitation, and the elevated nitrogen resulting from the higher methionine concentration can inhibit pseudohyphal differentiation.

      (4) NAC may influence host redox balance or immune responses. The discussion should consider whether the observed virulence rescue could partly result from host-directed effects.

      We thank the reviewer for this valuable feedback. We acknowledge the role of NAC in host-directed immune responses. It is important to note that, in the context of certain bacterial pathogens, NAC has been reported to augment cellular respiration, subsequently increasing Reactive Oxygen Species (ROS) generation, which contributes to pathogen clearance (Shee et al., 2022). In our study, NAC supplementation was given to the mice prior to the infection and maintained continuously throughout the duration of the experiment. This continuous supply of NAC likely contributes to the rescue of the virulence defects exhibited by the ∆∆pfk1 strain (Fig. 5I and J). Essentially, NAC likely allows the mutant to fully activate its essential virulence strategies (including morphological switching) to cause a successful infection in the host. As per the reviewer's suggestion, this has been included in the discussion section of the manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the comments related to improving the manuscript have been provided in the public review. Here are some specifics for the authors to consider:

      (1) It is important to clarify the rationale for choosing specific gene deletions over other key genes (e.g., Met32 and Met30) and explain why Met4 was not included, given its proposed central role in Figure 6.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (2) Comparison of consistent gene and protein expression data (Met30, Met4, Met32) across all relevant figures and analyses would strengthen the mechanistic connection in a better way. Some data that might help connect the sections is not included; please see the public review for more details.

      We thank the reviewer for this valuable input, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding met4 and met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCF<sup>Met30</sup> complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCF<sup>Met30</sup> complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirms that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism.

      (3) Suggested to include metabolomic profiling (cysteine, methionine, and intermediate metabolites) to substantiate the proposed metabolic flux between glycolysis and sulfur metabolism.

      We thank the reviewer for this valuable input. Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of the sulfur-containing amino acids cysteine or methionine. Since these data conclusively demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm these results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (4) Data on the effects of Met30 deletion on cell growth are currently not included, and relevant controls should be included to ensure observed phenotypes are not due to general growth defects.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall growth under nitrogen-limiting conditions, as the ∆met30 strain grows similarly to the wild-type strain.

      (5) Expanding RT-qPCR and data from transcriptomic analyses to include sulfur metabolism genes and key cAMP pathway genes to confirm the proposed cAMP-independent mechanism during virulence characterization is necessary.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I. 

      To confirm that glycolysis is critical for fungal morphogenesis in a cAMP-PKA pathway-independent manner under nitrogen-limiting conditions in C. albicans, we performed cAMP add-back assays. Corroborating our S. cerevisiae data, the exogenous addition of cAMP failed to rescue the hyphal differentiation defect caused by the perturbation of glycolysis, either through 2DG addition or through deletion of the pfk1 gene, under nitrogen-limiting conditions in C. albicans. These data are now included in Suppl. Fig. 5B.

      (6) Enhancing the introduction and discussion by providing a clearer rationale for gene selection and more detailed references to established pathways (cAMP-PKA, MAPK, Snf1/HXT regulation, gpa2 involvement) is needed to reinstate the hypothesis.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      (7) Reducing redundancy in the text and improving figure consistency, particularly by ensuring that the gene sets depicted in Figure 6 are represented across all datasets, would strengthen the interconnections among sections.

      We thank the reviewer for this valuable feedback.  We have incorporated these changes in our revised manuscript.

      References

      Barford, J. P., & Hall, R. J. (1979). An examination of the Crabtree effect in Saccharomyces cerevisiae: the role of respiratory adaptation. Journal of general microbiology. https://doi.org/10.1099/00221287-114-2-267

      Blaiseau, P. L., & Thomas, D. (1998). Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. The EMBO journal, 17(21), 6327–6336. https://doi.org/10.1093/emboj/17.21.6327

      Chebaro, Y., Lorenz, M., Fa, A., Zheng, R., & Gustin, M. (2017). Adaptation of Candida albicans to Reactive Sulfur Species. Genetics, 206(1), 151–162. https://doi.org/10.1534/genetics.116.199679

      De Deken R. H. (1966). The Crabtree effect: a regulatory system in yeast. Journal of general microbiology, 44(2), 149–156. https://doi.org/10.1099/00221287-44-2-149

      Düring-Olsen, L., Regenberg, B., Gjermansen, C., Kielland-Brandt, M. C., & Hansen, J. (1999). Cysteine uptake by Saccharomyces cerevisiae is accomplished by multiple permeases. Current genetics, 35(6), 609–617. https://doi.org/10.1007/s002940050459

      Gancedo J. M. (2001). Control of pseudohyphae formation in Saccharomyces cerevisiae. FEMS microbiology reviews, 25(1), 107–123. https://doi.org/10.1111/j.1574-6976.2001.tb00573.x

      Gimeno, C. J., Ljungdahl, P. O., Styles, C. A., & Fink, G. R. (1992). Unipolar cell divisions in the yeast S. cerevisiae lead to filamentous growth: regulation by starvation and RAS. Cell, 68(6), 1077–1090. https://doi.org/10.1016/0092-8674(92)90079-r

      Huang, C. W., Walker, M. E., Fedrizzi, B., Gardner, R. C., & Jiranek, V. (2017). Yeast genes involved in regulating cysteine uptake affect production of hydrogen sulfide from cysteine during fermentation. FEMS yeast research, 17(5), fox046. https://doi.org/10.1093/femsyr/fox046

      Kosugi, A., Koizumi, Y., Yanagida, F., & Udaka, S. (2001). MUP1, high affinity methionine permease, is involved in cysteine uptake by Saccharomyces cerevisiae. Bioscience, biotechnology, and biochemistry, 65(3), 728–731. https://doi.org/10.1271/bbb.65.728

      Kraidlova, L., Schrevens, S., Tournu, H., Van Zeebroeck, G., Sychrova, H., & Van Dijck, P. (2016). Characterization of the Candida albicans Amino Acid Permease Family: Gap2 Is the Only General Amino Acid Permease and Gap4 Is an S-Adenosylmethionine (SAM) Transporter Required for SAM-Induced Morphogenesis. mSphere, 1(6), e00284-16. https://doi.org/10.1128/mSphere.00284-16

      Lauinger, L., Andronicos, A., Flick, K., Yu, C., Durairaj, G., Huang, L., & Kaiser, P. (2024). Cadmium binding by the F-box domain induces p97-mediated SCF complex disassembly to activate stress response programs. Nature communications, 15(1), 3894. https://doi.org/10.1038/s41467-024-48184-6

      Lombardi, L., Salzberg, L. I., Cinnéide, E. Ó., O'Brien, C., Morio, F., Turner, S. A., Byrne, K. P., & Butler, G. (2024). Alternative sulphur metabolism in the fungal pathogen Candida parapsilosis. Nature communications, 15(1), 9190. https://doi.org/10.1038/s41467-024-53442-8

      Menant, A., Barbey, R., & Thomas, D. (2006). Substrate-mediated remodeling of methionine transport by multiple ubiquitin-dependent mechanisms in yeast cells. The EMBO journal, 25(19), 4436–4447. https://doi.org/10.1038/sj.emboj.7601330

      Ralser, M., Wamelink, M. M., Kowald, A., Gerisch, B., Heeren, G., Struys, E. A., Klipp, E., Jakobs, C., Breitenbach, M., Lehrach, H., & Krobitsch, S. (2007). Dynamic rerouting of the carbohydrate flux is key to counteracting oxidative stress. Journal of biology, 6(4), 10. https://doi.org/10.1186/jbiol61

      Rouillon, A., Barbey, R., Patton, E. E., Tyers, M., & Thomas, D. (2000). Feedback-regulated degradation of the transcriptional activator Met4 is triggered by the SCF(Met30) complex. The EMBO journal, 19(2), 282–294. https://doi.org/10.1093/emboj/19.2.282

      Schrevens, S., Van Zeebroeck, G., Riedelberger, M., Tournu, H., Kuchler, K., & Van Dijck, P. (2018). Methionine is required for cAMP-PKA-mediated morphogenesis and virulence of Candida albicans. Molecular microbiology, 108(3), 258–275. https://doi.org/10.1111/mmi.13933

      Shee, S., Singh, S., Tripathi, A., Thakur, C., Kumar T, A., Das, M., Yadav, V., Kohli, S., Rajmani, R. S., Chandra, N., Chakrapani, H., Drlica, K., & Singh, A. (2022). Moxifloxacin-Mediated Killing of Mycobacterium tuberculosis Involves Respiratory Downshift, Reductive Stress, and Accumulation of Reactive Oxygen Species. Antimicrobial agents and chemotherapy, 66(9), e0059222. https://doi.org/10.1128/aac.00592-22

      Shrivastava, M., Feng, J., Coles, M., Clark, B., Islam, A., Dumeaux, V., & Whiteway, M. (2021). Modulation of the complex regulatory network for methionine biosynthesis in fungi. Genetics, 217(2), iyaa049. https://doi.org/10.1093/genetics/iyaa049

      Smothers, D. B., Kozubowski, L., Dixon, C., Goebl, M. G., & Mathias, N. (2000). The abundance of Met30p limits SCF(Met30p) complex activity and is regulated by methionine availability. Molecular and cellular biology, 20(21), 7845–7852. https://doi.org/10.1128/MCB.20.21.7845-7852.2000

      Thomas, D., & Surdin-Kerjan, Y. (1997). Metabolism of sulfur amino acids in Saccharomyces cerevisiae. Microbiology and molecular biology reviews : MMBR, 61(4), 503–532. https://doi.org/10.1128/mmbr.61.4.503-532.1997

      Yadav, A. K., & Bachhawat, A. K. (2011). CgCYN1, a plasma membrane cystine-specific transporter of Candida glabrata with orthologues prevalent among pathogenic yeast and fungi. The Journal of biological chemistry, 286(22), 19714–19723. https://doi.org/10.1074/jbc.M111.240648

      Yen, J. L., Su, N. Y., & Kaiser, P. (2005). The yeast ubiquitin ligase SCFMet30 regulates heavy metal response. Molecular biology of the cell, 16(4), 1872–1882. https://doi.org/10.1091/mbc.e04-12-1130

    1. eLife Assessment

      This study presents DeepTX, a valuable methodological tool that integrates mechanistic stochastic models with single-cell RNA sequencing data to infer transcriptional burst kinetics at genome scale. The approach is broadly applicable and of interest to subfields such as systems biology, bioinformatics, and gene regulation. The evidence supporting the findings is solid, with appropriate validation on synthetic data and thoughtful discussion of limitations related to identifiability and model assumptions.

    2. Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics, including key model parameters and the associated steady-state distributions, directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.

      The revised manuscript addresses the original concerns raised by the reviewers, particularly those related to sample size requirements, distributional assumptions, and the biological interpretation of the inferred parameters. The authors have also included an extensive discussion of the limitations of the methodological framework, including the constraints associated with relying on snapshot data, as well as a broader contextualisation of DeepTX within the landscape of existing tools that link mechanistic modelling and single-cell transcriptomics.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

      Comments on revisions:

      We thank the authors for their thorough revision and for carefully addressing the points raised in the previous review. At this stage, the reviewers have no further concerns.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics, including key model parameters and the associated steady-state distributions, directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.

      The revised manuscript addresses many of the original concerns, particularly regarding sample size requirements, distributional assumptions, and the biological interpretation of inferred parameters. However, the framework remains limited by the constraints of snapshot data and cannot yet resolve dynamic heterogeneity or causality. The manuscript would also benefit from a broader contextualisation of DeepTX within the landscape of existing tools linking mechanistic modelling and single-cell transcriptomics. Finally, the interpretation of pathway enrichment analyses still warrants clarification.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

      Recommendations for the authors:

      We thank the authors for their thorough revision and for addressing many of the points raised during the initial review. The revised manuscript presents an improved and clearer account of the methodology and its implications. However, several aspects would benefit from further clarification and refinement to strengthen the presentation and avoid overstatement.

      (1) Contextualization within the existing literature

      The manuscript would benefit from placing DeepTX more clearly in the context of other computational tools developed to connect mechanistic modelling and single-cell RNA sequencing data. This is an active area of research with notable recent contributions, including Sukys and Grima (bioRxiv, 2024), Garrido-Rodriguez et al. (PLOS Comp Biol, 2021), and Maizels (2024). Positioning DeepTX in relation to these and other relevant efforts would help readers appreciate its specific advances and contributions.

      We sincerely thank you for this valuable suggestion. We agree that situating DeepTX within the broader landscape of computational approaches linking mechanistic modeling and single-cell RNA sequencing data will clarify its contributions and advances. In this revised version, we explicitly discuss how DeepTX compares and relates to other work in this active area, in a dedicated paragraph of the Discussion section.

      Specifically, we note that the DeepTX research paradigm contributes to a growing line of research aiming to link mechanistic models of gene regulation with scRNA-seq data. Maizels provided a comprehensive review of computational strategies for incorporating dynamic mechanisms into single-cell transcriptomics (Maizels RJ, 2024). In this context, RNA velocity is one of the most important examples, as it infers short-term transcriptional trends based on splicing kinetics and deterministic ODE models. However, such approaches are limited by their deterministic assumptions and cannot fully capture the stochastic nature of gene regulation. DeepTX can be viewed as an extension of this framework to stochastic modelling, explicitly addressing transcriptional bursting kinetics under DNA damage. Similarly, DeepCycle, developed by Sukys and Grima (Sukys A & Grima R, 2025), investigates transcriptional burst kinetics during the cell cycle, employing a stochastic age-dependent model and a neural network to infer burst parameters while correcting for measurement noise. By contrast, MIGNON integrates genomic variation data and static transcriptomic measurements into a mechanistic pathway model (HiPathia) to infer pathway-level activity changes, rather than gene-level stochastic transcriptional dynamics (Garrido-Rodriguez M et al., 2021). In this sense, DeepTX and MIGNON are complementary, with DeepTX resolving burst kinetics at the single-gene level and MIGNON emphasizing pathway responses to genomic perturbations, which could inspire future extensions of DeepTX that incorporate sequence-level information.

      (2) Interpretation of GO analysis

      The interpretation of the GO enrichment results in Figure 4D should be revised. While the text currently associates the enriched terms with signal transduction and cell cycle G2/M phase transition, the most significant terms relate to mitotic cell cycle checkpoint signaling. This distinction should be made clear in the main text, and the conclusions drawn from the GO analysis should be aligned more closely with the statistical results.

      We sincerely appreciate the insightful comment. We have carefully re-examined the GO enrichment results shown in Figure 4D and agree that the most significantly enriched terms correspond to mitotic cell cycle checkpoint signaling and signal transduction in response to DNA damage, rather than general G2/M phase transition processes. Accordingly, we have revised the main text to highlight the biological significance of mitotic cell cycle checkpoint signaling.

      Specifically, we now emphasize that DNA damage and mitotic checkpoint activation are closely interconnected, and we make two key points. (1) The mitotic checkpoint serves as a crucial safeguard to ensure accurate chromosome segregation and maintain genomic stability under DNA damage conditions. Activation of the mitotic checkpoint can influence cell fate decisions and differentiation potential (Kim EM & Burke DJ, 2008; Lawrence KS et al., 2015). (2) Sustained activation of the spindle assembly checkpoint (SAC) has been reported to induce mitotic slippage and polyploidization, which in turn may enhance the differentiation potential of embryonic stem cells (Mantel C et al., 2007). These revisions ensure that our interpretation is consistent with the statistical enrichment results and better reflects the underlying biological processes implicated by the data.

      (3) Justification for training on simulated data

      The decision to train the model on simulated data should be clearly justified. While the advantage of having access to ground-truth parameters is understood, the manuscript would benefit from a discussion of the limitations of this approach, particularly in terms of generalizability to real datasets. Moreover, it is worth noting that many annotated scRNA-seq datasets are publicly available and could, in principle, be used to complement the training strategy.

      We thank you for this insightful comment. We chose to train DeepTXsolver on simulated data because no experimental dataset currently provides genome-wide transcriptional burst kinetics with known ground truth, which is essential for supervised learning. Simulation enables us to (i) generate large, fully annotated datasets spanning the biologically relevant parameter space, (ii) expose the solver to diverse bursting regimes (e.g., low/high burst frequency, small/large burst size, unimodal/bimodal distributions), and (iii) quantitatively benchmark model accuracy, parameter identifiability, and robustness prior to deployment on real scRNA-seq data.

      We acknowledge, however, that simulation-based training has inherent limitations in terms of generalizability. Real biological systems may deviate from the idealized bursting model, exhibit more complex noise structures, or display parameter distributions that differ from those in simulations. Moreover, the lack of ground-truth parameters in experimental scRNA-seq datasets prevents an absolute evaluation of inference accuracy. In future work, publicly available annotated scRNA-seq datasets could be used to complement this simulation-based training strategy and enhance generalizability. We have revised the manuscript to explicitly discuss both the rationale for using simulated data and the potential limitations of this approach.
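
      As an illustration of the simulation-based strategy discussed in this response, the following is a minimal sketch of how labeled training examples with known ground-truth kinetics could be generated. It is not the DeepTX implementation: it assumes a simple two-state (telegraph) Markov model rather than the multi-step non-Markovian model of the manuscript, and the function names, parameter ranges, and the burst definitions used here (frequency as k_on, size as k_syn / k_off) are illustrative assumptions.

```python
import random


def simulate_telegraph(k_on, k_off, k_syn, k_deg, t_end, rng):
    """Gillespie simulation of a two-state telegraph model.

    Returns the mRNA count of one simulated cell at time t_end.
    Assumption: a Markovian stand-in for the manuscript's multi-step model.
    """
    t, gene_on, mrna = 0.0, False, 0
    while True:
        switch_rate = k_off if gene_on else k_on   # promoter switching
        syn_rate = k_syn if gene_on else 0.0       # synthesis only when ON
        deg_rate = k_deg * mrna                    # first-order degradation
        total = switch_rate + syn_rate + deg_rate
        t += rng.expovariate(total)                # waiting time to next event
        if t > t_end:
            return mrna
        u = rng.random() * total                   # pick which event fires
        if u < switch_rate:
            gene_on = not gene_on
        elif u < switch_rate + syn_rate:
            mrna += 1
        elif mrna > 0:
            mrna -= 1


def make_training_example(n_cells, rng, t_end=20.0):
    """Draw hypothetical kinetic parameters and simulate n_cells snapshot counts."""
    params = {
        "k_on": rng.uniform(0.1, 2.0),    # illustrative ranges, not from the paper
        "k_off": rng.uniform(0.5, 5.0),
        "k_syn": rng.uniform(5.0, 50.0),
        "k_deg": 1.0,                     # time measured in mRNA lifetimes
    }
    counts = [simulate_telegraph(**params, t_end=t_end, rng=rng)
              for _ in range(n_cells)]
    return params, counts


def burst_labels(params):
    """Ground-truth labels under the assumed telegraph-model definitions."""
    return {
        "burst_frequency": params["k_on"],
        "burst_size": params["k_syn"] / params["k_off"],
    }
```

      Each call to make_training_example yields one (parameters, counts) pair. Because the parameters are known by construction, inference accuracy can be scored directly, which is what real scRNA-seq data cannot offer.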

      (4) Benchmarking against external methods

      The performance of DeepTX is primarily compared to a prior method from the same group. To strengthen the methodological claims, it would be preferable to include benchmarking against additional established tools from the broader literature. This would offer a more objective evaluation of the performance gains attributed to DeepTX.

      We thank you for this constructive suggestion. We fully agree that benchmarking DeepTX against additional established tools from the broader literature would provide a more comprehensive and objective evaluation of DeepTX. In the revised manuscript, we have included comparative analyses with other widely used methods, including nnRNA (from the Shahrezaei group (Tang W et al., 2023)), txABC (from our group (Luo S et al., 2023)), txBurst (from the Sandberg group (Larsson AJM et al., 2019)), and txInfer (from the Junhao group (Gu J et al., 2025)) (Supplementary Figure S4). The comparative results indicate that our method demonstrates superior performance in both efficiency and accuracy.

      (5) Interpretation of Figures 4-6

      The revised figures are clear and informative; however, the associated interpretations in the main text remain too strong relative to the type of analysis performed. For instance, in Figure 4, it is suggested that changes in burst size are linked to DNA damage-induced signalling cascades that affect cell cycle progression and fate decisions. While this is a plausible hypothesis, GO and GSEA analyses are correlative by nature and not sufficient to support such a mechanistic claim on their own. These analyses should be presented as exploratory, and the strength of the conclusions drawn should be tempered accordingly. Similar caution should be applied to the interpretations of Figures 5 and 6.

      We thank you for this important comment. In the revised manuscript, we have carefully moderated the interpretation of the GO and GSEA results in Figures 4, 5, and 6. Specifically, we now present these analyses as exploratory and emphasize their correlative nature, avoiding causal claims that go beyond the scope of the data. The text has been rephrased to highlight the observed associations rather than implying direct causal relationships.

      For Figure 4, we emphasize that while it is tempting to hypothesize that enhanced burst size may contribute to DNA damage-related checkpoint activation and thereby influence cell cycle progression and differentiation, our current results only indicate an association between burst size enhancement and pathways involved in DNA damage response and checkpoint signaling.

      For Figure 5, we emphasize that although our GO analysis cannot establish causality, the results are consistent with an association between 5-FU-induced changes in burst kinetics and pathways related to oxidative stress and apoptosis. Based on this, we propose a model outlining a potential process through which DNA damage may ultimately lead to cellular apoptosis.

      For Figure 6, we emphasize that these enrichment results suggest that high-dose 5FU treatment may be associated with processes such as telomerase activation and mitochondrial function maintenance, both of which have been implicated in cell survival and apoptosis evasion in previous experimental studies. For example, prior work indicates that hTERT translocation can activate telomerase pathways to support telomere maintenance and reduce oxidative stress, which is thought to contribute to apoptosis resistance. While our enrichment analysis cannot establish causality, the observed transcriptional bursting changes are consistent with these reported survival-associated mechanisms.

      (6) Discussion section framing

      The initial paragraphs of the discussion section make broad biological claims about the role of transcriptional bursting in cellular decision-making. While transcriptional bursting is undoubtedly relevant, the manuscript would benefit from a more cautious framing. It would be more appropriate to foreground the methodological contributions of DeepTX, and to present the biological insights as hypotheses or observations that may guide future experimental investigation, rather than as established conclusions.

      We thank you for this insightful comment. We have revised the discussion to clarify and appropriately temper our claims regarding transcriptional bursting. First, we now explicitly recognize that transcriptional bursting is one of multiple contributors to cellular variability, rather than the sole or dominant factor driving cellular decision-making. Second, we have restructured the opening of the discussion to prioritize the methodological contributions of DeepTX, highlighting its strength as a framework for inferring genome-wide burst kinetics from scRNA-seq data. Finally, the biological insights derived from our analysis are now presented as correlative observations and potential hypotheses, which may inform and guide future experimental investigations, rather than as definitive mechanistic conclusions.

      Small Comments

      (1) Presentation of discrete distributions: In several figures (e.g., Figure 2B and Supplementary Figures S4, S6, and S8), the comparisons between empirical mRNA distributions and DeepTX-inferred distributions are visually represented using connecting lines, which may give the impression that continuous distributions are being compared to discrete ones. Given the focus on transcriptional bursting, a process inherently tied to discrete stochastic events, this representation could be misleading. The figure captions and visual style should be revised to clarify that all distributions are discrete and to avoid potential confusion. In general, it is recommended to avoid connecting points in discrete distributions with lines, as this can suggest interpolation or comparison with continuous distributions. This applies to Figures 2A and 2B in particular.

      We thank you for this valuable suggestion. To prevent any potential misinterpretation of discrete distributions as continuous ones, we have revised the visual representation of the empirical and DeepTX-inferred mRNA distributions in Figure 2B and Supplementary Figures S4, S6, and S8. Specifically, we have replaced the line plots with step plots, which more accurately capture the discrete nature of transcriptional bursting. Additionally, we have updated the figure captions to clearly state that all distributions are discrete.
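
      To make the discreteness point concrete, the sketch below evaluates a negative binomial mixture, the family used to approximate steady-state mRNA distributions, only at integer counts and compares it with an empirical probability mass function on the same support. This is an illustrative sketch under assumed parameter values, not the DeepTX code; the function names and the choice of total variation distance as the comparison metric are our own assumptions.

```python
import math
from collections import Counter


def nb_pmf(k, r, p):
    """Negative binomial pmf at integer k >= 0, with shape r > 0 and 0 < p < 1.

    Computed in log space: Gamma(k+r) / (k! Gamma(r)) * p**r * (1-p)**k.
    """
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_mixture_pmf(k, weights, rs, ps):
    """Pmf of a finite negative binomial mixture; weights should sum to 1."""
    return sum(w * nb_pmf(k, r, p) for w, r, p in zip(weights, rs, ps))


def empirical_pmf(counts, k_max):
    """Empirical pmf of observed integer counts on the support 0..k_max."""
    tally = Counter(counts)
    n = len(counts)
    return [tally.get(k, 0) / n for k in range(k_max + 1)]


def total_variation(pmf_a, pmf_b):
    """Total variation distance between two pmfs on the same integer support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(pmf_a, pmf_b))
```

      Both objects are lists indexed by integer mRNA count, so they are naturally drawn as steps or bars rather than as connected lines.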

      (2) Transcription is always a multi-step process. While the manuscript aims to model additional complexity introduced by DNA damage, the current phrasing (e.g., on page 5) could be read as implying that transcription becomes multi-step only under damage conditions. This should be clarified.

      We thank you for this helpful observation. We agree that transcription is inherently a multi-step process under all conditions. To avoid any possible misunderstanding, we have revised the text to clarify this point.

      Specifically, we now explain that many previous studies have employed simplified two-state models to approximate transcriptional dynamics. However, gene expression is inherently a multi-step process, and this cannot be neglected under conditions of DNA damage. DNA damage can slow or even stop RNA Pol II movement and cause many macromolecules to be recruited for damage repair. This affects the spatially localized behavior of the promoter, so that the dwell times of promoter inactivation and activation cannot be approximated by a simple two-state model. Our work adopts a multi-step model because it is more appropriate for capturing the additional complexity introduced by DNA damage.

      (3) The first sentence of the discussion section overstates the importance of transcriptional bursting. While it is a key source of variability, it is not the only nor always the dominant one. Furthermore, its role in DNA damage response remains an emerging hypothesis rather than a general principle. The claims in this section should be moderated accordingly.

      We thank you for this valuable feedback. In the revised discussion, we have moderated the statements in the opening paragraph to better reflect the current understanding. Specifically, we now acknowledge that transcriptional bursting represents one of multiple sources of variability and is not always the dominant contributor. In addition, we have reframed the role of transcriptional bursting in DNA damage response as an emerging hypothesis, rather than a general principle. To further address this concern, we replaced conclusion-like statements with more cautious, hypothesis-oriented phrasing, presenting our observations as potential directions for future experimental validation.

      References

      Maizels, R.J. 2024. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci 379: 20230049. DOI: https://dx.doi.org/10.1098/rstb.2023.0049, PMID: 38432314

      Sukys, A., Grima, R. 2025. Cell-cycle dependence of bursty gene expression: insights from fitting mechanistic models to single-cell RNA-seq data. Nucleic Acids Research 53. DOI: https://dx.doi.org/10.1093/nar/gkaf295, PMID: 40240003

      Garrido-Rodriguez, M., Lopez-Lopez, D., Ortuno, F.M., Peña-Chilet, M., Muñoz, E., Calzado, M.A., Dopazo, J. 2021. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. PLoS Computational Biology 17: e1008748. DOI: https://dx.doi.org/10.1371/journal.pcbi.1008748, PMID: 33571195

      Kim, E.M., Burke, D.J. 2008. DNA damage activates the SAC in an ATM/ATR-dependent manner, independently of the kinetochore. PLoS Genet 4: e1000015. DOI: https://dx.doi.org/10.1371/journal.pgen.1000015, PMID: 18454191

      Lawrence, K.S., Chau, T., Engebrecht, J. 2015. DNA damage response and spindle assembly checkpoint function throughout the cell cycle to ensure genomic integrity. PLoS Genet 11: e1005150. DOI: https://dx.doi.org/10.1371/journal.pgen.1005150, PMID: 25898113

      Mantel, C., Guo, Y., Lee, M.R., Kim, M.K., Han, M.K., Shibayama, H., Fukuda, S., Yoder, M.C., Pelus, L.M., Kim, K.S., Broxmeyer, H.E. 2007. Checkpoint-apoptosis uncoupling in human and mouse embryonic stem cells: a source of karyotpic instability. Blood 109: 4518-4527. DOI: https://dx.doi.org/10.1182/blood-2006-10-054247, PMID: 17289813

      Tang, W., Jørgensen, A.C.S., Marguerat, S., Thomas, P., Shahrezaei, V. 2023. Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics. Bioinformatics 39. DOI: https://dx.doi.org/10.1093/bioinformatics/btad395, PMID: 37354494

      Luo, S., Zhang, Z., Wang, Z., Yang, X., Chen, X., Zhou, T., Zhang, J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Royal Society Open Science 10: 221057. DOI: https://dx.doi.org/10.1098/rsos.221057, PMID: 37035293

      Larsson, A.J.M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O.R., Reinius, B., Segerstolpe, A., Rivera, C.M., Ren, B., Sandberg, R. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565: 251-254. DOI: https://dx.doi.org/10.1038/s41586-018-0836-1, PMID: 30602787

      Gu, J., Laszik, N., Miles, C.E., Allard, J., Downing, T.L., Read, E.L. 2025. Scalable inference and identifiability of kinetic parameters for transcriptional bursting from single cell data. Bioinformatics. DOI: https://dx.doi.org/10.1093/bioinformatics/btaf581, PMID: 41131798

    1. eLife Assessment

      This study provides important insights into mural cell dynamics and vascular pathology using a zebrafish model of cerebral small vessel disease. The authors present convincing evidence that partial loss of foxf2 function results in progressive, cell-autonomous defects in pericytes accompanied by endothelial abnormalities across the lifespan. By leveraging advanced in vivo imaging and genetic approaches, the work establishes zebrafish as a powerful and relevant model for dissecting the cellular mechanisms underlying cerebral small vessel disease.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Graff et al. investigates the function of foxf2 in zebrafish to understand the progression of cerebral small vessel disease. The authors use a partial loss of foxf2 (zebrafish possess two foxf2 genes, foxf2a and foxf2b, and the authors mainly analyze homozygous mutants in foxf2a) to investigate the role of foxf2 signaling in regulating pericyte biology. They find that the number of pericytes is reduced in foxf2a mutants and that the remaining pericytes display alterations in their morphologies. The authors further find that mutant animals can develop to adulthood but that in adult animals, both endothelial and pericyte morphologies are affected. They also show that mutant pericytes can partially repopulate the brain after genetic ablation.

      Strengths:

      The paper is well written and easy to follow. The authors now include pericyte marker gene analysis and solid quantifications of the observed phenotypes.

      Weaknesses:

      None left.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the developmental and lifelong consequences of reduced foxf2 dosage in zebrafish, a gene associated with human stroke risk and cerebral small vessel disease (CSVD). The authors show that a ~50% reduction in foxf2 function through homozygous loss of foxf2a leads to a significant decrease in brain pericyte number, along with striking abnormalities in pericyte morphology, including enlarged soma and extended processes, during larval stages. These defects are not corrected over time but instead persist and worsen with age, ultimately affecting the surrounding endothelium. The study also makes an important contribution by characterizing pericyte behavior in wild-type zebrafish using a clever pericyte-specific Brainbow approach, revealing novel interactions such as pericyte process overlap not previously reported in mammals.

      Strengths:

      This work provides mechanistic insight into how subtle, developmental changes in mural cell biology and coverage of the vasculature can drive long-term vascular pathology. The authors make strong use of zebrafish imaging tools, including longitudinal analysis in transgenic lines to follow pericyte number and morphology over larval development and then applied tissue clearing and whole brain imaging at 3 and 11 months to further dissect the longitudinal effects of foxf2a loss. The ability to track individual pericytes in vivo reveals cell-intrinsic defects and process degeneration with high spatiotemporal resolution. Their use of a pericyte-specific Zebrabow line also allows, for the first time, detailed visualization of pericyte-pericyte interactions in the developing brain, highlighting structural features and behaviors that challenge existing models based on mouse studies. Together, these findings make the zebrafish a valuable model for studying the cellular dynamics of CSVD.

      Weaknesses:

      I originally suggested quantifying pericyte coverage across brain regions to address potential lineage-specific effects due to the distinct developmental origins of forebrain (neural crest-derived) and hindbrain (mesoderm-derived) pericytes. However, I appreciate the authors' response referencing recent work from their lab (Ahuja, 2024), which demonstrates that both neural crest and mesoderm contribute to pericyte lineages in the midbrain and hindbrain. The convergence of these lineages into a shared transcriptional state by 30 hpf, as shown by their single-cell RNA-seq data, makes it unlikely that regional quantification would provide meaningful lineage-specific insight. I agree with the authors that lineage tracing experiments often suffer from low sample sizes, and their updated findings challenge earlier compartmental models of pericyte origin. I therefore appreciate their rationale for not pursuing regional quantification and consider this concern addressed. Furthermore, my other two points regarding quantification of foxf2 levels and overall vascular changes have been thoroughly addressed in the revised manuscript. These additions significantly strengthen the paper's conclusions and improve the overall rigor of the study.

    4. Reviewer #3 (Public review):

      Summary:

      The goal of the work by Graff, et al. is to model CSVD in the zebrafish using foxf2a mutants. The mutants show loss of cerebral pericyte coverage that persists through adulthood, but it seems foxf2a does not regulate the regenerative capacity of these cells. The findings are interesting and build on previous work from the group. Limitations of the work include little mechanistic insight into how foxf2a alters pericyte recruitment/differentiation/survival/proliferation in this context, and the overlap of these studies with previous work in foxf2a/b double mutants. However, the data analysis is clean and compelling and the findings will contribute to the field.

      Comments on revisions:

      The authors have addressed all of my original concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings that advance our understanding of mural cell dynamics and vascular pathology in a zebrafish model of cerebral small vessel disease. The authors provide compelling evidence that partial loss of foxf2 function leads to progressive, cell-intrinsic defects in pericytes and associated endothelial abnormalities across the lifespan, leveraging powerful in vivo imaging and genetic tools. The strength of evidence could be further improved by additional mechanistic insight and quantitative or lineage-tracing analyses to clarify how pericyte number and identity are affected in the mutant model.

      Thank you to the reviewers for insightful comments and for the time spent reviewing the manuscript. We have strengthened the data through responding to the comments.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Graff et al. investigates the function of foxf2 in zebrafish to understand the progression of cerebral small vessel disease. The authors use a partial loss of foxf2 (zebrafish possess two foxf2 genes, foxf2a and foxf2b, and the authors mainly analyze homozygous mutants in foxf2a) to investigate the role of foxf2 signaling in regulating pericyte biology. They find that the number of pericytes is reduced in foxf2a mutants and that the remaining pericytes display alterations in their morphologies. The authors further find that mutant animals can develop to adulthood, but that in adult animals, both endothelial and pericyte morphologies are affected. They also show that mutant pericytes can partially repopulate the brain after genetic ablation.

      (1) Weaknesses: The results are mainly descriptive, and it is not clear how they will advance the field at their current state, given that a publication on mice has already examined the loss of foxf2 phenotype on pericyte biology (Reyahi, 2015, Dev. Cell).

      The Reyahi paper was the earliest report of foxf2 mutant brain pericytes and remains illuminating. The work was very well technically executed. Our manuscript expands and at times, contradicts, their findings. We realized that we did not fully discuss this in our discussion, and this has now been updated. The biggest difference between the two studies is in the direction of change in pericytes after foxf2 knockout, a major finding in both papers. This is where it is important to understand the differences in methods. Reyahi et al., used a conditional knockout under Wnt1:Cre which will ablate pericytes derived from neural crest, but not those derived from mesoderm, nor will it affect foxf2 expression in endothelial cells. Our model is a full constitutive knockout of the gene in all brain pericytes and endothelial cells. For GOF, Reyahi used a transgenic model with a human FOXF2 BAC integrated into the mouse germline.

      Both studies are important. We do not know enough about human phenotypes in patients with stroke-associated human FOXF2 SNVs to know the direction of change in pericyte numbers. We showed that the SNVs reduce FOXF2 gene expression in vitro (Ryu, 2022). Here we demonstrate dosage sensitivity in fish (showing phenotypes when 1 of 4 foxf2a + foxf2b alleles are lost, Figure 1F), supporting that slight reductions of FOXF2 in humans could lead to severe brain vessel phenotypes. For this reason, our work is complementary to the previously published work and suggests that future studies should focus on understanding the role of dosage, cell autonomy, and human pericyte phenotypes with respect to FOXF2. While some experiments are parallel in mouse and fish, we go further to look at cell death and regeneration, and to understand the consequences on the whole brain vasculature.

      (2) Reyahi et al. showed that loss of foxf2 in mice leads to a marked downregulation of pdgfrb expression in perivascular cells. In contrast to expectation, perivascular cell numbers were higher in mutant animals, but these cells did not differentiate properly. The authors use a transgenic driver line expressing gal4 under the control of the pdgfrb promoter and observe a reduction in pericyte (pdgfrb-expressing) cells in foxf2a mutants. In light of the mouse data, this result might be due to a similar downregulation of pdgfrb expression in fish, which would lead to a downregulation of gal4 expression and hence reduced labelling of pericytes. The authors show a reduction of pdgfrb expression also in zebrafish in foxf2b mutants (Chauhan et al., The Lancet Neurology 2016).

      Reyahi detected more pericytes in the Wnt1:Cre mouse, while we detected fewer in the foxf2a (and foxf2a;foxf2b) mutants. This may be because of different methods. For instance, because the mouse knockout is not a constitutive Foxf2 knockout, the observed increase in pericytes may be because mesodermal-derived pericytes proliferate more highly when the neural crest-derived pericytes are absent. Or does endothelial foxf2 activate pericyte proliferation when foxf2 is lost in some pericytes? It is also possible that mouse foxf2 has a different role from its fish ortholog. Despite these differences, there are common conclusions from both models. For instance, both mouse and fish show foxf2 controls capillary pericyte numbers, albeit in different directions. Both show hemorrhage and loss of vascular stability as a result. Both papers identify the developmental window as critical for setting up the correct numbers of pericytes.  

      As the reviewer suggested, it was important to test whether pdgfrb is downregulated in fish as it is in mice. To do this, we measured pdgfrb expression in foxf2 mutants using hybridization chain reaction (HCR). The results show no change in pdgfrb mRNA in foxf2a mutants in two independent experiments (Fig S3). Independently, we integrated pdgfrb transgene intensity (using a single allele of the transgene so there are no dose effects) in foxf2a mutants vs. wildtype. We found no difference (Fig S3), suggesting that pdgfrb is a reliable reporter for counting pericytes in the foxf2a knockout. The reviewer is correct that we previously showed downregulation of pdgfrb in foxf2b mutants at 4 dpf using colorimetric ISH. foxf2a and foxf2b are unlinked, independent genes (~400 million years apart in evolution) and may have different regulation.

      (3) It would be important to clarify whether, also in zebrafish, foxf2a/foxf2b mutants have reduced or augmented numbers of perivascular cells and how this compares to the data in the mouse.  

      We discuss methodological differences between Reyahi and our work in point (1) above. The reduction in pericytes in foxf2a;foxf2b mutants has been previously published (Ryu, 2022, Supplemental Figure 1) and is shown again here (Supplemental Figure 2). Numbers are reduced in double mutants up to 10 dpf, suggesting no recovery. Further, in response to reviewer comments, we have quantified pericytes in the whole fish brain (Figure 3E-G) and show reduced pericytes in the adult, reduced vessel network length, and, importantly, reduced pericyte density. In aggregate, our data show pericyte reduction at 5 developmental stages from embryo through adult. The reason for different results from the mouse is unknown and may reflect a technical difference (constitutive vs Wnt1:Cre) or a species difference.

      (4) The authors should perform additional characterization of perivascular cells using marker gene expression (for a list of markers, see e.g., Shih et al. Development 2021) and/or genetic lineage tracing.

      This is a good point. We have added HCR analysis of additional markers. Results show co-expression of foxf2a, foxf2b, nduf4la2 and pdgfrb in brain pericytes (Fig 2, Fig S3).

      (5) The authors motivate using foxf2a mutants as a model of reduced foxf2 dosage, "similar to human heterozygous loss of FOXF2". However, it is not clear how the different foxf2 genes in zebrafish interact with each other transcriptionally. Is there upregulation of foxf2b in foxf2a mutants and vice versa? This is important to consider, as Reyahi et al. showed that foxf2 gene dosage in mice appears to be important, with an increase in foxf2 gene dosage (through transgene expression) leading to a reduction in perivascular cell numbers.

      We agree that dosage is a very important concept and show phenotypes in foxf2a heterozygotes (Fig 1F). To test the potential compensation from foxf2b, we have added qPCR for foxf2b in foxf2a mutants as well as HCR of foxf2b in foxf2a mutants (Fig S3C,D). There is no change in foxf2b expression in foxf2a mutants. We discuss dosage in our discussion.

      (6) Figures 3 and 4 lack data quantification. The authors describe the existence of vascular defects in adult fish, but no quantifiable parameters or quantifications are provided. This needs to be added.

      This query was technically challenging to address, but very worthwhile. We have not seen published methods for quantifying brain pericytes along with the vascular network (certainly not in zebrafish adults), so we developed new methods of analyzing whole brain vascular parameters of cleared adult brains (Figure S6) using a combination of segmentation methods for pericytes, endothelium and smooth muscle. We have added another author (David Elliott) as he was instrumental in designing methods. We find a significant decrease in vessel network length in foxf2a mutants at 3 months and 6 months (Figures 3F and 4G). Similarly, we show a lower number of brain pericytes in foxf2a mutants (Figure 3E). Finally, we added whole brain analysis of smooth muscle coverage (Figure 4) and show no change in vSMC number or coverage of vessels at 5 and 10 dpf or adult, respectively, pointing to pericytes being the cells most affected. Thank you, this query pushed us in a very productive direction. These methods will be extremely useful in the future!

      (7) The analysis of pericyte phenotypes and morphologies is not clear. On page 6, the authors state: "In the wildtype brain, adult pericytes have a clear oblong cell body with long, slender primary processes that extend from the cytoplasm with secondary processes that wrap around the circumference of the blood vessel." Further down on the same page, the authors note: "In wildtype adult brains, we identified three subtypes of pericytes, ensheathing, mesh and thin-strand, previously characterized in murine models." In conclusion, not all pericytes have long, slender primary processes, but there are at least three different sub-types? Did the authors analyze how they might be distributed along different branch orders of the vasculature, as they are in the mouse?

      We have reworded the text on page 5/6 to be clearer that embryonic pericytes are thin strand only. Additional pericyte subtypes develop later and are seen in the mature vasculature of the adult. We could not find a way to accurately analyze pericyte subtypes in the adult brain. The image analysis used to count pericytes relied on somata, because machine learning algorithms have been developed to count nuclei but not to analyze processes.

      (8) Which type of pericyte is affected in foxf2a mutant animals? Can the authors identify the branch order of the vasculature for both wildtype and mutant animals and compare which subtype of pericyte might be most affected? Are all subtypes of pericytes similarly affected in mutant animals? There also seems to be a reduction in smooth muscle cell coverage.

      Please see the response to (7) about pericyte subtypes. In response to the reviewer’s query, we have now analyzed vSMCs in the embryonic and adult brain. In the embryonic brain we see no statistical differences in vSMC number at 5 and 10 dpf (Figure 4). In the adult, vSMC length (total length of vSMCs in a brain) and vSMC coverage (proportion of brain vessels with vSMCs) are not significantly different. This data is important because it suggests that foxf2a has a more important role in pericytes than in vSMCs.

      (9) Regarding pericyte regeneration data (Figure 7): Are the values in Figure 7D not significantly different from each other (no significance given)?

      Graphs without significance bars showed no significant differences; the bars were left off for clarity. We have stated this in the statistical methods.

      (10) In the discussion, the authors state that "pericyte processes have not been studied in zebrafish".

      Ando et al. (Development 2016) studied pericyte processes in early zebrafish embryos, and Leonard et al. (Development 2022) studied zebrafish pericytes and their processes in the developing fin. We apologize; this was not meant to say that pericyte processes had not been studied before, and we have reworded the sentence to make its intent clear. We were trying to emphasize that we are the first to quantify processes at different stages, especially in foxf2 mutants. Processes change morphology over development, especially after 5 dpf, something that our data capture. Our images are of stages that have not been previously characterized. We added a reference to Mae et al., who found similar process length changes in a mouse knockout of a different gene, and to Leonard, who previously showed overlap of processes in a different context in fish.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the developmental and lifelong consequences of reduced foxf2 dosage in zebrafish, a gene associated with human stroke risk and cerebral small vessel disease (CSVD). The authors show that a ~50% reduction in foxf2 function through homozygous loss of foxf2a leads to a significant decrease in brain pericyte number, along with striking abnormalities in pericyte morphology, including enlarged soma and extended processes, during larval stages. These defects are not corrected over time but instead persist and worsen with age, ultimately affecting the surrounding endothelium. The study also makes an important contribution by characterizing pericyte behavior in wild-type zebrafish using a clever pericyte-specific Brainbow approach, revealing novel interactions such as pericyte process overlap not previously reported in mammals.

      Strengths:

      This work provides mechanistic insight into how subtle, developmental changes in mural cell biology and coverage of the vasculature can drive long-term vascular pathology. The authors make strong use of zebrafish imaging tools, including longitudinal analysis in transgenic lines to follow pericyte number and morphology over larval development, and then applied tissue clearing and whole brain imaging at 3 and 11 months to further dissect the longitudinal effects of foxf2a loss. The ability to track individual pericytes in vivo reveals cell-intrinsic defects and process degeneration with high spatiotemporal resolution. Their use of a pericyte-specific Zebrabow line also allows, for the first time, detailed visualization of pericyte-pericyte interactions in the developing brain, highlighting structural features and behaviors that challenge existing models based on mouse studies. Together, these findings make the zebrafish a valuable model for studying the cellular dynamics of CSVD.

      Weaknesses:

      (11) While the findings are compelling, several aspects could be strengthened. First, quantifying pericyte coverage across distinct brain regions (forebrain, midbrain, hindbrain) would clarify whether foxf2a loss differentially impacts specific pericyte lineages, given known regional differences in developmental origin, with forebrain pericytes being neural crest-derived and hindbrain pericytes being mesoderm-derived.

      In recently published work from our lab, we reported that both neural crest and mesodermal cells contribute to pericytes in both the mid and hindbrain, and could not confirm earlier work suggesting more rigid compartmental origins (Ahuja, 2024). In the Ahuja, 2024 paper we noted that lineage experiments are often limited by n’s, which is why this may not have been discovered before. This makes us skeptical that counting different regions will allow us to interpret data about neural crest and mesoderm. Further, Ahuja 2024 shows that pericyte intermediate progenitors from both mesoderm and neural crest are indistinguishable at 30 hpf through single cell sequencing and have converged on a common phenotype.

      (12) Second, measuring foxf2b expression in foxf2a mutants would better support the interpretation that total FOXF2 dosage is reduced in a graded fashion in heterozygote and homozygote foxf2a mutants.

      We have done both qPCR for foxf2b in foxf2a mutants and HCR (quantitative ISH). This is now reported in Fig S3. 

      (13) Finally, quantifying vascular density in adult mutants would help determine whether observed endothelial changes are a downstream consequence of prolonged pericyte loss. Correlating these vascular changes with local pericyte depletion would also help clarify causality.

      We have added these data to Figures 3 and 4. Please also see response (6).

      Reviewer #3 (Public review):

      Summary:

      The goal of the work by Graff et al. is to model CSVD in the zebrafish using foxf2a mutants. The mutants show loss of cerebral pericyte coverage that persists through adulthood, but it seems foxf2a does not regulate the regenerative capacity of these cells. The findings are interesting and build on previous work from the group. Limitations of the work include little mechanistic insight into how foxf2a alters pericyte recruitment/differentiation/survival/proliferation in this context, and the overlap of these studies with previous work in foxf2a/b double mutants. However, the data analysis is clean and compelling, and the findings will contribute to the field.

      (14) Please make Figures 5C and 5E red-green colorblind friendly.

      Thank you. We have changed the colors to light blue and yellow to be colorblind friendly.

      Reviewer #3 (Recommendations for the authors):

      (15) I'm not sure this reviewer totally agrees with the assessment that foxf2a loss of function, while foxf2b remains normal, is the same as FOXF2 heterozygous loss of function in humans. The discussion of the gene dosage needs to be better framed, and the authors should carry out qPCR to show that foxf2b levels are not altered in the foxf2a mutant background.

      We have added data on foxf2b expression in foxf2a mutants to Fig S3. We have updated the results.

      (16) Figure 4/SF7- is the aneurysm phenotype derived from the ECs or pericytes? Cell-type-specific rescues would be interesting to determine if phenotypes are rescued, especially the developmental phenotypes (it is appreciated that carrying out rescue experiments until adulthood is complex). When is the earliest time point that aneurysm-like structures are seen?

      This is a fascinating question, especially as we show that endothelial cells (vessel network length) are affected in the adult mutants. The foxf2a mutants that we work with here are constitutive knockouts. While a strategy to rescue foxf2a in specific lineages is being developed in the laboratory, this will require a multi-generation breeding effort to get drivers, transgenes and mutants on the same background, and these fish are not currently available. Thank you for this comment; it is something we want to follow up on.

      (17) Figure 5 - This is very nice analysis.

      Thank you! We think it is informative too.

      (18) Figure 6 - needs to contain control images

      We have added wildtype images to Figure 6A.

      (19) Figure 7- vessel images should be shown to demonstrate the specificity of NTR treatment to the pericytes.

      We have added the vessel images to Figure 7. We apologize for the omission.

    1. eLife Assessment

      This valuable study uses fiber photometry, implantable lenses, and optogenetics to show that a subset of subthalamic nucleus neurons are active during movement, and that active but not passive avoidance depends in part on STN projections to substantia nigra. The strength of the evidence for these claims is solid and this paper may be of interest to basic and applied behavioural neuroscientists working on movement or avoidance.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a robust set of experiments that provide new insights into the role of STN neurons during active and passive avoidance tasks. These forms of avoidance have received comparatively less attention in the literature than the more extensively studied escape or freezing responses, despite being extremely relevant to human behaviour and more strongly influenced by cognitive control.

      Strengths:

      Understanding the neural infrastructure supporting avoidance behaviour would be a fundamental milestone in neuroscience. The authors employ sophisticated methods to delineate the role of STN neurons during avoidance behaviours. The work is thorough and the evidence presented is compelling. Experiments are carefully constructed, well-controlled, and the statistical analyses are appropriate.

    3. Reviewer #2 (Public review):

      Summary:

      Zhou, Sajid et al. present a study investigating the STN involvement in signaled movement. They use fiber photometry, implantable lenses, and optogenetics during active avoidance experiments to evaluate this. The data are useful for the scientific community and the overall evidence for their claims is solid, but many aspects of the findings are confusing. The authors present a huge collection of data, and it is somewhat difficult to extract the key information and the meaningful implications resulting from these data.

      Strengths:

      The study is comprehensive in using many techniques and many stimulation powers and frequencies and configurations.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use calcium recordings from STN to measure STN activity during spontaneous movement and in a multi-stage avoidance paradigm. They also use optogenetic inhibition and lesion approaches to test the role of STN during the avoidance paradigm. The paper reports a large amount of data and makes many claims, some seem well supported to this Reviewer, others not so much.

      Strengths:

      Well-supported claims include data showing that during spontaneous movements, especially contraversive ones, STN calcium activity is increased using bulk photometry measurements. Single-cell measures back this claim but also show that it is only a minority of STN cells that respond strongly, with most showing no response during movement, and a similar number showing smaller inhibitions during movement.

      Photometry data during cued active avoidance procedures show that STN calcium activity sharply increases in response to auditory cues, and during cued movements to avoid a footshock. Optogenetic and lesion experiments are consistent with an important role for STN in generating cue-evoked avoidance. And a strength of these results is that multiple approaches were used.

      [Editors' note: The authors provided a good explanation regarding the difference between interpreting 'caution' in the healthy vs impaired situation, and this addressed one of the remaining major concerns from the last round of review.]

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One possible remaining conceptual concern that might require future work is determining whether STN primarily mediates higher-level cognitive avoidance or if its activation primarily modulates motor tone.

      Our results using viral and electrolytic lesions (Fig. 11) and optogenetic inhibition of STN neurons (Fig. 10) show that signaled active avoidance is virtually abolished, and this effect is reproduced when we selectively inhibit STN fibers in the midbrain (Fig. 12). Inhibition of STN projections in either the substantia nigra pars reticulata (SNr) or the midbrain reticular tegmentum (mRt) eliminates cued avoidance responses while leaving escape responses intact. Importantly, mice continue to escape during US presentation after lesions or during photoinhibition, demonstrating that basic motor capabilities and the ability to generate rapid defensive actions are preserved.

      These findings argue against the idea that STN’s role in avoidance reflects a nonspecific suppression or facilitation of motor tone, even if the STN also contributes to general movement control. Instead, they show that STN output is required for generating “cognitively” guided cued actions that depend on interpreting sensory information and applying learned contingencies to decide when to act. Thus, while STN activity can modulate movement parameters, the loss-of-function results point to a more selective role in supporting cued, goal-directed avoidance behavior rather than a general adjustment of motor tone.

      Reviewer #2 (Public review):

      All previous weaknesses have been addressed. The authors should explain how inhibition of the STN impairing active avoidance is consistent with the STN encoding cautious action. If 'caution' is related to avoid latency, why does STN lesion or inhibition increase avoid latency, and therefore increase caution? Wouldn't the opposite be more consistent with the statement that the STN 'encodes cautious action'?

      The reviewer’s interpretation treats any increase in avoidance latency as evidence of “more caution,” but this holds only when animals are performing the avoidance behavior normally. In our intact animals, avoidance rates remain high across AA1 → AA2 → AA3, and the active avoidance trials (CS1) used to measure latency are identical across tasks (e.g., in AA2 the only change is that intertrial crossings are punished). Under these conditions, changes in latency genuinely reflect adjustments in caution, because the behavior itself is intact, actions remain tightly coupled to the cue, and the trials are identical.

      This logic does not apply when STN function is disrupted. STN inhibition or lesions reduce avoidance to near chance levels; the few crossings that do occur are poorly aligned to the CS and many likely reflect random movement rather than a cued avoidance response. Once performance collapses, latency can no longer be assumed to reflect the same cognitive process. Thus, interpreting longer latencies during STN inactivation as “more caution” would be erroneous, and we never make that claim.

      A simple analogy may help clarify this distinction. Consider a pedestrian deciding when to cross the street after a green light. If the road is deserted (like AA1), the person may step off the curb quickly. If the road is busy with many cars that could cause harm (like AA2), they may wait longer to ensure that all cars have stopped. This extra hesitation reflects caution, not an inability to cross. However, if the pedestrian is impaired (e.g., cannot clearly see the light, struggles to coordinate movements, or cannot reliably make decisions), a delayed crossing would not indicate greater caution—it would reflect a breakdown in the ability to perform the behavior itself. The same principle applies to our data: we interpret latency as “caution” only when animals are performing the active avoidance behavior normally, success rates remain high, and the trial rules are identical. Under STN inhibition or lesion, when active avoidance collapses, the latency of the few crossings that still occur can no longer be interpreted as reflecting caution. We have added these points to the Discussion.

      Reviewer #3 (Public review):

      Original Weaknesses:

      I found the experimental design and presentation convoluted and some of the results over-interpreted.

      We appreciate the reviewer’s comment, but the concern as stated is too general for us to address in a concrete way. The revised manuscript has been substantially reorganized, with simplified terminology, streamlined figures, and removal of an entire set of experiments to avoid over-interpretation. We are confident that the experimental design and results are now presented clearly and without extrapolation beyond the data. If there are specific points the reviewer finds convoluted or over-interpreted, we would be happy to address them directly.

      As presented, I don't understand this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea; or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think this that delayed responses imply cautious responding instead of say: habituation or sensitization to cue/shock, variability in attention, motivation, or stress; or merely uncertainty which seems plausible given what I understand of the task design where the same mice are repeatedly tested in changing conditions. This relates to a major claim (i.e., in the title).

      We appreciate the reviewer’s question and address each component directly.

      (1) What we mean by “caution” and how it is operationalized

      In our study, caution is defined operationally as a systematic increase in avoidance latency when the behavioral demand becomes higher, while the trial structure and required response remain unchanged. Specifically, CS1 trials are identical in AA1, AA2, and AA3. Thus, when mice take longer to initiate the same action under more demanding contexts, the added time reflects additional evaluation before acting, consistent with long-established interpretations of latency shifts in cognitive psychology (see papers by Donders, Sternberg, Posner) and interpretations of deliberation time in the speed-accuracy tradeoff literature.

      (2) Why this interpretation does not rely on multi-modal response distributions

      We do not claim that “cautious” responses form a separate mode in the latency distribution. The distributions are unimodal, and caution is inferred from condition-dependent shifts in these distributions across identical trials, not from the existence of multiple peaks (see Zhou et al, 2022). Latency shifts across conditions with identical trial structure are widely used as behavioral indices of deliberation or caution.

      (3) Why alternative explanations (habituation/sensitization, motivation, attention, stress, uncertainty) do not account for these latency changes

      Importantly, nothing changes in CS1 trials between AA1 and AA2 with respect to the cue, shock, or required response. Therefore:

      - Habituation/sensitization to the cue or shock cannot explain the latency shift (the stimuli and trial type are unchanged). We have previously examined cue-evoked orienting responses and their habituation in detail (Zhou et al., 2023), and those measurements are dissociable from the latency effects described here.

      - Motivation or attention are unlikely to change selectively for identical CS1 trials when the task manipulation only adds a contingency to intertrial crossings.

      - Uncertainty also does not increase for CS1 trials; they remain fully predictable and unchanged between conditions.

      - Stress is too broad a construct to be meaningful unless clearly operationalized; moreover, any stress differences that arise from task structure would covary with caution rather than replace the interpretation.

      (4) Clarifying “types” of responses

      The reviewer’s question about “response types” appears to conflate behavioral latencies with the neuronal response “types” defined in the manuscript. The term “type” in this paper refers to neuronal activation derived from movement-based clustering, not to distinct behavioral categories of avoidance, which we term modes.

      In sum, we interpret increased CS1 latency as “caution” only when performance remains intact and trial structure is identical between conditions; under those criteria, latency reliably reflects additional cognitive evaluation before acting, rather than nonspecific changes in sensory processing, motivation, etc.
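
      As a minimal sketch of the operational definition above (an illustration, not the authors' analysis code), the following Python computes the per-animal shift in median CS1 avoidance latency between two conditions in which the CS1 trials are identical. The latency values, animal labels, and the function name `latency_shift` are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

def latency_shift(latencies_a, latencies_b):
    """Per-animal shift in median latency (condition B minus condition A).

    Both arguments map an animal ID to a list of avoidance latencies (s)
    measured on identical CS1 trials in the two conditions.
    """
    shared = sorted(set(latencies_a) & set(latencies_b))
    return {animal: median(latencies_b[animal]) - median(latencies_a[animal])
            for animal in shared}

# Hypothetical latencies in seconds; these are NOT data from the study.
aa1 = {"m1": [1.0, 1.2, 1.1], "m2": [0.9, 1.0, 1.3], "m3": [1.4, 1.5, 1.2]}
aa2 = {"m1": [1.6, 1.8, 1.7], "m2": [1.5, 1.4, 1.9], "m3": [2.0, 1.9, 2.2]}

shifts = latency_shift(aa1, aa2)      # one shift per animal
mean_shift = mean(shifts.values())    # average shift across animals
```

      Under the criteria stated above, a positive shift in each animal would be read as increased caution only if avoidance rates also remain high in both conditions; formal inference would still rest on the mixed-effects models described in the manuscript.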

      Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments.

      There is longstanding precedent in systems neuroscience for classifying neurons by their physiological response patterns, because neurons that respond similarly often play similar functional roles. For example, place cells, grid cells, direction cells, in vivo, and regular spiking, burst firing, and tonic firing in vitro are all defined by characteristic activity patterns in response to stimuli rather than anatomy or genetics alone. In the same spirit, our classifications simply reflect clusters of neurons that exhibit similar ΔF/F dynamics around behaviorally relevant events, such as movement sensitivity or avoidance modes. This is a standard analytic approach used in many studies. Thus, our rationale is not arbitrary: the “classes” and “types” arise from data-driven clustering of physiological responses, consistent with widespread practice, and they help reveal functional distinctions within the STN that would otherwise remain obscured.

      In several figures the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects.

      All the results described include the number of animals. To eliminate uncertainty, we now also include this information in figure legends.

      The only measure of error shown in many figures relates trial-to-trial or event variability, which is minimal because in many cases it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability (i.e., are results consistent across animals?).

      The concern appears to stem from a misunderstanding of what the mixed-effects models quantify. Although the figure panels often show session-averaged traces for clarity, all statistical inferences in the paper are made at the level of animals, not trials. Mixed-effects modeling is explicitly designed for hierarchical datasets such as ours, where many trials are nested within sessions, which are themselves nested within animals.

      In our models, animal is the clustering (random) factor, and sessions are nested within animals, so variability across animals is directly estimated and used to compute the population-level effects. This approach is not only appropriate but is the most stringent and widely recommended method for analyzing behavioral and neural data with repeated measures. In other words, the significance tests and confidence intervals already fully incorporate biological variability across animals.

      Thus, although hundreds of trials per animal may be illustrated for visualization, the inferences reflect between-animal consistency, not within-animal trial repetition. The fact that the mixed-effects results are robust across animals supports the biological reliability of the findings.
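
      The distinction between trial-level and animal-level variability discussed above can be illustrated with a minimal sketch. This is not the mixed-effects model used in the paper, which estimates the animal and session variance components jointly; it is a simplified two-stage summary on invented toy data, and the animal IDs and the function name `trial_vs_animal_sem` are hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean of a list of numbers."""
    return statistics.stdev(values) / math.sqrt(len(values))


def trial_vs_animal_sem(trials_by_animal):
    """Return (pooled trial-level SEM, animal-level SEM).

    trials_by_animal maps a hypothetical animal ID to its per-trial values.
    The pooled SEM treats every trial as independent; the animal-level SEM
    treats each animal's mean as one observation.
    """
    pooled = [v for trials in trials_by_animal.values() for v in trials]
    animal_means = [statistics.mean(t) for t in trials_by_animal.values()]
    return sem(pooled), sem(animal_means)


# Toy data invented for illustration: two animals, four trials each,
# with no trial-to-trial noise but a clear difference between animals.
toy = {
    "mouse_a": [0.0, 0.0, 0.0, 0.0],
    "mouse_b": [1.0, 1.0, 1.0, 1.0],
}
pooled_sem, animal_sem = trial_vs_animal_sem(toy)
```

      In this toy case the animal-level SEM (0.5) is larger than the pooled trial-level SEM (about 0.19), which is why the choice of the unit of analysis matters for hierarchical data.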

      It is not clear if or how spread of expression outside of the target STN was evaluated, or whether (and how many) mice were excluded due to spread or fiber placement. The histological validation presented is inadequate, and neighboring regions that would be difficult to completely avoid, such as paraSTN, may be contributing to some of the effects.

      The STN is a compact structure with clear anatomical boundaries, and our injections were rigorously validated to ensure targeting specificity. As detailed in the Methods, every mouse underwent histological verification, and injections were quantified using the Brain Atlas Analyzer app (available on OriginLab), which we developed to align serial sections to the Allen Brain Atlas. This approach provides precise, slice-by-slice confirmation of viral spread. We have performed thousands of AAV injections and probe implants in our lab, incorporating over the years highly reliable stereotaxic procedures with multiple depth and angle checks and tools. For this study specifically, fewer than 10% of mice were excluded due to off-target expression or fiber/lesion placement. None of the included cases showed spread into adjacent structures.

      Regarding paraSTN: anatomically, paraSTN is a very small extension contiguous with STN. Our study did not attempt to dissociate subregions within STN, and the viral expression patterns we report fall within the accepted boundaries of STN. Importantly, none of our photometry probes or miniscope lenses sampled paraSTN, so contributions from that region are extremely unlikely to account for any of our neural activity results.

      Finally, our paper employs five independent loss-of-function approaches—optogenetic inhibition of STN neurons, selective inhibition of STN projections to the midbrain (in two sites: SNr and mRt), and STN lesions (electrolytic and viral). All methods converge on the same conclusion, providing strong evidence that the effects we report arise from manipulation of STN itself rather than from neighboring regions.

      Raw example traces are not provided.

      We do not think raw traces are useful here. All figures contain average traces to reflect the average activity of the estimated populations, which are already clustered per classes and types.

      The timeline of the spontaneous movement and avoidance sessions was not clear, nor the number of events or sessions per animal and how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, or what the time gaps between sessions were, or if or how any of these parameters might influence interpretation of the results.

      As noted, we have enhanced the description of the sessions, including the number of animals and sessions, which are daily and always equal per animal in each group of experiments. The sessions are part of the random effects in the model. In addition, we now include schematics to facilitate understanding of the procedures.

      Comments on revised version:

      The authors removed the optogenetic stimulation experiments, but then also added a lot of new analyses. Overall, the scope of their conclusions is essentially unchanged. Part of the eLife model is to leave it to the authors' discretion how they choose to present their work. But my overall view of it is unchanged. There are elements that I found clear, well executed, and compelling. But there are other elements that I found difficult to understand and where I could not follow or concur with their conclusions.

      We respectfully disagree with the assertion that the scope of our conclusions remains unchanged. The revised manuscript differs in several fundamental ways:

      (1) Removal of all optogenetic excitation experiments

      These experiments were a substantial portion of the original manuscript, and their removal eliminated an entire set of claims regarding the causal control of cautious responding by STN excitation. The revised manuscript no longer makes these claims.

      (2) Addition of analyses that directly address the reviewers’ central concerns

      The new analyses using mixed-effects modeling, window-specific covariates, and movement/baseline controls were added precisely because reviewers requested clearer dissociation of sensory, motor, and task-related contributions. These additions changed not only the presentation but the interpretation of the neural signals. We now conclude that STN encodes movement, caution, and aversive signals in separable ways, not that it exclusively or causally regulates caution.

      (3) Clear narrowing of conclusions

      Our current conclusions are more circumscribed and data-driven than in the original submission. For example, we removed all claims that STN activation “controls caution,” relying instead on loss-of-function data showing that STN is necessary for performing cued avoidance—not for generating cautious latency shifts. This is a substantial conceptual refinement resulting directly from the review process.

      (4) Reorganization to improve clarity

      Nearly every section has been restructured, including terminology (mode/type/class), figure organization, and explanations of behavioral windows. These revisions were implemented to ensure that readers can follow the logic of the analyses.

      We appreciate the reviewer’s recognition that several elements were clear and compelling. For the remaining points they found difficult to understand, we have addressed each one in detail in the response and revised the manuscript accordingly. If there are still aspects that remain unclear, we would welcome explicit identification of those points so that we can clarify them further.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Show individual data points on bar plots

      - partially addressed. Individual data points are still not shown.

      Wherever feasible, we display individual data points (e.g., Figures 1 and 2) to convey variability directly. However, in cases where figures depict hundreds of paired (repeated-measures) data points, showing all points without connecting them would not be appropriate, while linking them would make the figures visually cluttered and uninterpretable. All plots and traces include measures of variability (SEM), and the raw data will be shared on Dryad. When error bars are not visible, they are smaller than the trace thickness or bar line. For example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.

      Also, to minimize visual clutter, only a subset of relevant comparisons is highlighted with asterisks, whereas all relevant statistical results, comparisons, and mouse/session numbers are fully reported in the Results section, with statistical analyses accounting for the clustering of data within subjects and sessions.

      (2) The active avoidance experiments are confusing when they are introduced in the results section. More explanation of what paradigms were used and what each CS means at the time these are introduced would add clarity. For example AA1, AA2 etc are explained only with references to other papers, but a brief description of each protocol and a schematic figure would really help.

      - partially addressed. A schematic figure showing the timeline would still be helpful.

      As suggested, we have added an additional panel to Fig. 5A with a schematic describing AA1-3 tasks. In addition, the avoidance protocols are described briefly but clearly in the Results section (second paragraph of “STN neurons activate during goal-directed avoidance contingencies”) and in greater detail in the Methods section. As stated, these tasks were conducted sequentially, and mice underwent the same number of sessions per procedure, which are indicated. All relevant procedural information has been included in these sections. Mice underwent daily sessions and learnt these tasks within 1-2 sessions, progressing sequentially across tasks with an equal number of sessions per task (7 per task), and the resulting data were combined and clustered by mouse/session in the statistical models.

      (3) How do the Class 1, 2, 3 avoids relate to Class 1, 2, 3 neural types established in Figure 3? It seems like they are not related, and if that is the case they should be named something different from each other to avoid confusion.

      -not sufficiently addressed. The new naming system of neural 'classes' and 'types' helps with understanding that these are completely different ways of separating subpopulations within the STN. However, it is still unclear why the authors re-type the neurons based on their relation to avoids, when they classify the neurons based on their relationship to speed earlier. And it is unclear whether these neural classes and neural types have anything to do with each other. Are the neural types related to the neural classes in any way? What is the overlap between neural types and classes? Which separation method is more useful for functionally defining STN populations?

      The remaining confusion stems from treating several independent analyses as if they were different versions of the same classification. In reality, each analysis asks a distinct question, and the resulting groupings are not expected to overlap or correspond. We clarify this explicitly below.

      - Movement onset neuron classes (Class A, B, C; Fig. 3):

      These classes categorize neurons based on how their ΔF/F changes around spontaneous movement onset. This analysis identifies which neurons encode the initiation and direction of movement. For instance, Class B neurons (15.9%) were inhibited as movement slowed before onset but did not show sharp activation at onset, whereas Class C neurons (27.6%) displayed a pronounced activation time-locked to movement initiation. Directional analyses revealed that Class C neurons discharged strongly during contraversive turns, while Class B neurons showed a weaker ipsiversive bias. Because neurons were defined per session and many of these recordings did not include avoidance-task sessions, these movement-onset classes were not used in the avoidance analyses.

      - Movement-sensitivity neuron classes (Class 1, 2, 3, 4; Fig. 7):

      These classes categorize neurons based on the cross-correlation between ΔF/F and head speed, capturing how each neuron’s activity scales with movement features across the entire recording session. This analysis identifies neurons that are strongly speed-modulated, weakly speed-modulated, or largely insensitive to movement. These movement-sensitivity classes were then carried forward into the avoidance analyses to ask how neurons with different kinematic relationships participate during task performance; for example, whether neurons that are insensitive to movement nonetheless show strong activation during avoidance actions.

      - Avoidance modes (Mode 1, 2, 3; Fig. 8)

      Here we classify actions, not neurons. K-means clustering is applied to the movement-speed time series during CS1 active avoidance trials only, which allows us to identify distinct action modes or variants (fast-onset versus delayed avoidance responses). This action-based classification ensures that we compare neural activity across identical movements, eliminating a major confound in studies that do not explicitly separate action variants. First, we examine how population activity differs across these avoidance modes, reflecting neural encoding of the distinct actions themselves. Second, within each mode, we then classify neurons into “types,” which simply describe how different neurons activate during that specific avoidance action (as noted next).

      - Neuron activation types within each mode (Type a, b, c; Fig.9)

      This analysis extends the mode-based approach by classifying neuronal activation patterns only within each specific avoidance mode. For each mode, we apply k-means clustering to the ΔF/F time series to identify three activation types, e.g., neurons showing little or no response, neurons showing moderate activation, and neurons showing strong or sharply timed activation. Because all trials within a mode have identical movement profiles, these activation types capture the variability of neural responses to the same avoidance behavior. Importantly, these activation “types” (a, b, c) are not global neuron categories. They do not correspond to, nor are they intended to map onto, the movement-based neuron classes defined earlier. Instead, they describe how neurons differ in their activation during a particular behavioral mode, that is, within a specific set of behaviorally matched trials. Because modes are defined at the trial level, the neurons contributing to each mode can differ: some neurons have trials belonging to one mode, others to two or all three. Thus, Type a/b/c groupings are not fixed properties of neurons. To prevent confusion, we refer to them explicitly as neuronal activation types, emphasizing that they characterize mode-specific response patterns rather than global cell identities.
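
      The trial-level clustering described for the avoidance modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it is a plain-Python Lloyd-style k-means with a deterministic farthest-point initialization (the initialization actually used is not stated in the response), the toy speed traces are invented, and the toy uses two clusters rather than the three modes reported.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_trace(traces):
    """Point-by-point average of a list of equal-length traces."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def init_centroids(traces, k):
    """Farthest-point initialization (an assumption made for determinism)."""
    centroids = [list(traces[0])]
    while len(centroids) < k:
        farthest = max(
            traces,
            key=lambda t: min(squared_distance(t, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(traces, k, n_iter=50):
    """Cluster traces into k groups; return (labels, centroids)."""
    centroids = init_centroids(traces, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each trace joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(t, centroids[c]))
            for t in traces
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(traces, labels) if lab == c]
            if members:
                centroids[c] = mean_trace(members)
    return labels, centroids


# Invented toy speed traces: two fast-onset and two delayed responses.
speed_traces = [
    [0.0, 4.0, 4.0, 4.0],  # fast onset
    [0.0, 0.0, 0.0, 4.0],  # delayed
    [0.0, 5.0, 5.0, 5.0],  # fast onset
    [0.0, 0.0, 0.0, 5.0],  # delayed
]
mode_labels, mode_centroids = kmeans(speed_traces, k=2)
```

      The same function could then be applied a second time to the ΔF/F traces restricted to the trials of one mode, which corresponds to the within-mode activation types described above.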

      In conclusion, the categorizations serve entirely different analytical purposes and should not be interpreted as competing classifications. The mode-specific “types” do not reclassify or replace the movement-sensitivity classes; they capture how neurons differ within a single, well-defined avoidance action, while the movement classes reflect how neurons relate to movements in general. Each classification relates to a different set of questions, and overlap between them is not expected.

      To make this as clear as possible we added the following paragraph to the Results:  

      “To avoid confusion between analyses, it is important to note that the movement-sensitivity classes defined here (Class 1–4; Fig. 7) are conceptually distinct from both the movement-onset classes (Class A–C; Fig. 3) and the neuronal activation “types” introduced later in the avoidance-mode analysis. The Class 1–4 grouping reflects how neurons relate to movement across the entire session, based on their cross-correlation with speed. The onset classes A–C capture neural activity specifically around spontaneous movement initiation during general exploration. In contrast, the later activation “types” are derived within each avoidance mode and describe how neurons differ in their activation patterns during identical CS1 avoidance responses. These classifications answer different questions about STN function and are not intended to correspond to one another.”

      (4) Similarly, having 3 different cell types (a, b, c) in the active avoidance seems unrelated to the original classification of cell types (1, 2, 3), and these are different for each class of avoid. This is very confusing and it is unclear how any of these types relate to each other. Presumably the same mouse has all three classes of avoids, so there are recordings from each cell during each type of avoid. So the authors could compare one cell during each avoid and determine whether it relates to movement or sound or something else. It is interesting that types a, b, c have the exact same proportions in each class of avoid, and this really makes it important to investigate if these are the exact same cells or not. Also, these mice could be recorded during open field so the original neural classification (class 1, 2, 3) could be applied to these same cells and then the authors can see whether each cell type defined in the open field has a different response to the different avoid types. As it stands, the paper simply finds that during movement and during avoidance behaviors different cells in the STN do different things.

      - Similarly, the authors somewhat addressed the neural types issue, but figure 9 still has 9 different neural types and it is unclear whether the same cells that are type 'a' in mode 1 avoids are also type 'a' in mode 2 avoids, or do some switch to type b? Is there consistency between cell types across avoid modes? The authors show that type 'c' neurons are differentially elevated in mode 3 vs 2, but also describe neurons as type '2c' and statistically compare them to type '1c' neurons. Are these the same neurons? Or are type 2c neurons different cells vs type 1c neurons? This is still unclear and requires clarification to be interpretable.

      We believe the remaining confusion arises from treating the different classification schemes as if they were alternative labels applied to the same neurons, when in fact they serve entirely separate analytical purposes and may not include the same neurons (see previous point). Because these classifications answer different questions, they are not expected to overlap, nor is overlap required for the interpretations we draw. It is therefore not appropriate to compare a neuron’s “type” in one avoidance mode to its movement class, or to ask whether types a/b/c across different modes are “the same cells,” since modes are defined by trial-level movement clustering rather than by neuron identity. Importantly, Types a/b/c are not intended as a new global classification of neurons; they simply summarize the variability of neuronal responses within each behaviorally matched mode. We agree that future studies could expand our findings, but that is beyond the already wide scope of the present paper. Our current analyses demonstrate a key conceptual point: when movement is held constant (via modes), STN neurons still show heterogeneous, outcome- and caution-related patterns, indicating encoding that cannot be reduced to movement alone.

      Relatedly, was the association with speed used to define each neural "class" done in the active avoidance context or in a separate (e.g. open field) experiment? This is not clear in the text.

      The cross-correlation classes were derived from the entire recording session, which included open-field and avoidance-task recordings. The tasks include long intertrial periods with spontaneous movements. We found no difference in classes when we included only a portion of the session, such as the open field, or when we excluded the avoidance interval where actions occur.
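
      The cross-correlation between ΔF/F and speed on which these classes are based can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the lag convention, the function names, and the toy signals are assumptions, and how the resulting correlations are grouped into classes is not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(dff, speed, max_lag):
    """Correlation between dff and speed at each lag in samples.

    Assumed convention: a positive lag compares dff[t + lag] with speed[t],
    so a peak at a positive lag means the activity follows the movement.
    """
    n = len(dff)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = dff[lag:], speed[:n - lag]
        else:
            a, b = dff[:n + lag], speed[-lag:]
        result[lag] = pearson(a, b)
    return result


def peak_lag(xcorr):
    """Lag with the largest absolute correlation."""
    return max(xcorr, key=lambda lag: abs(xcorr[lag]))


# Invented toy signals: the activity is the speed delayed by two samples.
speed = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 5.0, 1.0, 0.0, 2.0, 4.0, 0.0]
dff = [0.0, 0.0] + speed[:-2]
xcorr = cross_correlation(dff, speed, max_lag=3)
best_lag = peak_lag(xcorr)
```

      Restricting `dff` and `speed` to a portion of the session before calling `cross_correlation` would be one way to repeat the comparison across session segments.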

      Finally, in figure 7, why is there a separate avoid trace for each neural class? With the GRIN lens, the authors are presumably getting a sample of all cell types during each avoid, so why do the avoids differ depending on the cell type recorded?

      The entire STN population is not recorded within a single session; each session contributes only a subset of neurons to the dataset. Consequently, each neural class is composed of neurons drawn from partially non-overlapping sets of sessions, each with its own movement traces. For this reason, we plot avoidance traces separately for each neural class to maintain strict within-session correspondence between neural activity and the behavior collected in the same sessions. This prevents mixing behavioral data across sessions that did not contribute neurons to that class and ensures that all neural–behavioral comparisons remain appropriately matched. We have clarified this rationale in the revised manuscript. We note that averaging movement across classes, as is often done, would obscure these distinctions and would not preserve the necessary correspondence between neural activity and behavior. This is also clarified in Results.

      (5) The use of the same colors to mean two different things in figure 9 is confusing. AA1 vs AA2 shouldn't be the same colors as light-naïve vs light signaling CS.

      -addressed, but the authors still sometimes use the same colors to mean different things in adjacent figures (e.g. the red, blue, black colors in figure 1 and figure 2 mean totally different things) and use different colors within the same figure to represent the same thing (Figure 9AB vs Figure 9CD). This is suboptimal.

      Following the reviewer’s suggestion, in Figure 2, we changed the colors, so readers do not assume they are related to Fig. 1.

      In Figure 9, we changed the colors in C,D to match the colors in A,B.

      (6) The exact timeline of the optogenetics experiments should be presented as a schematic for understandability. It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing. The authors should make it clear whether the mice were naïve during this passive avoid experiment or whether they had experienced STN stimulation paired with anything prior to this experiment.

      -addressed

      (7) Similarly, the duration of the STN stimulation should be made clear on the plots that show behavior over time (e.g. Figure 9E).

      -addressed

      (8) There is just so much data and so many conditions for each experiment here. The paper is dense and difficult to read. It would really benefit readability if the authors put only the key experiments and key figure panels in the main text and moved much of the repetitive figure panels to supplemental figures. The addition of schematic drawings for behavioral experiment timing and for the different AA1, AA2, AA3 conditions would also really improve clarity.

      -partially addressed. The paper is still dense and difficult to read. No experimental schematics were added.

      As suggested, we now added the schematic to Fig. 5A.  

      New Comments:

      (9) Description of the animals used and institutional approval are missing from the methods.

      The information on animal strains and institutional approval is already included in the manuscript. The first paragraph of the Methods section states:

      “… All procedures were reviewed and approved by the institutional animal care and use committee and conducted in adult (>8 weeks) male and female mice. …”

      Additionally, the next subsection, “Strains and Adeno-Associated Viruses (AAVs),” fully specifies all mouse lines used. We therefore believe that the required descriptions of animals and institutional approval are already present and meet standard reporting.

    1. eLife Assessment

      The authors combine a modeling approach, using a digital twin, with electrophysiological evidence in two species to assess the role of inhibition in shaping selectivity in the visual cortex. The results provide an important advance beyond the classic view of sensory coding by providing compelling evidence that many neurons in visual areas exhibit dual-feature selectivity. Overall, the work exceptionally showcases how in silico experiments can generate concrete hypotheses about neuronal coding that are difficult to discover experimentally.

    2. Reviewer #1 (Public review):

      This manuscript used deep learning to highlight the role of inhibition in shaping selectivity in primary and higher visual cortex. The findings hint at hitherto unknown axes of structured inhibition operating in cortical networks with a potentially key role in object recognition.

      The multi-species approach of testing the model in macaque and mouse is excellent, as it improves the chances that the observed findings are a general property of mammalian visual cortex. However, it would be useful to delineate any notable differences between these species, which are to be expected given their lifestyle.

      The overall performance of the model appears to be excellent in V1, with over 80% performance, but it falls substantially in V4. It would be important to consider the implications of this finding; for example, in the context of studying temporal lobe structures that are central to recognizing objects. Would one expect that model performance decreases further here, and what measures could be taken to avoid this? Or is this type of model better restricted to V1 or even LGN?

      While the manuscript delineates novel axes of inhibitory interactions, it remains unclear what exactly these axes are and how they arise. What are the steps that need to be taken to make progress along these lines?

    3. Reviewer #2 (Public review):

      The classic view of sensory coding states that (excitatory) neurons are active to some preferred stimuli and otherwise silent. In contrast, inhibitory neurons are considered broadly tuned. Due to the gigantic potential image space, it is hard to comprehensively map the tuning of individual neurons. In this tour de force study, Franke et al. combine electrophysiological recordings in macaque (V1, V4) and mouse (V1, LM, LI) visual cortex with large-scale screens based on digital twin models, as well as beautiful systems identification (most/least activating stimuli). Based on these digital twins, they discover dual-feature selectivity (which they validate both in macaques and mice). Dual-feature selectivity involves a bidirectional modulation of firing rates around an elevated baseline. Neurons are excited by specific preferred features and systematically suppressed by distinct, non-preferred features. This tuning was identified by excellently combining advances in AI & high-throughput ephys.

      The study is comprehensive and convincing. Overall, this work showcases how in silico experiments can generate concrete hypotheses about neuronal coding that are difficult to discover experimentally, but that can be experimentally validated! I think this work is of substantial interest to the neuroscience community. I'm sure it will motivate many future experimental and computational studies. In particular, it will be of great interest to understand when and how the brain leverages dual-feature selectivity. The discussion of the article is already an interesting starting point for these considerations.

      Strengths:

      (1) Using computational models to predict neuronal responses allowed them to go through millions of images, which may not be possible in vivo.

      (2) The cross-species and cross-area consistency of the results is another major strength, suggesting that the results may reflect a fundamental strategy of mammalian cortical processing.

      (3) They show that the feature causing peak excitation in one neuron often drives suppression in another. This may be an efficient coding scheme where the population covers the visual manifold. I'd like to understand better why the authors believe that this shows that there are low-dimensional subspaces based on preferred and non-preferred stimulus features (vs. many more, but some axes are stronger).

    4. Author response:

      We thank the reviewers for their constructive and helpful feedback on our manuscript. We are delighted that they found the study to be "comprehensive and convincing" and a "tour de force" in its combination of electrophysiological recordings with large-scale digital twin screening. We appreciate that the reviewers highlighted the strengths of our multi-species approach and the "cross-species and cross-area consistency" of the results, noting that the work showcases how in silico experiments can generate concrete, experimentally validatable hypotheses.

      The reviewers also raised several important points that we plan to address in the final version of the manuscript to improve clarity and interpretation. These center on:

      Model performance in V4: Reviewer #1 raised questions regarding the comparative drop in model performance in V4 and the implications for the validity of the results (including the use of "high confidence" neurons and a request for clarification on the number of animals in the V4 dataset).

      Species differences: Both reviewers noted the value of the macaque-mouse comparison but requested a more explicit delineation of the differences between these species given their distinct ethological niches.

      The nature of inhibitory dimensions: The reviewers asked for further details on how to identify these inhibitory dimensions and the specific relationship between excitation and inhibition. We believe unraveling these mechanisms represents an exciting direction for future work, and we will explicitly mention this in the Discussion section of the final manuscript, alongside a clearer contextualization with prior literature.

      Technical clarifications: Reviewer #2 requested clarifications on specific technical details, such as the skewness thresholds used for sparsity analysis.

      In the final version of the manuscript, we will address these points by adding necessary clarifications to the text—including confirming the animal cohort details—explicitly contrasting the mouse and macaque data to highlight coding differences, and expanding our discussion. We will also ensure all technical inquiries, such as those regarding skewness and reference citations, are fully resolved.

      We believe addressing these points will significantly strengthen the manuscript.

    1. eLife Assessment

      This paper represents a valuable contribution to our understanding of how LFP oscillations and beta band coordination between the hippocampus and prefrontal cortex of rats may relate to learning. Enthusiasm for the reported results was moderated by the concern that some key analyses need to be done, and highly relevant details about task, data, and statistics were missing. Consequently, the reviewers considered the evidence to be incomplete in this version of the manuscript.

    2. Reviewer #1 (Public review):

      Wang, Zhou et al. investigated coordination between the prefrontal cortex (PFC) and the hippocampus (Hp), during reward delivery, by analyzing beta oscillations. Beta oscillations are associated with various cognitive functions, but their role in coordinating brain networks during learning is still not thoroughly understood. The authors focused on the changes in power, peak frequencies, and coherence of beta oscillations in two regions when rats learn a spatial task over days. Inconsistent with the authors' hypothesis, beta oscillations in those two regions during reward delivery were not coupled in spectral or temporal aspects. They were, however, able to show reverse changes in beta oscillations in PFC and Hp as the animal's performance got better. The authors were also able to show a small subset of cell populations in PFC that are modulated by both beta oscillations in PFC and sharp wave ripples in Hp. A similarly modulated cell population was not observed in Hp. These results are valuable in pointing out distinct periods during a spatial task when two regions modulate their activity independently from each other.

      The authors included a detailed analysis of the data to support their conclusions. However, some clarifications would help their presentation, as well as help readers to have a clear understanding.

      (1) The crucial time point of the analysis is the goal entry. However, it needs a better explanation in the methods or in figures of what a goal entry in their behavioral task means.

      (2) Regarding Figure 2, the authors have mentioned in the methods that PFC tetrodes have targeted both hemispheres. It might be trivial, but a supplementary graph or a paragraph about differences or similarities between contralateral and ipsilateral tetrodes to Hp might help readers.

      (3) The authors have looked at changes in burst properties over days of training. For the coincidence of beta bursts between PFC and Hp, is there a change in the coincidence of bursts depending on the day or performance of the animal?

      (4) Regarding the changes in performance through days as well as variance of the beta burst frequency variance (Figures 3C and 4C); was there a change in the number of the beta bursts as animals learn the task, which might affect variance indirectly?

      (5) In the behavioral task, within a session, animals needed to alternate between two wells, but the central arm (1) was in the same location. Did the authors alternate the location of well number 1 between days to different arms? It is possible that having well number 1 in the same location through days might have an effect on beta bursts, as they would get more rewards in well number 1?

      (6) The animals did not increase their performance in the F maze as much as they increased it in the Y maze. It would be more helpful to see a comparison between mazes in Figure 5 in terms of beta burst timing. It seems that unrewarded trials have earlier beta bursts in the Y maze than in the F maze. Also, is there a difference in beta burst frequencies of rewarded and unrewarded trials?

      (7) For individual cell analysis, the authors recorded from Hp and the behavioral task involved spatial learning. It would be helpful to readers if the authors described the place field properties of the cells they recorded from. It is known that reward cells firing near reward locations are more likely to participate in a sharp wave ripple. Factoring the place field properties of the cells into the analysis might give a clearer picture of the lack of modulation of Hp cells by beta and sharp wave ripples.

    3. Reviewer #2 (Public review):

      (1) When presenting the power spectra for the representative example (Figure 1), it would be appropriate to display a broader frequency band, including delta, theta, and gamma (up to ~100 Hz), rather than only the beta band. What was the rat's locomotor state (e.g., running speed) after entering the reward location, during which the LFPs were recorded? If the rats stopped at the goal but still consumed the reward (i.e., exhibited very low running speed), theta rhythms might still occasionally occur, and sharp-wave ripples (SWRs) could be observed during rest. Do beta bursts also occur during navigation prior to goal entry? It would be beneficial to display these rhythmic activities continuously across both the navigation and goal entry phases. Additionally, given that the hippocampal theta rhythm is typically around 7-8 Hz, while a peak at approximately 15-16 Hz is visible in the power spectra in Figure 1C, the authors should clarify whether the 22 Hz beta activity represents a genuine oscillation rather than a harmonic of the theta rhythm.

      (2) The authors claim that beta activity is independent between CA1 and PFC, based on the low coherence between these regions. However, it is challenging to discern beta-specific coherence in CA1; instead, coherence appears elevated across a broader frequency band (Figure 2 and Figure 2-1D). An alternative explanation could be that the uncoupled beta between CA1 and PFC results from low local beta coherence within CA1 itself.

      (3) In Figure 2-1E-F, visual inspection of the box plots reveals minimal differences between PFC-Ind and PFC-Coin/CA1-Coin conditions, despite reported statistical significance. It may be necessary to verify whether the significance arises from a large sample size.

      (4) In Figure 3 and Figure 4, although differences in power and frequency appear to change significantly across days, these changes are not easily discernible by visual inspection. It is worth considering whether these variations are related to increased task familiarity over days, potentially accompanied by higher running speeds.

      (5) The stronger spiking modulation by local beta oscillations shown in Figure 6 could also be interpreted in the context of uncoupled beta between CA1 and PFC. In this analysis, only spikes occurring during beta bursts should be included, rather than all spikes within a trial. The authors should verify the dataset used and consider including a representative example illustrating beta modulation of single-unit spiking.

      (6) As observed in Figure 7D, CA1 beta bursts continue to occur even after 2.5 seconds following goal entry, when SWRs begin to emerge. Do these oscillations alternate over time, or do they coexist with some form of cross-frequency coupling?

    4. Reviewer #3 (Public review):

      Summary:

      This paper explored the role of beta rhythms in the context of spatial learning and mPFC-hippocampal dynamics. The authors characterized mPFC and hippocampal beta oscillations, examining how their coordination and their spectral profiles related to learning and prefrontal neuronal firing. Rats performed two tasks, a Y-maze and an F-maze, with the F-maze task being more cognitively demanding. Across learning, prefrontal beta oscillation power increased while beta frequency decreased. In contrast, hippocampal beta power and beta frequency decreased. This was particularly the case for the well-performed and well-learned Y-maze paradigm. The authors identified the timing of beta oscillations, revealing an interesting shift in beta burst timing relative to reward entry as learning progressed. They also discovered an interesting population of prefrontal neurons that were tuned to both prefrontal beta and hippocampal sharp-wave ripple events, revealing a spectrum of SWR-excited and SWR-inhibited neurons that were differentially phase locked to prefrontal beta rhythms.

      In sum, the authors set out to examine how beta rhythms and their coordination were related to learning and goal occupancy. The authors identified a set of learning and goal-related correlates at the level of LFP and spike-LFP interactions, but did not report on spike-behavioral correlates.

      Strengths:

      Pairing dual recordings of medial prefrontal cortex (mPFC) and CA1 with learning of spatial memory tasks is a strength of this paper. The authors also discovered an interesting population of prefrontal neurons modulated by both beta and CA1 sharp-wave ripple (SWR) events, showing a relationship between SWR-excited and SWR-inhibited neurons and beta oscillation phase.

      Weaknesses:

      The authors report on a task where rats were performing sub-optimally (F-maze), weakening claims. Likewise, it is questionable as to whether mPFC and hippocampus are dually required to perform a no-delay Y-maze task at day 5, where rats are performing near 100%. There would be little reason to suspect strong oscillatory coupling when task performance is poor and/or independent of mPFC-HPC communication (Jones and Wilson, 2005), potentially weakening conclusions about independent beta rhythms. Moreover, there is little detail provided about sample sizes and how data sampling is being performed (e.g., rats, sessions, or trials), raising generalizability concerns.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Wang, Zhou et al. investigated coordination between the prefrontal cortex (PFC) and the hippocampus (Hp), during reward delivery, by analyzing beta oscillations. Beta oscillations are associated with various cognitive functions, but their role in coordinating brain networks during learning is still not thoroughly understood. The authors focused on the changes in power, peak frequencies, and coherence of beta oscillations in two regions when rats learn a spatial task over days. Inconsistent with the authors' hypothesis, beta oscillations in those two regions during reward delivery were not coupled in spectral or temporal aspects. They were, however, able to show reverse changes in beta oscillations in PFC and Hp as the animal's performance got better. The authors were also able to show a small subset of cell populations in PFC that are modulated by both beta oscillations in PFC and sharp wave ripples in Hp. A similarly modulated cell population was not observed in Hp. These results are valuable in pointing out distinct periods during a spatial task when two regions modulate their activity independently from each other.

      The authors included a detailed analysis of the data to support their conclusions. However, some clarifications would help their presentation, as well as help readers to have a clear understanding.

      (1) The crucial time point of the analysis is the goal entry. However, it needs a better explanation in the methods or in figures of what a goal entry in their behavioral task means.

      We appreciate Reviewer 1 pointing out this shortcoming and will clarify the description in the revised manuscript. Each goal is located at the end of the arm, and is equipped with a reward delivery unit. The unit has an infrared sensor. The rat breaks the infrared beam when it enters the goal.

      (2) Regarding Figure 2, the authors have mentioned in the methods that PFC tetrodes have targeted both hemispheres. It might be trivial, but a supplementary graph or a paragraph about differences or similarities between contralateral and ipsilateral tetrodes to Hp might help readers.

      We will provide the requested analysis in the full revision. We saw both hemispheres had similar properties.

      (3) The authors have looked at changes in burst properties over days of training. For the coincidence of beta bursts between PFC and Hp, is there a change in the coincidence of bursts depending on the day or performance of the animal?

      We will provide the requested analysis in the full revision.

      (4) Regarding the changes in performance through days as well as variance of the beta burst frequency variance (Figures 3C and 4C); was there a change in the number of the beta bursts as animals learn the task, which might affect variance indirectly?

      The analysis we can do here is to control for differences in the number of bursts for each category (days/performance quintile) by resampling the data to match the burst count between categories.
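
      The count-matching control described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name, the category labels, and the example frequencies are hypothetical, and it assumes that each category (day or performance quintile) is represented by a list of burst frequencies in Hz. Each category is repeatedly subsampled to the size of the smallest category, and the variance is averaged over subsamples.

```python
import random
import statistics


def count_matched_variance(bursts_by_category, n_iter=1000, seed=0):
    """Mean variance per category over count-matched random subsamples.

    bursts_by_category maps a category label (hypothetical, e.g. "day1")
    to a list of burst frequencies in Hz.
    """
    rng = random.Random(seed)
    # Every category is subsampled down to the smallest burst count.
    n_min = min(len(values) for values in bursts_by_category.values())
    if n_min < 2:
        raise ValueError("each category needs at least two bursts")

    totals = {label: 0.0 for label in bursts_by_category}
    for _ in range(n_iter):
        for label, values in bursts_by_category.items():
            subsample = rng.sample(values, n_min)  # without replacement
            totals[label] += statistics.variance(subsample)
    return {label: total / n_iter for label, total in totals.items()}


# Hypothetical burst frequencies (Hz), for illustration only.
example = {
    "day1": [20.0, 22.0, 24.0, 21.0, 23.0, 25.0],
    "day5": [21.0, 22.0, 23.0],
}
matched = count_matched_variance(example, n_iter=200)
```

      Because every subsample has the same size, differences in the resulting variances cannot be attributed to differences in burst count between categories.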

      (5) In the behavioral task, within a session, animals needed to alternate between two wells, but the central arm (1) was in the same location. Did the authors alternate the location of well number 1 between days to different arms? It is possible that having well number 1 in the same location through days might have an effect on beta bursts, as they would get more rewards in well number 1?

      The central arm remained the same across days since we needed the animals to learn the alternation task. In our experience, the animal needs a few days to learn the alternation rule when we switch the central arm location. For this experiment, we were interested in the initial learning process, so we kept the central arm constant. Switching the central arm location is a great suggestion for a follow-up experiment in which we can examine the effects that a change in reward contingency has on beta bursts.

      (6) The animals did not increase their performance in the F maze as much as they increased it in the Y maze. It would be more helpful to see a comparison between mazes in Figure 5 in terms of beta burst timing. It seems that unrewarded trials have earlier beta bursts in the Y maze than in the F maze. Also, is there a difference in beta burst frequencies of rewarded and unrewarded trials?

      We will add this analysis in the revised manuscript.

      (7) For individual cell analysis, the authors recorded from Hp and the behavioral task involved spatial learning. It would be helpful to readers if the authors described the place field properties of the cells they recorded from. It is known that reward cells firing near reward locations are more likely to participate in a sharp wave ripple. Factoring the place field properties of the cells into the analysis might give a clearer picture of the lack of modulation of Hp cells by beta and sharp wave ripples.

      This is a great suggestion, and we will address this in the full revision.

      Reviewer #2 (Public review):

      We thank Reviewer 2 for their helpful comments and will address these in full in the revision. These are great suggestions to provide greater detail on the spectral and behavioral data at the goal.

      (1) When presenting the power spectra for the representative example (Figure 1), it would be appropriate to display a broader frequency band, including delta, theta, and gamma (up to ~100 Hz), rather than only the beta band.

      We will show more examples of power spectra with a wider frequency range. We did examine the wider spectra and noticed that power in the beta frequency band was more prominent than in other bands.

      What was the rat's locomotor state (e.g., running speed) after entering the reward location, during which the LFPs were recorded?

      We will add the time aligned speed profile to the spectra and raw data examples. Because goal entry is defined as the time the animals break the infrared beam at the goal (response to Reviewer 1), the rat would have come to a stop.

      If the rats stopped at the goal but still consumed the reward (i.e., exhibited very low running speed), theta rhythms might still occasionally occur, and sharp-wave ripples (SWRs) could be observed during rest.

      We typically find low theta power in the hippocampus after the animal reaches the goal location and as it consumes reward. Reviewer 2 is correct about occasional theta power at the goal. We have observed this but mostly before the animal leaves the goal location. We did find SWRs during goal periods. One example is shown in Fig. 7A.

      Do beta bursts also occur during navigation prior to goal entry?

      We did not find consistent beta bursts in PFC or CA1 on approach to goal entry. We can provide the analyses in our full revision. In our initial exploratory analysis, we found that beta bursts were most prominent after goal entry, which led us to focus on post-goal entry beta for this manuscript. However, beta oscillations in the hippocampus during locomotion or exploration have been reported (Ahmed & Mehta, 2012; Berke et al., 2008; França et al., 2014; França et al., 2021; Iwasaki et al., 2021; Lansink et al., 2016; Rangel et al., 2015).

      It would be beneficial to display these rhythmic activities continuously across both the navigation and goal entry phases. Additionally, given that the hippocampal theta rhythm is typically around 7-8 Hz, while a peak at approximately 15-16 Hz is visible in the power spectra in Figure 1C, the authors should clarify whether the 22 Hz beta activity represents a genuine oscillation rather than a harmonic of the theta rhythm.

      To ensure we fully address this concern, we can provide further spectral analysis in our revised manuscript to show that theta power in CA1 is reduced after goal entry. We were initially concerned about the possibility that the 22 Hz power in CA1 may be a harmonic rather than a standalone oscillation band. If these are harmonics of theta, we should expect to find coincident theta at the time of bursts in the beta frequency. In Fig. 1B and Fig. 2A, we show examples of the raw LFP traces from CA1. Here, the detected bursts are not accompanied by visible theta frequency activity. For PFC, we do not always see persistent theta frequency oscillations as in CA1. In PFC, we found beta bursts were frequent and visually identifiable when examining the LFP. We provided examples of the PFC LFP (Fig. 1B, Fig. 1-1, and Fig. 2A). In these cases, we see clear beta frequency oscillations lasting several cycles, and these are not accompanied by any oscillations in the theta frequency in the LFP trace.

      (2) The authors claim that beta activity is independent between CA1 and PFC, based on the low coherence between these regions. However, it is challenging to discern beta-specific coherence in CA1; instead, coherence appears elevated across a broader frequency band (Figure 2 and Figure 2-1D). An alternative explanation could be that the uncoupled beta between CA1 and PFC results from low local beta coherence within CA1 itself.

      This is a legitimate concern, and we used three methods to characterize coherence and coordination between the two regions. First, we calculated coherence for tetrode pairs for times when the animal was at goals (Fig. 2B), which provides a general estimate of coherence across frequencies but lacks any temporal resolution. Second, we calculated burst-aligned coherence (Fig. 2-1), which provides temporal resolution relative to the burst, but the multi-taper method is constrained by the time-frequency resolution trade-off. Third, we quantified the timing between the burst peaks (Fig. 2D), which describes timing differences, although the bursts may not be symmetric around their peaks. Thus, each method has its own caveats, but we drew our conclusion from the combination of results from these three analyses, which pointed to similar conclusions.

      Reviewer 2 is correct in pointing out the uniformly high coherence within CA1 across the frequency range we examined. When we inspected the raw LFP across multiple tetrodes in CA1, the traces were similar to each other (Fig. 2A). This likely reflects the uniformity of the LFP across recording sites in CA1, consistent with the coherence values we saw across the frequency range (Fig. 2B). Coherence between tetrode pairs within CA1 was statistically higher across the range than coherence between tetrode pairs in PFC (Fig. 2B and C); thus, our results are unlikely to be explained by low beta coherence within CA1 itself. The burst-aligned coherence using a multi-taper method also supports this: the coherence values within CA1 at the time of CA1 bursts are ~0.8-0.9.

      (3) In Figure 2-1E-F, visual inspection of the box plots reveals minimal differences between PFC-Ind and PFC-Coin/CA1-Coin conditions, despite reported statistical significance. It may be necessary to verify whether the significance arises from a large sample size.

      We will include the sample sizes for each of the boxplots; these should be the same as for the power comparison in Fig. 2-1 A-C. The LFPs within a one-second window centered on the bursts are usually very similar, and the multi-taper method will return high coherence values. The p-values from statistical comparisons between the boxes are corrected using the Benjamini-Hochberg method.
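
      For readers unfamiliar with the correction named above, the Benjamini-Hochberg step-up procedure can be sketched as below. This is a generic textbook implementation that returns adjusted p-values, not the authors' code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

      A comparison is then called significant when its adjusted p-value falls at or below the chosen false discovery rate. The correction addresses multiple comparisons only; it does not speak to whether a small effect reaches significance because of a large sample size.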

      (4) In Figure 3 and Figure 4, although differences in power and frequency appear to change significantly across days, these changes are not easily discernible by visual inspection. It is worth considering whether these variations are related to increased task familiarity over days, potentially accompanied by higher running speeds.

      We agree with Reviewer 2 that familiarity increases across days, and the animal is likely running faster. The analysis for Fig. 3 and 4 includes only data from periods when the animal was at the goal and was not moving. We used linear mixed effects models to quantify the relationship between power, frequency and day or behavioral quintile.

      (5) The stronger spiking modulation by local beta oscillations shown in Figure 6 could also be interpreted in the context of uncoupled beta between CA1 and PFC. In this analysis, only spikes occurring during beta bursts should be included, rather than all spikes within a trial. The authors should verify the dataset used and consider including a representative example illustrating beta modulation of single-unit spiking.

      We agree with Reviewer 2 that the stronger modulation by local beta is another piece of evidence indicating uncoupled beta between the two regions. We appreciate this suggestion and will add examples illustrating beta modulation for single units. We want to clarify that the spikes were only from periods when the animal was at the goal location on each trial and do not include the running period between goals.

      (6) As observed in Figure 7D, CA1 beta bursts continue to occur even after 2.5 seconds following goal entry, when SWRs begin to emerge. Do these oscillations alternate over time, or do they coexist with some form of cross-frequency coupling?

      This is a very interesting and helpful suggestion. Although we found SWRs generally appear later than beta bursts, it is possible the two are related on a finer timescale pointing to coordination. Our cross-correlation analysis between PFC and CA1 beta bursts only showed the relationship on the timescale of seconds. We will show a higher time-resolution version of this analysis in the revision.
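
      One way to examine event timing at different resolutions is a cross-correlogram of event times, in which the bin width sets the timescale. The sketch below is illustrative only and is not the authors' analysis: the function name, the parameter defaults, and the example times are hypothetical, and it assumes each event type (for example, beta burst peaks and SWRs) is given as a list of times in seconds.

```python
def event_lag_histogram(times_a, times_b, max_lag=2.0, bin_width=0.5):
    """Histogram of lags (b minus a) between all event pairs within max_lag.

    Returns (edges, counts); bin i covers [edges[i], edges[i + 1]).
    """
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for ta in times_a:
        for tb in times_b:
            lag = tb - ta
            if -max_lag <= lag < max_lag:
                index = int((lag + max_lag) // bin_width)
                # Guard against floating-point rounding at the upper edge.
                counts[min(index, n_bins - 1)] += 1
    edges = [-max_lag + i * bin_width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical event times in seconds, for illustration only.
events_a = [1.0, 5.0]
events_b = [1.25, 5.75, 10.0]
edges, counts = event_lag_histogram(events_a, events_b)
```

      Reducing `bin_width` (and `max_lag`) gives a finer-timescale view of the same event pairs, at the cost of fewer counts per bin.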

      Reviewer #3 (Public review):

      Summary:

      This paper explored the role of beta rhythms in the context of spatial learning and mPFC-hippocampal dynamics. The authors characterized mPFC and hippocampal beta oscillations, examining how their coordination and their spectral profiles related to learning and prefrontal neuronal firing. Rats performed two tasks, a Y-maze and an F-maze, with the F-maze task being more cognitively demanding. Across learning, prefrontal beta oscillation power increased while beta frequency decreased. In contrast, hippocampal beta power and beta frequency decreased. This was particularly the case for the well-performed and well-learned Y-maze paradigm. The authors identified the timing of beta oscillations, revealing an interesting shift in beta burst timing relative to reward entry as learning progressed. They also discovered an interesting population of prefrontal neurons that were tuned to both prefrontal beta and hippocampal sharp-wave ripple events, revealing a spectrum of SWR-excited and SWR-inhibited neurons that were differentially phase locked to prefrontal beta rhythms.

      In sum, the authors set out to examine how beta rhythms and their coordination were related to learning and goal occupancy. The authors identified a set of learning and goal-related correlates at the level of LFP and spike-LFP interactions, but did not report on spike-behavioral correlates.

      Strengths:

      Pairing dual recordings of medial prefrontal cortex (mPFC) and CA1 with learning of spatial memory tasks is a strength of this paper. The authors also discovered an interesting population of prefrontal neurons modulated by both beta and CA1 sharp-wave ripple (SWR) events, showing a relationship between SWR-excited and SWR-inhibited neurons and beta oscillation phase.

      Weaknesses:

      Moreover, there is little detail provided about sample sizes and how data sampling is being performed (e.g., rats, sessions, or trials), raising generalizability concerns.

      We appreciate Reviewer 3’s thoughtful suggestions for making our claims convincing. We will include information about sample sizes and address each detailed recommendation in the revised manuscript.

      The authors report on a task where rats were performing sub-optimally (F-maze), weakening claims.

      Our experiment was designed to allow us to examine, within the same animal, a well-performed task (Y) and a less well-performed task (F). This contrast allows us to determine differences in neural correlates. We can further dissect the relevant differences to take advantage of this experimental design.

      Likewise, it is questionable as to whether mPFC and hippocampus are dually required to perform a no-delay Y-maze task at day 5, where rats are performing near 100%.

      We agree with Reviewer 3 that the mPFC and hippocampus may not be required when the animal reaches stable performance on day 5 (Deceuninck & Kloosterman, 2024). The data we collected spans the full range of early learning (day 1) to proficiency (day 5). We wanted to understand the dynamics of beta across these learning stages.

      Recent studies suggest mPFC and hippocampus are likely to be needed, in some capacity, for learning continuous spatial alternation tasks on a range of maze geometries. Lesions, inactivation or waking activity perturbation of hippocampus or hippocampus and mPFC on the W maze alternation task slowed learning (Jadhav et al., 2012; Kim & Frank, 2009; Maharjan et al., 2018). More recently, optogenetic silencing of mPFC after sharp wave ripples on the Y maze alternation affected performance when the center arm was switched (den Bakker et al., 2023). The Y and F mazes in our study both share the continuous alternation rule, where the animal needed to avoid visiting a previously visited location on the outbound choice relative to the center, and always return to the center location.

      Further, the performance characteristics on the outbound and inbound components of our Y task are similar to those on the W task. We have analyzed the “inbound” and “outbound” performance of the animals on the Y maze alternation task, and they are similar to those on the W maze alternation task. The “inbound”, or reference location, component is learned quickly, whereas the “outbound”, or alternation, component is learned slowly. We can add this analysis to the revised manuscript.

      There would be little reason to suspect strong oscillatory coupling when task performance is poor and/or independent of mPFC-HPC communication (Jones and Wilson, 2005) potentially weakening conclusions about independent beta rhythms.

      Although many studies have examined the oscillatory coupling properties at the theta frequency between mPFC-HPC (Hyman et al., 2005; Jones & Wilson, 2005; Siapas et al., 2005), our understanding of beta frequency coordination between the two regions is less established, especially at goal locations. Beta frequency coordination at goal locations may or may not follow similar properties to theta frequency coupling. In this manuscript we are reporting the properties of goal-location beta frequency activity in mPFC-HPC networks. We are not aware of prior work describing these properties at this stage of a spatial navigation task, especially their coordination in time.

      References

      Ahmed, O. J., & Mehta, M. R. (2012). Running speed alters the frequency of hippocampal gamma oscillations. J Neurosci, 32(21), 7373-7383. https://doi.org/10.1523/JNEUROSCI.5110-11.2012

      Berke, J. D., Hetrick, V., Breck, J., & Greene, R. W. (2008). Transient 23-30 Hz oscillations in mouse hippocampus during exploration of novel environments. Hippocampus, 18(5), 519-529. https://doi.org/10.1002/hipo.20435

      Deceuninck, L., & Kloosterman, F. (2024). Disruption of awake sharp-wave ripples does not affect memorization of locations in repeated-acquisition spatial memory tasks. Elife, 13. https://doi.org/10.7554/eLife.84004

      den Bakker, H., Van Dijck, M., Sun, J. J., & Kloosterman, F. (2023). Sharp-wave-ripple-associated activity in the medial prefrontal cortex supports spatial rule switching. Cell Rep, 42(8), 112959. https://doi.org/10.1016/j.celrep.2023.112959

      França, A. S., do Nascimento, G. C., Lopes-dos-Santos, V., Muratori, L., Ribeiro, S., Lobão-Soares, B., & Tort, A. B. (2014). Beta2 oscillations (23-30 Hz) in the mouse hippocampus during novel object recognition. Eur J Neurosci, 40(11), 3693-3703. https://doi.org/10.1111/ejn.12739

      França, A. S. C., Borgesius, N. Z., Souza, B. C., & Cohen, M. X. (2021). Beta2 Oscillations in Hippocampal-Cortical Circuits During Novelty Detection. Front Syst Neurosci, 15, 617388. https://doi.org/10.3389/fnsys.2021.617388

      Hyman, J. M., Zilli, E. A., Paley, A. M., & Hasselmo, M. E. (2005). Medial prefrontal cortex cells show dynamic modulation with the hippocampal theta rhythm dependent on behavior. Hippocampus, 15(6), 739-749. https://doi.org/10.1002/hipo.20106

      Iwasaki, S., Sasaki, T., & Ikegaya, Y. (2021). Hippocampal beta oscillations predict mouse object-location associative memory performance. Hippocampus, 31(5), 503-511. https://doi.org/10.1002/hipo.23311

      Jadhav, S. P., Kemere, C., German, P. W., & Frank, L. M. (2012). Awake hippocampal sharp-wave ripples support spatial memory. Science (New York, N.Y.), 336(6087), 1454-1458. https://doi.org/10.1126/science.1217230

      Jones, M. W., & Wilson, M. A. (2005). Theta Rhythms Coordinate Hippocampal–Prefrontal Interactions in a Spatial Memory Task. PLoS Biology, 3(12). https://doi.org/10.1371/journal.pbio.0030402

      Kim, S. M., & Frank, L. M. (2009). Hippocampal Lesions Impair Rapid Learning of a Continuous Spatial Alternation Task. PLoS ONE, 4(5). https://doi.org/10.1371/journal.pone.0005494

      Lansink, C. S., Meijer, G. T., Lankelma, J. V., Vinck, M. A., Jackson, J. C., & Pennartz, C. M. (2016). Reward Expectancy Strengthens CA1 Theta and Beta Band Synchronization and Hippocampal-Ventral Striatal Coupling. J Neurosci, 36(41), 10598-10610. https://doi.org/10.1523/JNEUROSCI.0682-16.2016

      Maharjan, D. M., Dai, Y. Y., Glantz, E. H., & Jadhav, S. P. (2018). Disruption of dorsal hippocampal - prefrontal interactions using chemogenetic inactivation impairs spatial learning. Neurobiol Learn Mem, 155, 351-360. https://doi.org/10.1016/j.nlm.2018.08.023

      Rangel, L. M., Chiba, A. A., & Quinn, L. K. (2015). Theta and beta oscillatory dynamics in the dentate gyrus reveal a shift in network processing state during cue encounters. Front Syst Neurosci, 9, 96. https://doi.org/10.3389/fnsys.2015.00096

      Siapas, A. G., Lubenov, E. V., & Wilson, M. A. (2005). Prefrontal Phase Locking to Hippocampal Theta Oscillations. Neuron, 46(1), 141-151. https://doi.org/10.1016/j.neuron.2005.02.028

    1. eLife Assessment

      This important study uncovers a previously unrecognized light-responsive pathway in C. elegans that depends on live food bacteria and is mediated by the bZIP factors ZIP-2/CEBP-2 and the cytochrome P450 enzyme, CYP-14A5. The authors show that this bacteria-linked pathway modulates long-term memory and can be harnessed as a low-cost light-inducible expression system, opening new directions for sensory biology and genetic engineering in worms. The exact means by which live bacteria modulate light signal that activates ZIP-2/CEBP-2 in the worm remains to be elucidated. The evidence supporting the pathway's role uses multiple genetic, transcriptional, and behavioural assays, and is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how an animal without eyes responds to visible light. To do so, they used the C. elegans model, which lacks eyes but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light independently of known light-response pathways.

      Strengths:

      The authors convincingly demonstrate that visible light activates cyp-14A5 promoter-driven gene expression in a variety of contexts and report that this activation occurs via the ZIP-2 transcriptionally regulated signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans, or by exposure of animals to pathogens, it remains unclear whether visible light activates a pathway in C. elegans (animals) or whether visible light is instead sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and that in this context it alters its behavior in a way that activates a known bacterially responsive pathway in the animals. Consistent with this possibility, the authors found that heat-killed bacteria prevented reporter activation in animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field.

    3. Reviewer #2 (Public review):

      Summary:

      Ji, Ma and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any further effects on learning in mutants lacking CYP-14A5, ZIP-2, or CEBP-2.

    4. Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. The authors also demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. Notably, this light response requires live food bacteria, suggesting a microbial contribution to this phenomenon. The nature of the microbial contribution to the light response is unknown but very interesting.

      (2) The authors suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. How retention of the olfactory memory is enhanced by light generally remains unclear. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how animals respond to visible light in an animal without eyes. To do so, they used the C. elegans model, which lacks eyes, but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light and independent of known pathways of light responses.

      Strengths:

      The authors convincingly demonstrate that visible light activates the expression of the cyp-14A5 promoter-driven gene expression in a variety of contexts and report the finding that this pathway is activated via the ZIP-2 transcriptionally regulated signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans -- or exposure of animals to pathogens -- it remains unclear if visible light activates a pathway in C. elegans (animals) or if visible light potentially is sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and in this context, alters its behavior in such a way that activates a known bacterially responsive pathway in the animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field, but it does leave some questions about the applicability to the original question of how animals sense light in the absence of eyes.

      Thank you for the insightful questions and suggestions. We have now performed a key experiment requested. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. We now include this interesting new result in the paper and revised discussion on the bacteria-modulated mechanism but note that this bacterial requirement does not alter the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity likely influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the intrinsic regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. Accordingly, we have revised the Results and Discussion to reflect the appropriate scope.

      Reviewer #2 (Public review):

      Summary:

      Ji, Ma, and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low-intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any effect on mutants lacking CYP-14A5, ZIP-2, or CEBP-2. Other minor edits to the text and figures are suggested.

      We appreciate the reviewer’s comment. Our study indeed implies that ambient light stabilizes learned olfactory behavior through effects on the described pathway. Importantly, the existing data already address this point. Mutants lacking CYP-14A5, ZIP-2, or CEBP-2 display impaired olfactory memory even when exposed to ambient light, indicating that these genes are required for the behavioral effect of light. Consistent with this, ambient light robustly induces cyp-14A5p::GFP in wild-type animals but fails to do so in zip-2 and cebp-2 mutants, demonstrating that light-dependent transcriptional activation is blocked upstream in these pathway mutants. Together, these results support the conclusion that ambient light acts through the ZIP-2 → CEBP-2 → CYP-14A5 pathway to stabilize memory. Minor textual and figure revisions have been made where helpful to clarify this point.

      Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. The authors then suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. Finally, the authors demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) The authors determine that light, but not several other stressors tested (temperature, hypoxia, and food deprivation), can induce transcription of cyp-14A5. The authors use these experiments to suggest the potential specificity of the induction of CYP-14A5 by light. Given the established relationship between light and oxidative stress and the authors' later identification of ZIP-2, testing the effect of an oxidative stressor or pathogen exposure on transcription of cyp-14A5 would further strengthen the validity of this statement and potentially shed some insight into the underlying mechanisms.

      We appreciate the reviewer’s thoughtful suggestion. We would like to clarify that the “specificity” we refer to is the strong and preferential induction of cyp-14A5 by light among pathogen or detoxification-related genes, rather than an assertion that cyp-14A5 is exclusively light-responsive. This does not preclude the possibility that cyp-14A5 can also be activated under other conditions. Indeed, prior work from the Troemel laboratory has identified cyp-14A5 as one of many pathogen-inducible genes, consistent with its role in stress physiology. Our data show that classical pathogen-responsive genes (e.g., irg-1) are not induced by light, whereas cyp-14A5 is strongly induced, highlighting the selective engagement of this cytochrome P450 by light under the conditions tested. We have revised the text to clarify this point.

      (2) The authors suggest that short-wavelength light more robustly increases transcription of cyp-14A5 compared to equally intense longer wavelengths (Figure 2F and 2G). Here, however, the authors report intensities in lux of wavelengths tested. Measurements of and reporting the specific spectra of the incident lights and their corresponding irradiances (ideally, in some form of mW/mm2 - see Ward et al., 2008, Edwards et al., 2008, Bhatla and Horvitz, 2015, De Magalhaes Filho et al., 2018, Ghosh et al., 2021, among others, for examples) is critical for appropriate comparisons across wavelengths and facilitates cross-checking with previous studies of C. elegans light responses. On a related and more minor note, the authors place an ultraviolet shield in front of a visible light LED to test potential effects of ultraviolet light on transcription of cyp-14A5. A measurement of the spectrum of the visible light LED would help confirm if such an experiment was required. Regardless, the principal conclusions the authors made from these experiments will likely remain unchanged.

      Thank you. We have revised the text to clarify this point. “Using controlled light versus dark conditions, we confirmed the finding from an integrated cyp-14A5p::GFP reporter and observed its robust widespread GFP expression in many tissues induced by moderate-intensity (500-3000 Lux, 16-48 hr duration) LED light exposure (Fig. 1A). The photometric Lux range is approximately 0.1–0.60 mW/cm² in radiometric (total radiant power) metric given the spectrum of the LED light source.”
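
      As a rough cross-check of the lux-to-irradiance figures quoted above, the conversion can be sketched as follows. The luminous efficacy of 500 lm/W used here is an assumption back-calculated from the quoted range (500-3000 lux mapping to roughly 0.1-0.6 mW/cm²); it is not a value stated by the authors, and the true figure depends on the measured spectrum of the LED. The function name is hypothetical.

```python
def lux_to_mw_per_cm2(lux: float, efficacy_lm_per_w: float = 500.0) -> float:
    """Convert illuminance (lux = lm/m^2) to irradiance in mW/cm^2.

    efficacy_lm_per_w is the luminous efficacy of radiation of the source.
    The default of 500 lm/W is an ASSUMPTION inferred from the quoted range,
    not a measured property of the authors' LED.
    """
    if efficacy_lm_per_w <= 0:
        raise ValueError("luminous efficacy must be positive")
    watts_per_m2 = lux / efficacy_lm_per_w
    # 1 W/m^2 = 1000 mW / 10000 cm^2 = 0.1 mW/cm^2
    return watts_per_m2 * 0.1


for lux in (500, 600, 1000, 3000):
    print(f"{lux:>5} lux is about {lux_to_mw_per_cm2(lux):.2f} mW/cm^2")
```

      Because lux weights power by the sensitivity of the human eye, the same lux reading corresponds to different irradiances at different wavelengths, which is why the reviewer asks for measured spectra rather than a single conversion factor.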

      (3) The authors report an interesting observation that animals exposed to ambient light (~600 lux) exhibit significantly increased memory retention compared to those maintained in darkness (Figure 4). Furthermore, light deprivation within the first 2-4 hours after learning appears to eliminate the effect of light on memory retention. These processes depend on CYP-14A5, loss of which can be rescued by re-expression of cyp-14A5 in mutant animals using a hypoderm-specific- and non-light-inducible- promoter. Taken together, the authors argue convincingly that hypodermal expression of cyp-14A5 can contribute to the retention of the olfactory memory. More broadly, these experiments suggest that cell-non-autonomous signaling can enhance retention of olfactory memory. How retention of the olfactory memory is enhanced by light generally remains unclear. In addition, the authors' experiments in Figure 1B demonstrate - at least by use of the transcriptional reporter - that light-dependent induction of cyp-14A5 transcription at 500 - 1000 lux is minimal and especially so at short duration exposures. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

      We thank the reviewer for these thoughtful comments. We agree that understanding how light enhances memory retention at a mechanistic level is an important direction for future work. Regarding the light intensities used in Figure 1B, we would like to clarify that 500–1000 lux does produce a measurable and statistically significant induction of cyp-14A5p::GFP, although the magnitude is lower than that observed at higher intensities. We interpret this modest induction as physiologically relevant: intermediate light levels appear sufficient to engage the CYP-14A5–dependent program required for memory stabilization, whereas stronger light intensities are detrimental to learning and reduce behavioral performance. Thus, the behavioral paradigm uses a light regime that activates the pathway without introducing stress-associated confounders.

      (4) The experiments in Figure 4 nicely validate the usage of the cyp-14A5 promoter as a potential tool for light-dependent induction of gene expression. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community.

      Thank you and we agree. In addition, we have included in the revised manuscript the single-copy integration strains based on UAS-GAL4 that produced similar results as transgenic strains and will be even more flexible and useful for the community.

      Recommendations for the authors:

      Reviewing Editor Comments:

      While appreciating the quality and presentation of this important study, we had two major concerns that the authors need to address.

      (1) Bacteria-versus-worm origin:

      To rule out a bacterially derived stimulus, we suggest testing whether cyp-14A5p::GFP is inducible without bacteria (or killed bacteria). Checking whether the canonical immune reporters irg-5p::GFP and gst-4p::GFP are also light-inducible will further clarify this point.

      We have now performed the key experiment requested by the reviewers. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50.

      We included the data (Fig. 2D) to show that the canonical immune reporter irg-1p::GFP is not induced by the light condition that robustly induced cyp-14A5p::GFP, and gst-4p::GFP is only very mildly induced (Fig. S1J).

      (2) Pathway-behaviour link:

      The behavioural relevance of the newly described pathway is intriguing, but it needs direct support. Ideally, this would require comparing memory in WT, zip-2-/-, cebp-2-/-, and cyp-14A5-/- under both dark and light conditions. But at the very least, it would require testing if constitutive CYP-14A5 rescue in the dark bypasses the requirement of light.

      We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and conclusions of the paper.

      Reviewer #1 (Recommendations for the authors):

      Overall, I think this paper is interesting to the field of C. elegans researchers at a minimum, as a light-inducible gene expression system might have a variety of uses throughout the diverse research paradigms that use this model system. With that said, I have a couple of suggestions that I think would substantially impact the ability to interpret these findings, which might be useful for broader implications of the study.

      (1) Most importantly, the supplemental table of RNA-seq data should likely be updated and discussed further beyond the cyp-14A5 findings. First, the authors report 7,902 genes are differentially expressed in response to light and then break these into upregulated and downregulated genes. But there are only 1,785 upregulated genes and 3,632 downregulated genes. This adds up to 5,417 genes, which doesn't match the 7,902 genes reported to change, and I could not find in the text if some other filters were applied that might explain this not adding up.

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”
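
      The mismatch the reviewer describes in comment (1) can be reproduced with a short calculation. The first part uses only the three counts quoted in the comment. The second part is a hypothetical illustration of one way such a gap could arise, namely a total counted under a looser filter than the up/down breakdown; the toy records and the `tally` helper are invented for illustration and do not come from the manuscript.

```python
# Counts quoted in the reviewer's comment.
REPORTED_TOTAL, REPORTED_UP, REPORTED_DOWN = 7902, 1785, 3632

directional = REPORTED_UP + REPORTED_DOWN
gap = REPORTED_TOTAL - directional
print(f"up + down = {directional}; unaccounted for = {gap}")


def tally(genes, padj_max, min_abs_log2fc=0.0):
    """Return (total, up, down) for (padj, log2fc) records passing both filters."""
    hits = [(p, lfc) for p, lfc in genes
            if p < padj_max and abs(lfc) >= min_abs_log2fc]
    up = sum(1 for _, lfc in hits if lfc > 0)
    down = sum(1 for _, lfc in hits if lfc < 0)
    return len(hits), up, down


# Hypothetical (padj, log2 fold change) records, for illustration only.
toy = [(0.001, 2.3), (0.03, 0.4), (0.04, -0.2), (0.008, -1.7), (0.2, 3.0)]

loose = tally(toy, padj_max=0.05)                       # padj filter only
strict = tally(toy, padj_max=0.01, min_abs_log2fc=1.0)  # padj plus 2-fold filter
print("loose filter  (total, up, down):", loose)
print("strict filter (total, up, down):", strict)
```

      If a total were reported under the loose filter and the breakdown under the strict one, the numbers would not add up, which is the kind of undocumented filtering the reviewer asks the authors to rule out.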

      (2) Among the upregulated genes in response to light are irg-5, irg-4, irg-6, irg-8, and gst-4. Indeed, all of these well-studied genes (or most) show even more induction by light than cyp-14A5. It is my opinion that this result needs further criticism as there are existing GFP reporters for gst-4 and irg-5 that are similarly well studied to irg-1, which is in the paper (and is not upregulated). In my opinion, the authors should test if they see activation of the irg-4 and gst-4 GFP reporters by light as well. This would not only validate their RNA-seq but might provide more important evidence for the field, as these other reporters are not considered light-inducible previously. If they are, several major studies might be impacted by this.

      Thank you for the comments. We have irg-1p::GFP and gst-4p::GFP in the lab but did not find other reporters for the genes mentioned from CGC. Neither of the two reporters showed light induction (Figs. 2D and S1J) as strongly as cyp-14A5p::GFP. It is possible that irg-1 and gst-4 RNA levels are up-regulated but not reflected in our transgenic reporters that used their promoters to drive GFP expression. Stronger light induction of cyp-14A5p::GFP is unlikely caused by the multi-copy nature of the transgene since newly generated single-copy integration strains based on the UAS-GAL4 system produced similar robust results for light induction (Fig. S1I and see Method).

      (3) Along the same lines, if at least 4 (and likely more) well characterized immune response genes are activated by light and these genes are known to mostly respond to differences in C. elegans bacterial food source/diet, then it stands to reason that maybe in this experimental context the light is not acting on "animals" at all, but rather triggering changes in E. coli (i.e. changing E. coli metabolism or pathogenicity like properties). If true, then perhaps the light affects bacteria in such a way that it activates a previously known bacterial pathogen response mechanism. This should be easy to test by seeing if this reporter is still activated by light in the presence of diverse bacterial diets, which are available from the CGC (CeMBio collection, for example). This is likely very important to the conclusions of the manuscript as it relates to animals sensing light, but might not be as important to the use of this system as a tool.

      Thank you for the insightful questions and suggestions. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. We have revised the Results and Discussion to reflect the appropriate scope of our study and implications of the new findings.

      (4) Lastly, it seems unlikely that nearly half the C. elegans genome is transcriptionally regulated by light (or nearly half of the detected genes in the RNA-seq results). It seems likely that this list of 7,902 genes contains false positives. I would suggest tightening the filter, for example by moving to padj < 0.01 instead of 0.05, or adding a 4-fold change filter (2-fold and 0.01 still results in nearly 5,000 genes changing, which might explain the difference in up and down genes just being due to different padj filters). Along these lines, it is worth noting that the padj appears to be generated using DESeq2, and one of the first assumptions of DESeq2 is that the median expressed genes do not change, and there is a normalization. However, if MOST genes do change in expression, then one of the fundamental assumptions of DESeq2 is not valid, which would mean it might not be an appropriate analysis tool - perhaps there is some other normalization that could be done before running DESeq2 due to some other noise present in the RNA-seq runs?

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”
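
      The normalization concern raised in comment (4) can be illustrated with a toy calculation. The sketch below is a simplified pure-Python version of median-of-ratios size factors, the default normalization in DESeq2; it is not DESeq2's own code. The two-sample count table, the 2-fold effect, and the function names are assumptions made for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples table of counts."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue  # a zero count makes the geometric mean zero; skip the gene
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def toy_counts(n_genes, frac_changed, fold=2.0, base=100.0):
    """Two samples (dark, light); a fraction of genes is truly up by `fold` in light."""
    n_changed = int(n_genes * frac_changed)
    return [[base, base * fold] if i < n_changed else [base, base]
            for i in range(n_genes)]


few = size_factors(toy_counts(1000, 0.3))   # minority of genes change
most = size_factors(toy_counts(1000, 0.7))  # majority of genes change
print("30% changed:", [round(s, 3) for s in few])
print("70% changed:", [round(s, 3) for s in most])
```

      When only a minority of genes change, both size factors stay at 1 and the true 2-fold effects survive normalization. When most genes change, the size factors absorb the shift, so truly induced genes look unchanged and unchanged genes look repressed. This is the failure mode the reviewer has in mind.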

      (5) Minor point - I would delete the reference to ER in line 92. While most CYPs do localize to the ER, the images shown are not clearly ER and probably do not have enough resolution to make claims about subcellular localization. To me, it would be easier to just delete this claim as it is not required for the main claims of the manuscript.

      Reference deleted.

      Reviewer #2 (Recommendations for the authors):

      I have one request for clarification that likely requires additional data. Figure 3 shows that ambient light stabilizes learned changes to chemotaxis and further shows that CYP-14A5 has a similar function. The implication is that light promotes CYP-14A5 expression, which somehow promotes memory consolidation. The authors should test whether memory consolidation in cyp-14A5, zip-2, or cebp-2 mutants is no longer affected by ambient light.

      It is also possible to test whether forced expression of CYP-14A5 can bypass the effect of 'no light' conditions on memory consolidation.

      Thank you for the comments. We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and conclusions of the paper.

      I have several minor suggestions relating to the text and figures.

      (1) In the introduction, the authors assert that little is known about non-visual light sensing and then list many examples of molecular mechanisms of non-visual light-sensing. They should emphasize that non-visual light sensing is important and accomplished by diverse molecular mechanisms.

      Agree and revised accordingly.

      (2) Check spacing between gene names (line 109).

      Corrected.

      (3) There should be a new paragraph break when the uORF experiments are described (line 146).

      Corrected.

      (4) 'Phenoptosis' is an esoteric word. Please define it (line 206).

      Corrected.

      (5) 'p' in the transgene name cyp-14A5p::nlp-22 is in italics, unlike the rest of the manuscript.

      Corrected.

      (6) 'Acknowledgment' should be 'Acknowledgments' (line 384).

      Corrected.

      (7) The color map in panel 1B should have units.

      The color map uses arbitrary units (now added) to highlight relative rather than absolute differences.

      (8) In panel 1E, it is confusing to have 'DARK' denoted by reddish bars and 'LIGHT' denoted by bluish bars. Perhaps 'DARK' is black/dark grey and 'LIGHT' is white?

      Corrected.

      (9) In panel 1D, it takes a minute to find the purple diamond. Please mark up the volcano plot to make it easier.

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      The authors generally present convincing experiments detailing interesting results in a well-written manuscript.

      One quick note: the same Bhatla and Horvitz (2015) papers appear to be cited twice [line 52].

      Corrected.

    1. eLife Assessment

      This important study presents a methodologically rigorous framework for stability-guided fine-mapping, extending PICS and generalizing to methods such as SuSiE, supported by comprehensive simulations and functional enrichment analyses. The evidence is now convincing, demonstrating improved causal variant recovery and offering a robust alternative for cross-population fine-mapping. The approach will be of particular interest to statistical geneticists, computational biologists, and biomedical researchers who rely on fine-mapping to interpret genetic association signals.

    2. Reviewer #1 (Public review):

      Aw et al. have proposed that utilizing stability analysis can be useful for fine-mapping of cross populations. In addition, the authors have performed extensive analyses to understand the cases where the top eQTL and stable eQTL are the same or different via functional data.

      Comments on revisions:

      The authors have answered all my concerns.

    3. Reviewer #2 (Public review):

      Aw et al. present a new stability-guided fine-mapping method by extending the previously proposed PICS method. They applied their stability-based method to fine-map cis-eQTLs in the GEUVADIS dataset and compared it against residualization-based approaches. They evaluated the performance of the proposed method using publicly available functional annotations and demonstrated that the variants identified by their stability-based method show enrichment for these functional annotations.

      The authors have substantially strengthened the manuscript by addressing the major concerns raised in the initial review. I acknowledge that they have conducted comprehensive simulation studies to show the performance of their proposed approach and that they have extended their approach to SuSiE ("Stable SuSiE") to demonstrate the broader applicability of the stability-guided principle beyond PICS.

      One remaining question is the interpretation of matching variants with very low stable posterior probabilities (~0), which the authors have analyzed in detail but without fully conclusive findings. I agree with the authors that this event is relatively rare and the current sample size is limited but this might be something to keep in mind for future studies.

    4. Author response:

      The following is the authors’ response to the latest reviews:

      "One remaining question is the interpretation of matching variants with very low stable posterior probabilities (~0), which the authors have analyzed in detail but without fully conclusive findings. I agree with the authors that this event is relatively rare and the current sample size is limited but this might be something to keep in mind for future studies."

      Fine-mapping stability on matching variants with very low stable posterior probability

      We thank Reviewer 2 for encouraging us to think more about how low stable posterior probability matching variants can be interpreted. We describe a few plausible interpretations, even though – as Reviewer 2 and we have both acknowledged – our present experiments do not point to a clear and conclusive account.

      One explanation is that the locus captured by the variant might not be well-resolved, in the sense that many correlated variants exist around the locus. Thus, the variant itself is unlikely causal, but the set of variants in high LD with it may contain the true causal variant, or it's possible that the causal variant itself was not sequenced but lies in that locus. A comparison of LD patterns across ancestries at the locus would be helpful here.

      Another explanation rests on the following observation. For a variant to be matching between top and stable PICS and to also have very small stable PP, it has to have the largest PP after residualization on the ALL slice but also have positive PP with gene expression on many other slices. In other words, failing to control for potential confounders shrinks the PP. If one assumes that the matching variant is truly causal, then our observation points to an example of negative confounding (aka suppressor effect). This can occur when the confounders (PCs) are correlated with allele dosage at the causal variant in a different direction than their correlation with gene expression, so that the crude association between unresidualized gene expression and causal variant allele dosage is biased toward 0.
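
      The negative-confounding (suppressor) argument above can be illustrated with a small simulation. This is a sketch of the general statistical effect only, not of PICS or of the authors' pipeline: the effect sizes, the single continuous confounder standing in for the PCs, the continuous stand-in for allele dosage, and all function names are assumptions made for illustration.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, c):
    """Remove the linear effect of confounder c from y."""
    n = len(y)
    my, mc = sum(y) / n, sum(c) / n
    b = slope(c, y)
    return [yi - my - b * (ci - mc) for yi, ci in zip(y, c)]


def simulate(n=5000, seed=0, causal_effect=0.5):
    rng = random.Random(seed)
    conf = [rng.gauss(0, 1) for _ in range(n)]
    # The confounder pushes dosage UP ...
    dosage = [0.8 * c + rng.gauss(0, 0.6) for c in conf]
    # ... but pushes expression DOWN, opposing the causal effect of dosage.
    expr = [causal_effect * g - 0.5 * c + rng.gauss(0, 1)
            for g, c in zip(dosage, conf)]
    crude = slope(dosage, expr)
    adjusted = slope(residualize(dosage, conf), residualize(expr, conf))
    return crude, adjusted


crude, adjusted = simulate()
# In expectation: crude = 0.5 - 0.5 * 0.8 = 0.1, adjusted = 0.5.
print(f"crude slope    = {crude:.3f}")
print(f"adjusted slope = {adjusted:.3f}")
```

      With these assumed effect sizes, the unadjusted association is pulled toward zero while the residualized estimate recovers the simulated causal effect. This mirrors the pattern described above, in which a truly causal variant shows a weaker signal when confounders are left uncontrolled.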

      Our present study does not allow us to systematically confirm either interpretation. On the one hand, these matching variants were depleted in causal variants in our simulations, which argues against the second interpretation. On the other hand, we found functional enrichment in analyses of GEUVADIS data, though only 17 matching variants with low stable PP were reported. We believe a larger-scale study using larger cohort sizes (at least 1000 individuals per ancestry) and many more simulations (to increase the yield of such cases) would be insightful.

      ———

      The following is the authors’ response to the original reviews:

      Reviewer #1:

      Major comments:

      (1) It would be interesting to see how much fine-mapping stability can improve the fine-mapping results in cross-population. One can simulate data using true genotype data and quantify the amount the fine-mapping methods improve utilizing the stability idea.

      We agree, and have performed simulation studies where we assume that causal variants are shared across populations. Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure largely cis expression phenotypes were simulated. We additionally generated 1,440 synthetic gene expression phenotypes that incorporate environmental heterogeneity, to motivate our pursuit of fine-mapping stability in the first place (see Response to Reviewer 2, Comment 6). These are described in Results section “Simulation study”:

      We evaluated the performance of the PICS algorithm, specifically comparing the approach incorporating stability guidance against the residualization approach that is more commonly used — similar to our application to the real GEUVADIS data. We additionally investigated two ways of “combining” the residualization and stability guidance approaches: (1) running stability-guided PICS on residualized phenotypes; (2) prioritizing matching variants returned by both approaches. See Response to Reviewer 2, Comment 5.

      (2) I would be very interested to see how other fine-mapping methods (FINEMAP, SuSiE, and CAVIAR) perform via the stability idea.

      Thank you for this valuable comment. We ran SuSiE on the same set of simulated datasets. Specifically, we ran a version that uses residualized phenotypes (supposedly removing the effects of population structure), and also a version that incorporates stability. The second version is similar to how we incorporate stability in PICS. We investigated the performance of Stable SuSiE in a similar manner to our investigation of PICS. First we compared the performance relative to SuSiE that was run on residualized phenotypes. Motivated by our finding in PICS that prioritizing matching variants improves causal variant recovery, we did the same analysis for SuSiE. This analysis is described in Results section “Stability guidance improves causal variant recovery in SuSiE.”

      We reported overall matching frequencies and causal variant recovery rates of top and stable variants for SuSiE in Figures 2C&D.

      Frequencies with which Stable and Top SuSiE variants match, stratified by the simulation parameters, are summarized in Supplementary File 2C (reproduced for convenience in Response to Reviewer 2, Comment 3). Causal variant recovery rates split by the number of causal variants simulated, and stratified by both signal-to-noise ratio and the number of credible sets included, are reported in Figure 2—figure supplements 16-18. We reproduce Figure 2—figure supplement 18 (three causal variants scenario) below for convenience. Analogous recovery rates for matching versus non-matching top or stable variants are reported in Figure 2—figure supplements 19, 21 and 23.

      (3) I am a little bit concerned about the PICS's assumption about one causal variant. The authors mentioned this assumption as one of their method limitations. However, given the utility of existing fine-mapping methods (FINEMAP and SuSiE), it is worth exploring this domain.

      Thank you for raising this fair concern. We explored this domain, by considering simulations that include two and three causal variants (see Response to Reviewer 2, Comment 3). We looked at how well PICS recovers causal variants, and found that each potential set largely does not contain more than one causal variant (Figure 2—figure supplements 20 and 22). This can be explained by the fact that PICS potential sets are constructed from variants with a minimum linkage disequilibrium to a focal variant. On the other hand, in SuSiE, we observed multiple causal variants appearing in lower credible sets when applying stability guidance (Figure 2—figure supplements 21 and 23). A more extensive study involving more fine-mapping methods and metrics specific to violation of the one causal variant assumption could be pursued in future work.

      Reviewer #2:

      Aw et al. present a new stability-guided fine-mapping method by extending the previously proposed PICS method. They applied their stability-based method to fine-map cis-eQTLs in the GEUVADIS dataset and compared it against what they call the residualization-based method. They evaluated the performance of the proposed method using publicly available functional annotations and claimed the variants identified by their proposed stability-based method are more enriched for these functional annotations.

      While the reviewer acknowledges the contribution of the present work, there are a couple of major concerns as described below.

      Major:

      (1) It is critical to evaluate the proposed method in simulation settings, where we know which variants are truly causal. While I acknowledge their empirical approach using the functional annotations, a more unbiased, comprehensive evaluation in simulations would be necessary to assess its performance against the existing methods.

      Thank you for this point. We agree. We have performed a simulation study where we assume that causal variants are shared across populations (see response to Reviewer 1, Comment 1). Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure cis expression phenotypes were simulated.

      (2) Also, simulations would be required to assess how the method is sensitive to different parameters, e.g., LD threshold, resampling number, or number of potential sets.

      Thank you for raising this point. The underlying PICS algorithm was not proposed by us, so we followed the default parameters set (LD threshold, r<sup>2</sup> = 0.5; see Taylor et al., 2021 Bioinformatics) to focus on how stability considerations will impact the existing fine-mapping algorithm. We attempted to derive the asymptotic joint distribution of the p-values, but it was too difficult. Hence, we used 500 permutations because such a large number would allow large-sample asymptotics to kick in. However, following your critical suggestion we varied the number of potential sets in our analyses of simulated data. We briefly mention this in the Results.

      “In the Supplement, we also describe findings from investigations into the impact of including more potential sets on matching frequency and causal variant recovery…”

      A detailed write-up is provided in Supplementary File 1 Section S2 (p.2):

      “The number of credible or potential sets is a parameter in many fine-mapping algorithms. Focusing on stability-guided approaches, we consider how including more potential sets for stable fine-mapping algorithms affects both causal variant recovery and matching frequency in simulations…

      Causal variant recovery. We investigate both Stable PICS and Stable SuSiE. Focusing first on simulations with one causal variant, we observe a modest gain in causal variant recovery for both Stable PICS and Stable SuSiE, most noticeably when the number of sets was increased from 1 to 2 under the lowest signal-to-noise ratio setting…”

      We observed that increasing the number of potential sets helps with recovering causal variants for Stable PICS (Figure 2—figure supplements 13-15). This observation also accounts for the comparable power that Stable PICS has with SuSiE in simulations with low signal-to-noise ratio (SNR), when we increase the number of credible sets or potential sets (Figure 2—figure supplements 10-12).

      (3) Given the previous studies have identified multiple putative causal variants in both GWAS and eQTL, I think it's better to model multiple causal variants in any modern fine-mapping methods. At least, a simulation to assess its impact would be appreciated.

      We agree. In our simulations we considered up to three causal variants in cis, and evaluated how well the top three Potential Sets recovered all causal variants (Figure 2—figure supplements 13-15; Figure 2—figure supplement 15). We also reported the frequency of variant matches between Top and Stable PICS stratified by the number of causal variants simulated in Supplementary File 2B and 2C. Note Supplementary File 2C is for results from SuSiE fine-mapping; see Response to Reviewer 1, Comment 2.

      Supplementary File 2B. Frequencies with which Stable and Top PICS have matching variants for the same potential set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      Supplementary File 2C. Frequencies with which Stable and Top SuSiE have matching variants for the same credible set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      (4) Relatedly, I wonder what fraction of non-matching variants are due to the lack of multiple causal variant modeling.

      PICS handles multiple causal variants by including more potential sets to return, owing to the important caveat that causal variants in high LD cannot be statistically distinguished. For example, if one believes there are three causal variants that are not too tightly linked, one could make PICS return three potential sets rather than just one. To answer the question using our simulation study, we subsetted our results to just scenarios where the top and stable variants do not match. This mimics the exact scenario of having modeled multiple causal variants but still not yielding matching variants, so we can investigate whether these non-matching variants are in fact enriched in the true causal variants.

      Because we expect causal variants to appear in some potential set, we specifically considered whether these non-matching causal variants might match along different potential sets across the different methods. In other words, we compared the stable variant with the top variant from another potential set for the other approach (e.g., Stable PICS Potential Set 1 variant vs Top PICS Potential Set 2 variant). First, we computed the frequency with which such pairs of variants match. A high frequency would demonstrate that, even if the corresponding potential sets do not have a variant match, there could still be a match between non-corresponding potential sets across the two approaches, which shows that multiple causal variant modeling boosts identification of matching variants between both approaches — regardless of whether the matching variant is in fact causal.

      Low frequencies were observed. For example, when restricting to simulations where Top and Stable PICS Potential Set 1 variants did not match, about 2-3% of variants matched between the Potential Set 1 variant in Stable PICS and Potential Sets 2 and 3 variants in Top PICS; or between the Potential Set 1 variant in Top PICS and Potential Sets 2 and 3 variants in Stable PICS (Supplementary File 2D). When looking at non-matching Potential Set 2 or Potential Set 3 variants, we do see an increase in matching frequencies (between 10-20%) between Potential Set 2 variants and other potential set variants between the different approaches. However, these percentages are still small compared to the matching frequencies we observed between corresponding potential sets (e.g., for simulations with one causal variant this was 70-90% between Top and Stable PICS Potential Set 1, and for simulations with two and three causal variants this was 55-78% and 57-79% respectively).
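      The "off-diagonal" comparison described above can be sketched as follows. This is an illustrative sketch only: the function name `off_diagonal_match_rate`, the data layout (one tuple of lead variants per simulation, ordered by potential set), and the toy rsIDs are assumptions made for the example and are not taken from the study's code or data.

```python
def off_diagonal_match_rate(stable_sets, top_sets, focal=0):
    """Among simulations where the focal potential-set variants differ,
    return the fraction in which the stable focal variant equals the top
    variant of some OTHER potential set.

    stable_sets, top_sets: lists with one tuple per simulation; entry i of a
    tuple is the variant reported for potential set i + 1 (assumed layout).
    """
    non_matching = 0
    off_diagonal = 0
    for stable, top in zip(stable_sets, top_sets):
        if stable[focal] == top[focal]:
            continue  # diagonal match: not part of the non-matching subset
        non_matching += 1
        others = [v for j, v in enumerate(top) if j != focal]
        if stable[focal] in others:
            off_diagonal += 1
    return off_diagonal / non_matching if non_matching else float("nan")


# Toy input with made-up identifiers: four simulations, three potential sets.
TOY_STABLE = [("rs1", "rs2", "rs3"), ("rs4", "rs5", "rs6"),
              ("rs7", "rs8", "rs9"), ("rs10", "rs11", "rs12")]
TOY_TOP = [("rs1", "rs2", "rs3"), ("rs5", "rs4", "rs6"),
           ("rsA", "rs8", "rs9"), ("rsB", "rsC", "rs10")]
```

      In the toy input, three simulations have non-matching Potential Set 1 variants and two of those have an off-diagonal match, so `off_diagonal_match_rate(TOY_STABLE, TOY_TOP)` returns 2/3. Swapping the two arguments gives the reverse comparison (Top Potential Set 1 against Stable Potential Sets 2 and 3).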

      We next checked whether these “off-diagonal” matching variants corresponded to the true causal variants simulated. Here we find that the causal variant recovery rate is mostly less than the corresponding rate for diagonally matching variants, which together with the low matching frequency suggests that the enrichment of causal variants of “off-diagonal” matching variants is much weaker than in the diagonally matching approach. In other words, the fraction of non-matching (causal) variants due to the lack of multiple causal variant modeling is low.

      We discuss these findings in Supplementary File 1 Section S2 (bottom of p.2).

      (5) I wonder if you can combine the stability-based and the residualization-based approach, i.e., using the residualized phenotypes for the stability-based approach. Would that further improve the accuracy or not?

      This is a good idea, thank you for suggesting it. We pursued this combined approach on simulated gene expression phenotypes, but did not observe significant gains in causal variant recovery (Figure 2B; Figure 2—figure supplements 2, 13 and 15). We reported this in Results “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “We thus explore ways to combine the residualization and stability-driven approaches, by considering (i) combining them into a single fine-mapping algorithm (we call the resulting procedure Combined PICS); and (ii) prioritizing matching variants between the two algorithms. Comparing the performance of Combined PICS against both Top and Stable PICS, however, we find no significant difference in its ability to recover causal variants (Figure 2B)...”

      However, we also confirmed in our simulations that prioritizing matching variants between the two approaches led to gains in causal variant recovery (Figure 2D; Figure 2—figure supplements 4, 19, 20 and 22). We reported this in Results “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “On the other hand, matching variants between Top and Stable PICS are significantly more likely to be causal. Across all simulations, a matching variant in Potential Set 1 is 2.5X as likely to be causal than either a non-matching top or stable variant (Figure 2D) — a result that was qualitatively consistent even when we stratified simulations by SNR and number of causal variants simulated (Figure 2—figure supplements 19, 20 and 22)...”

      This finding is consistent with our analysis of real GEUVADIS gene expression data, where we reported larger functional significance of matching variants relative to non-matching variants returned by either Top or Stable PICS.

      (6) The authors state that confounding in cohorts with diverse ancestries poses potential difficulties in identifying the correct causal variants. However, I don't see that they directly address whether the stability approach is mitigating this. It is hard to say whether the stability approach is helping beyond what simpler post-hoc QC (e.g., thresholding) can do.

      Thank you for raising this fair point. Here is a model we have in mind. Gene expression phenotypes (Y) can be explained by both genotypic effects (G, as in genotypic allelic dosage) and the environment (E): Y = G + E. However, both G and E depend on ancestry (A), so that Y = G|A+E|A. Suppose that the causal variants are shared across ancestries, so that (G|A=a)=G for all ancestries a. Suppose however that environments are heterogeneous by ancestry: (E|A=a) = e(a) for some function e that depends non-trivially on a. This would violate the exchangeability of exogenous E in the full sample, but by performing fine-mapping on each ancestry stratum, the exchangeability of exogenous E is preserved. This provides theoretical justification for the stability approach.

      We next turned to simulations, where we investigated 1,440 simulated gene expression phenotypes capturing various ways in which ancestry induces heterogeneity in the exogenous E variable (simulation details in Lines 576-610 of Materials and Methods). We ran Stable PICS, as well as a version of PICS that did not residualize phenotypes or apply the stability principle. We observed that (i) causal variant recovery performance was not significantly different between the two approaches (Figure 2—figure supplements 24-32); but (ii) disagreement between the approaches can be considerable, especially when the signal-to-noise ratio is low (Supplementary File 2A). For example, in a set of simulations with three causal variants, with SNR = 0.11 and E heterogeneous by ancestry by letting E be drawn from N(2σ,σ<sup>2</sup>) for only GBR individuals (rest are N(0,σ<sup>2</sup>)), there was disagreement between Potential Set 1 and 2 variants in 25% of simulations — though recovery rates were similar (Probability of recovering at least one causal variant: 75% for Plain PICS and 80% for Stable PICS). These points suggest that confounding in cohorts can reduce power in methods not adjusting or accounting for ancestral heterogeneity, but can be remedied by approaches that do so. We report this analysis in Results “Simulations justify exploration of stability guidance”
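      The model Y = G + E with ancestry-dependent E can be illustrated with a toy simulation. The sketch below is not the simulation framework used in the study: the two strata, the allele frequencies, the sample sizes, and the function names are hypothetical, and the variant is deliberately given no effect on expression. It shows only that a mean shift of 2σ in E for one stratum induces a spurious pooled association when allele frequencies differ by ancestry, whereas the within-stratum slopes stay near 0.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def simulate_strata(n_per=2000, sigma=1.0, seed=1):
    """Simulate dosage and expression per stratum for a NON-causal variant."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies that differ between the two strata.
    freqs = {"GBR": 0.6, "OTHER": 0.2}
    data = {}
    for anc, p in freqs.items():
        # E ~ N(2*sigma, sigma^2) for GBR only; N(0, sigma^2) otherwise.
        shift = 2 * sigma if anc == "GBR" else 0.0
        # Allele dosage in {0, 1, 2}: two independent Bernoulli(p) draws.
        g = [sum(rng.random() < p for _ in range(2)) for _ in range(n_per)]
        # Y = E here, because the variant has no genotypic effect.
        y = [rng.gauss(shift, sigma) for _ in range(n_per)]
        data[anc] = (g, y)
    return data


def pooled_and_stratified_slopes(data):
    """Return the pooled slope and a dict of per-stratum slopes."""
    pooled_g = [v for g, _ in data.values() for v in g]
    pooled_y = [v for _, y in data.values() for v in y]
    pooled = ols_slope(pooled_g, pooled_y)
    per_stratum = {anc: ols_slope(g, y) for anc, (g, y) in data.items()}
    return pooled, per_stratum
```

      Calling `pooled_and_stratified_slopes(simulate_strata())` under these assumptions gives a clearly positive pooled slope for a variant with no effect, while each per-stratum slope is close to 0. This is the sense in which stratifying by ancestry preserves the exchangeability of E.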

      In the current version of our work, we have evaluated, using both simulations and empirical evidence, different ways to combine approaches to boost causal variant recovery. Our simulation study shows that prioritizing matching variants across multiple methods improves causal variant recovery. On GEUVADIS data, where we might not know which variants are causal, we already demonstrated that matching variants are enriched for functional annotations. Therefore, our analyses justify that the adverse consequence of confounding on reducing fine-mapping accuracy can be mitigated by prioritizing matching variants between algorithms including those that account for stability.

      (7) For non-matching variants, I wonder what the difference of posterior probabilities is between the stable and top variants in each method. If the difference is small, maybe it is due to noise rather than signal.

      We have reported differences in posterior probabilities returned by Stable and Top PICS for GEUVADIS data; see Figure 3—figure supplement 1. For completeness, we compute the differences in posterior probabilities and summarize these differences both as histograms and as numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 9,921

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 1.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 1.

      Potential Set 2

      - Number of non-matching variants = 14,454

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 2.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 2.

      Potential Set 3

      - Number of non-matching variants = 16,814

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 3.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 3.

      We also compared the difference in posterior probabilities between non-matching variants returned by Stable PICS and Top PICS for our 2,400 simulated gene expression phenotypes. Focusing on just Potential Set 1 variants, we find two equally likely scenarios, as demonstrated by two distinct clusters of points in a “posterior probability-posterior probability” plot. The first is, as pointed out, a small difference in posterior probability (points lying close to y=x). The second, however, reveals stable variants with very small posterior probability (of order 4 x 10<sup>–5</sup> to 0.05) but with a non-matching top variant taking on posterior probability well distributed along [0,1]. Moving down to Potential Sets 2 and 3, the distribution of pairs of posterior probabilities appears less clustered, indicating less tendency for posterior probability differences to be small (Figure 2—figure supplement 8).

      Here are the histograms and numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 663 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 4.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 4.

      Potential Set 2

      - Number of non-matching variants = 1,429 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 5.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 5.

      Potential Set 3

      - Number of non-matching variants = 1,810 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 6.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 6.

      (8) It's a bit surprising that you observed matching variants with (stable) posterior probability ~ 0 (SFig. 1). What are the interpretations for these variants? Do you observe functional enrichment even for low posterior probability matching variants?

      Thank you for this question. We have performed a thorough analysis of matching variants with very low stable posterior probability, which we define as having a posterior probability < 0.01 (Supplementary File 1 Section S11). Here, we briefly summarize the analysis and key findings.

      Analysis

      First, such variants occur very rarely: only 8 across all three potential sets in simulations, and 17 across all three potential sets for GEUVADIS (the latter variants are listed in Supplementary File 2E). We begin interpreting these variants by looking at allele frequency heterogeneity by ancestry, support size (defined as the number of variants with positive posterior probability in the ALL slice*), and the number of slices including the stable variant (i.e., the stable variant reported positive posterior probability for the slice).

      *Note that the stable variant posterior probability need not be at least 1/(Support Size). This is because the algorithm may have picked a SNP that has a lower posterior probability in the ALL slice (i.e., not the top variant) but happens to appear in the most number of other slices (i.e., a stable variant).
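      The point made in the note above can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the implementation used in the study: the function names, the dictionary layout, the tie-break by posterior probability in the ALL slice, the choice to count the ALL slice among the slices, and the rsIDs and probabilities are all hypothetical. It shows only how a variant that appears in the most slices can be selected as stable while having posterior probability below 1/(Support Size) in the ALL slice.

```python
def support_size(pp_all):
    """Number of variants with positive posterior probability in the ALL slice."""
    return sum(1 for p in pp_all.values() if p > 0)


def top_variant(pp_all):
    """Variant with the largest posterior probability in the ALL slice."""
    return max(pp_all, key=pp_all.get)


def stable_variant(pp_by_slice, all_key="ALL"):
    """Variant in the ALL-slice support that has positive posterior
    probability in the most slices (ALL slice counted as one slice).
    Ties are broken by ALL-slice posterior probability (assumed rule)."""
    pp_all = pp_by_slice[all_key]
    candidates = [v for v, p in pp_all.items() if p > 0]

    def n_slices(v):
        return sum(1 for pp in pp_by_slice.values() if pp.get(v, 0.0) > 0)

    return max(candidates, key=lambda v: (n_slices(v), pp_all[v]))


# Made-up posterior probabilities for one potential set across four slices.
TOY_PP = {
    "ALL": {"rsA": 0.70, "rsB": 0.25, "rsC": 0.05},
    "S1": {"rsA": 0.90, "rsC": 0.10},
    "S2": {"rsC": 1.00},
    "S3": {"rsB": 0.40, "rsC": 0.60},
}
```

      In this toy input the top variant is `rsA`, but `rsC` appears in all four slices and is therefore returned by `stable_variant(TOY_PP)`, even though its ALL-slice posterior probability (0.05) is below 1/3, the reciprocal of the support size.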

      For variants arising from simulations, because we know the true causal variants, we check if these variants are causal. For GEUVADIS fine-mapped variants, we rely on functional annotations to compare their relative enrichment against other matching variants that did not have very low stable posterior probability.

      Findings

      While we caution against generalizing from observations reported here, which are based on very small sample sizes, we noticed the following. In simulations, matching variants with very low stable posterior probability are largely depleted in causal variants, although factors such as the number of slices including the stable variant may still be useful. In GEUVADIS, however, these variants can still be functionally enriched. We reported three examples in Supplementary File 1 Section S11 (pp. 8-9 of Supplement), where the variants were enriched in either VEP or biologically interpretable functional annotations, and were also reported in earlier studies. We partially reproduce our report below for convenience.

      “However, we occasionally found variants that stand out for having large functional annotation scores. We list one below for each potential set.

      - Potential Set 1 reported the variant rs12224894 from fine-mapping ENSG00000255284.1 (accession code AP006621.3) in Chromosome 11. This variant stood out for lying in the promoter flanking region of multiple cell types and being relatively enriched for GC content with a 75bp flanking region. This variant has been reported as a cis eQTL for AP006632 (using whole blood gene expression, rather than lymphoblastoid cell line gene expression in this study) in a clinical trial study of patients with systemic lupus erythematosus (Davenport et al., 2018). Its nearest gene is GATD1, a ubiquitously expressed gene that codes for a protein and is predicted to regulate enzymatic and catabolic activity. This variant appeared in all 6 slices, with a moderate support size of 23.

      - Potential Set 2 reported the variant rs9912201 from fine-mapping ENSG00000108592.9 (mapped to FTSJ3) in Chromosome 17. Its FIRE score is 0.976, which is close to the maximum FIRE score reported across all Potential Set 2 matching variants. This variant has been reported as a SNP in high LD to a GWAS hit SNP rs7223966 in a pan-cancer study (Gong et al., 2018). This variant appeared in all 6 slices, with a moderate support size of 32.

      - Potential Set 3 reported the variant rs625750 from fine-mapping ENSG00000254614.1 (mapped to CAPN1-AS1, an RNA gene) in Chromosome 11. Its FIRE score is 0.971 and its B statistic is 0.405 (region under selection), which lie at the extreme quantiles of the distributions of these scores for Potential Set 3 matching variants with stable posterior probability at least 0.01. Its associated mutation has been predicted to affect transcription factor binding, as computed using several position weight matrices (Kheradpour and Kellis, 2014). This variant appeared in just 3 slices, possibly owing to the considerable allele frequency difference between ancestries (maximum AF difference = 0.22). However, it has a small support size of 4 and a moderately high Top PICS posterior probability of 0.64.

      To summarize, our analysis of GEUVADIS fine-mapped variants demonstrates that matching variants with very low stable posterior probability could still be functionally important, even for lower potential sets, conditional on supportive scores in interpretable features such as the number of slices containing the stable variant and the posterior probability support size…”

    1. eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this convincing study improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes binding and transport assays, as well as HDX-MS and molecular dynamics simulations to further understand the positive cooperativity between sugar and the co-transported sodium cation. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

    2. Reviewer #1 (Public review):

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al. set out to address this with further crystallographic studies complemented with ITC and hydrogen deuterium exchange (HDX) mass spectrometry. They first report 4 different crystal structures of galactose derivatives to explore molecular recognition showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      The results from the crystallography appear sensible, though the resolution of the data is low with only the structure with NPG better than 3Å. Support for the conclusion of the water molecule in the binding site, as interpreted from the density, is given by MD studies.

      The HDX also appears to be well done and is explained reasonably well in the revision.

    3. Reviewer #3 (Public review):

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelBSt) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na⁺, H⁺, or Li⁺, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na⁺ by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na⁺ and melibiose binding sites. To verify the determinants of sugar-recognition specificity, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, which is positioned 4 Å from the phenyl group of Tyr26 and forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors used hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) to analyze the changes in structural dynamics of the transporter induced by melibiose, Na⁺, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      The revised manuscript shows clear improvement, and the authors have addressed my concerns in a satisfactory manner. Of note, I noticed two mistakes that should be corrected:

      - page 11. Unless I am mistaken, the sentence "In contrast, Na+ alone or with melibiose primarily caused deprotections" should be corrected with "protections". The authors may wish to verify this sentence and also the previous one in the main text.

      - Figure 8 displays two cytoplasmic gates (one of them should be periplasmic)

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this work improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes HDX-MS to further understand the positive cooperativity between sugar and the co-transported sodium cation. Although the experimental work is solid, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion, as well as a clearer description of the new insight that is obtained in relation to previous studies. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

      We express our gratitude to the associate editor, review editor, and reviewers for their favorable evaluation of this manuscript, as well as their constructive comments and encouragement. Their feedback has been integrated to fortify the evidence, refine the data analysis, and elevate the presentation of the results, thereby enhancing the overall quality and clarity of the manuscript.

      A brief summary of the modifications in this revision:

      (a) We performed four new experiments: 1) an intact-cell [<sup>3</sup>H]raffinose transport assay; 2) intact-cell p-nitrophenol detection to demonstrate α-NPG transport; 3) an ITC binding assay for the D59C mutant; and 4) molecular dynamics simulations of water-1 in the sugar-binding site and of the side-chain dynamics in the Na<sup>+</sup>- and melibiose-binding pockets. All data consistently support the conclusions drawn in this article.

      (b) We have added a new figure to show the apo state dynamics (the new Fig. 5a,b) and annotated the amino acid residue positions and marked positions in sugar- or Na<sup>+</sup>-binding pockets.

      (c) As suggested by reviewer-3, we have moved the individual mapping of ligand effects on HDX data to the main figure, combined with the residual plots, and marked the amino-acid residue positions.

      (d) We have added more deuterium uptake plots to cover all residues in the sugar- or Na<sup>+</sup>-binding pockets in the current figure 7 (previously figure 6).

      (e) We have added a new figure 8 showing the positions at the well-studied cytoplasmic gating salt-bridge network and other loops likely important for conformational changes, along with a membrane topology marked with the HDX data. We have added a new figure 9 from MD simulations.

      Reviewer #1:

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry.

      (1) They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      We thank you for understanding what we've presented in this manuscript.

      (2) The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The four structures with bound sugars of different sizes were used to identify the binding motif on both the primary substrate (sugar) and the transporter (MelB<sub>St</sub>). Although the resolutions of the structures complexed with melibiose, raffinose, or α-MG are relatively low, the size and shape of the densities in each structure are consistent with the corresponding sugar molecules, which provides valuable data for confirming the pose of the bound sugar proposed previously. In this revision, we further refined the α-NPG-bound structure to 2.60 Å. The water-1 identified in this study further confirms the orientation of C4-OH. Notably, this transporter does not recognize or transport glucosides in which the orientation of the C4-OH at the glucopyranosyl ring is opposite. To verify the water in the sugar-binding site, we initiated a new collaborative study using MD simulations. Results showed that Wat-1 exhibited nearly full occupancy when melibiose was present, regardless of whether Na<sup>+</sup> was bound at the cation-binding site.

      As detailed in the Summary, we added two additional sets of transport assays and confirmed that raffinose and α-NPG are transportable substrates of MelB<sub>St</sub>. For α-NPG transport, we measured the end products of the process—enzyme hydrolysis and membrane diffusion of p-nitrophenol released from intracellular α-NPG.

      As a bonus, based on the WT-like downhill α-NPG transport activity by the D59C uniporter mutant that failed in active transport against a sugar concentration gradient, we further emphasized that the sugar translocation pathway is isolated from the cation-binding site. The new data strongly support the allosteric effects of cation binding on sugar-binding affinity. Thank you for this helpful suggestion.

      A meaningful analysis of ITC data depends heavily on the quality of the data. Our laboratory has extensive experience with ITC and has gained rich, insightful mechanistic knowledge of MelB<sub>St</sub>. Unfortunately, because of the low affinity for raffinose and α-MG, no further information can be convincingly obtained. Therefore, we did not dissect the enthalpic and entropic contributions but focused on the Kd value and binding stoichiometry.

      (3) The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

      We are sorry for not presenting our data clearly in the initial submission. In this revised manuscript, we have made numerous improvements, as described in the Summary. These enhancements in the HDX data analysis provided new mechanistic insights into the allosteric effects, leading us to conclude that protein dynamics and conformational transitions are coupled with sugar-binding affinity. Na<sup>+</sup> binding restricts protein conformational flexibility, thereby increasing sugar-binding affinity. The HDX study revealed that the major dynamic region includes a sugar-binding residue, Arg149, which also plays a gating role. Structurally, this dual-function residue undergoes significant displacement during the sugar-affinity-coupled conformational transition, thereby coupling the sugar binding and structural dynamics.

      Reviewer #2:

      This manuscript from Hariharan, Shi, Viner, and Guan presents x-ray crystallographic structures of membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na<sup>+</sup>. The work presented here builds on years of previous study and adds substantial new details into how Na<sup>+</sup> binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      We appreciate this reviewer's time in reading our previous articles related to this manuscript.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      This figure is a stereo view of a density map created in cross-eye style. In this revision, we changed this figure to Fig. 3 and showed only the density for sugar and water-1. 

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      We thank this reviewer for the helpful suggestions. We have performed the suggested ITC measurements with the D59C mutant. The purpose of the ITC experiments was to demonstrate that MelB<sub>St</sub> can bind raffinose and α-MG, in support of the crystal structures.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      All authors value this reviewer's comments and suggestions, which have been included in this revision.

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      Yes, a lipid/detergent removal step was included in this study and previous ones, and this information was clearly described in the Methods.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have updated Table S2 and addressed the reviewer's request for the details of the HDX experiments.
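      The summary statistics requested in this comment can be sketched as follows. This is a minimal illustration only: the peptide list is hypothetical, and the definition of redundancy used here (summed peptide length divided by the number of covered residues) is an assumption, since the manuscript's exact definitions are not given in this response.

```python
def hdx_summary(peptides, seq_len):
    """Summarise an HDX peptide map.

    peptides: list of (start, end) residue numbers, 1-indexed and inclusive.
    seq_len:  length of the full protein sequence.
    """
    covered = set()
    for start, end in peptides:
        covered.update(range(start, end + 1))
    lengths = [end - start + 1 for start, end in peptides]
    return {
        # fraction of residues covered by at least one peptide
        "coverage": len(covered) / seq_len,
        # mean peptide length in residues
        "mean_length": sum(lengths) / len(lengths),
        # assumed definition: summed peptide length per covered residue
        "redundancy": sum(lengths) / len(covered),
    }


# Hypothetical peptides on a hypothetical 30-residue sequence
print(hdx_summary([(1, 10), (5, 14), (20, 25)], 30))
```

      Other conventions exist, for example excluding the N-terminal residue of each peptide because it does not retain deuterium, so values computed this way may differ slightly from those in a published table.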

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have prepared and presented deuterium uptake time-course plots for any peptides with ΔD > threshold in Fig. S5a-c.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We appreciate this comment and have cited the suggested article on the hybrid significance method. We fully acknowledge that using a cutoff of P < 0.05 can increase the likelihood of false-positive identifications. By applying multiple levels of statistical testing, we determined that P < 0.05 is an appropriate threshold for this study. The threshold values were presented in the residual plots and explained in the text. For the previous Fig. 6 (renamed Fig. S4b in the current version), we have reported the P value. *, < 0.05; **, < 0.01. (The text for 0.01 was not visible in the previous version. Sorry for the confusion.)
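      As an illustration of the hybrid criterion discussed here, the sketch below flags a peptide only when its deuterium uptake difference exceeds a global ΔD threshold and its P value passes the chosen cutoff. The star labels follow the cutoffs stated above (*, < 0.05; **, < 0.01); the function name, the example ΔD values, and the threshold value are hypothetical and are not taken from the manuscript.

```python
def classify_peptide(delta_d, p_value, threshold):
    """Hybrid significance call for one peptide.

    delta_d:   difference in deuterium uptake between two states (Da)
    p_value:   P value from a per-peptide test (e.g. Welch's t-test)
    threshold: global |delta D| cutoff (hypothetical value in the example)
    Returns "", "*" (P < 0.05) or "**" (P < 0.01).
    """
    # Both criteria must hold: magnitude above threshold AND P < 0.05
    if abs(delta_d) <= threshold or p_value >= 0.05:
        return ""
    return "**" if p_value < 0.01 else "*"


# Hypothetical examples with an assumed threshold of 0.3 Da
for dd, p in [(-0.8, 0.004), (0.5, 0.03), (0.1, 0.001), (-0.9, 0.2)]:
    print(dd, p, repr(classify_peptide(dd, p, threshold=0.3)))
```

      Requiring both conditions is what limits false positives: a small ΔD with a very low P value, or a large ΔD with high replicate variance, is not flagged.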

      (5) Line 316 states a significant difference is seen in dynamics, how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity. The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      The current Table S3 (combined from previous Tables S3 and S4 as suggested) was prepared to provide an overall view of the dynamic regions with SD values provided. For other questions, if we understand correctly, this reviewer asked us to comment on the effects of solvent accessibility or hydrophobic regions on the overall dynamics outside the binding residues of the peptides that cover them. Since HDX rates are influenced by two linked factors: solvent accessibility and hydrogen-bonding interactions that reflect structural dynamics, poor solvent accessibility in buried regions should result in low deuterium uptakes. The peptides in our dataset that include the Na<sup>+</sup>-binding site showed lower HDX, likely due to limited solvent accessibility and lower structural stability. It is unclear what this reviewer meant by "increased dynamics at peptides covering the Na binding site on overall more dynamic helices." We did not observe increased dynamics in peptides covering the Na<sup>+</sup>-binding site; instead, all Na<sup>+</sup>-binding residues and nearby sugar-binding residues have lower degrees of deuteriation.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Thanks for this suggestion. The previous datasets were collected in the presence of Na<sup>+</sup>. In the current study, we also have two Na<sup>+</sup>-containing datasets. Both showed similar results: the multiple overlapping peptides covering the sugar-binding residues on helices I and V have higher HDX rates than those peptides covering the Na<sup>+</sup>-binding residues, even when Na<sup>+</sup> was present.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Thank you for this suggestion. Comparing HDX-MS between the WT and the D59C mutant is certainly interesting, especially with the increasing amount of structural, biochemical, and biophysical data now available for this mutant. However, due to limited resources, we might consider it later.

      (8) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      We have shown that Li<sup>+</sup> also works positively with melibiose. Li<sup>+</sup> binds to MelB<sub>St</sub> with a higher affinity than Na<sup>+</sup> and modifies MelB<sub>St</sub> differently. It is important to study this thoroughly and separately. To answer the second question, H<sup>+</sup> is a weak coupling cation with little effect on melibiose binding. Since its pKa is around 6.5, only a small population of MelB<sub>St</sub> is protonated at pH 7.5. The order of sugar-binding cooperativity is highest with Na<sup>+</sup>, then Li<sup>+</sup>, and finally H<sup>+</sup>.
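      The statement that only a small population is protonated at pH 7.5 can be checked with the Henderson-Hasselbalch relation. The sketch below assumes a single titratable site with the pKa of about 6.5 quoted above; the function name is illustrative.

```python
def protonated_fraction(ph, pka):
    """Fraction of a single titratable site that is protonated at a given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa),
    so the protonated fraction is 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa ~6.5 and pH 7.5 as stated in the response
print(f"{protonated_fraction(7.5, 6.5):.3f}")  # 1/11, about 0.091
```

      Under this single-site assumption, roughly 9% of the transporter would be protonated at pH 7.5, consistent with H<sup>+</sup> acting as a weak coupling cation under these conditions.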

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      The sugar translocation free-energy landscape simulations showed that both helix bundles move relative to the membrane plane. This analysis aimed to clarify a hypothesis in the field—that the MFS transporter can use an asymmetric mode to perform the conformational transition between inward- and outward-facing states. In the case of MelB<sub>St</sub>, we clearly demonstrated that both domains move and each helix bundle moves as a unit. So only a small number of helices and loops showed labeling changes. Thanks for the suggestion about comparing with XylE. We have included that in the discussion.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      We thank this reviewer for reading the single-molecule force spectroscopy (SMFS) study on MelB<sub>St</sub>. The C3 loop mentioned in this SMFS article is partially covered in the Mel vs. apo and Mel plus Na<sup>+</sup> vs. apo datasets, and there is more coverage in the Na<sup>+</sup> vs. apo dataset. In either condition, no deprotection was detected. The labeling time points might not be long enough to detect it.

      Reviewer #3:

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelB<sub>St</sub>) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na<sup>+</sup>, H<sup>+</sup>, or Li<sup>+</sup>, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na<sup>+</sup> by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na<sup>+</sup> and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, positioned at 4 Å-distance from the phenyl group of Tyr26, which forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. 
Next, the authors analyzed by hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) the changes in structural dynamics of the transporter induced by melibiose, Na<sup>+</sup>, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.
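      To put the roughly 8-fold increase in melibiose affinity with Na<sup>+</sup> into energetic terms, the fold change can be converted into a coupling free energy, ΔΔG = -RT ln(fold change). The sketch below assumes a temperature of 298.15 K, which is not stated in the summary; the function name is illustrative.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def coupling_free_energy(fold_change, temperature_k=298.15):
    """Free-energy change (kJ/mol) equivalent to a fold increase in affinity.

    A fold_change > 1 (tighter binding) gives a negative value.
    The default temperature is an assumption, not taken from the study.
    """
    return -R * temperature_k * math.log(fold_change)


print(f"{coupling_free_energy(8.0):.2f} kJ/mol")
```

      At this assumed temperature the 8-fold effect corresponds to about -5.2 kJ/mol (about -1.2 kcal/mol), a modest energetic coupling between the two binding sites.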

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      We thank this reviewer for the positive comments.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      A water molecule can be modeled at a resolution ranging from 2.4 to 3.2 Å, and the quality of the model depends on the map quality and water location. In this revision, we refined the structure to a resolution of 2.6 Å using the same dataset and also performed all-atom MD simulations. All results support the occupancy of water-1 in the sugar-bound MelB<sub>St</sub>.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      The authors thank this reviewer for the thoughtful suggestions. MelB<sub>St</sub> has been subjected to Cys-scanning mutagenesis (https://doi.org/10.1016/j.jbc.2021.101090). Placing a Cys residue at Gln372 significantly decreased the transport initial rate, accumulation, and melibiose fermentation, with minimal effect on protein expression, as shown in Figure 2 of this JBC article, which could support its role in the binding pocket. The T373C mutant retained most of the WT's activities. Our previous studies showed that Thr121 is only responsible for Na<sup>+</sup> binding in MelB<sub>St</sub>, and mutations decreased protein stability; now, HDX reveals that this is the rigid position. Additionally, our previous studies indicated that Arg295 is another conformationally important residue. In this version, we have added more HDX analysis to explore the relationship between the two substrate-binding sites with conformational dynamics, especially focusing on the gating salt-bridge network including Arg295, which has provided meaningful new insights.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter was less commented, could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Thanks for this important question. We have added more discussion of the deprotected data and prepared a new Fig. 8b to highlight the melibiose-binding-induced flexibility in several loops, especially the gating area on both sides of the membrane. We also proposed that these changes might facilitate the formation of the transition-competent state. The overall effects induced by substrate binding are relatively small, and the datasets for apo and Na were collected separately, so comparing melibiose&Na<sup>+</sup> versus Na<sup>+</sup> might not be as precise. In fact, the Na<sup>+</sup> effects on the sugar-binding site can be clearly seen in the deuterium uptake plots shown in Figures 7-8, by comparing the first and last panels.

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, the thermostable mutant D59C selectively abolishes all cation binding and associated cotransport activities, but it maintains intact sugar binding and undergoes conformational transitions like the WT, as demonstrated by electroneutral transport reactions, including the α-NPG transport shown in this article and the melibiose exchange and fermentation shown previously. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport, which support the conclusion that Na<sup>+</sup> functions as an allosteric activator.

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226). I would recommend indicating more of them in areas where deuterium changes are substantial.

      We appreciate this comment and have modified the plots by marking the residue positions and labeling several peptides with significant HDX changes in Fig. 5b. We also provided a deuteriation map based on peptide coverage (Fig. 5a).

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      This is an intriguing mechanistic question. In this HDX study, we found that the cation-binding pocket and nearby sugar-binding residues are conformationally rigid, while some sugar-binding residues farther from the cation-binding pocket are flexible. We concluded that conformational dynamics regulate sugar-binding affinity, but the increase in Na<sup>+</sup>-binding affinity caused by melibiose is not related to protein dynamics. Our previous interpretation based on structural data remains our preferred explanation: the bound melibiose physically prevents the release of Na<sup>+</sup> or Li<sup>+</sup> from the cation-binding pocket. We also proposed a mechanism of intracellular Na<sup>+</sup> release in the 2024 JBC paper (https://doi.org/10.1016/j.jbc.2024.107427); after sugar release, the rotamer change of Asp55 will help Na<sup>+</sup> exit the cation pocket into the empty sugar pocket, and the negative membrane potential inside the cell will further facilitate movement from MelB<sub>St</sub> to the cytosol.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) It would help the reader if the previous work were introduced more clearly, and if the results of the experiments reported in this manuscript were put into the context of the previous work. Lines 283-296 discuss observations that are similar to previous reported structures as well as novel interpretations. It would help the reader to be clearer about what the new observations are.

      Thank you for the important comment. We have revised accordingly by adding the related citations and the phrase “as shown previously” wherever we state our previous observations.

      (2) The affinity by ITC is measured for various ligands, but very few conclusions are drawn about how the affinity correlates with the binding modes. Are the other ligands that are investigated in this study transported by the protein, or do they just bind? Can the protein transport the trisaccharide raffinose? The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but this is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend.

      Additionally, the D59C mutant utilized here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium. For non-specialists, please better introduce and explain the choice of using D59C for the structural analyses.

      Thank you for these helpful comments. We have addressed all of the concerns and suggestions, as listed in the summary of this revision. Notably, the D59C mutant does not catalyze any electrogenic melibiose transport that involves cation transduction, but it does catalyze downhill translocation of galactosides, as shown by the downhill α-NPG transport assay in Fig. 1a. The intact downhill transport by the D59C mutant further supports the allosteric coupling between the cation- and sugar-binding sites.

      The binding isotherms and poor affinities from the ITC measurements do not support further analysis of the binding mode: none showed a sigmoidal curve, so the enthalpy change cannot be accurately determined. We nevertheless thank the reviewer for this comment.

      (3) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #1.

      (4) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373. Please change line 278 to state "this OH-4 water molecule is likely part of sugar binding".

      We have addressed these concerns in the response to the Public Reviews at reviewer-3 #1.

      (5) Line 290-296: The Thr121 is not represented in any figures, while the Lys377 is. Their relative positioning between sugar water and sodium is not made clear by any figure.

      Thank you for this comment. This information is presented in Figs. 7-8. Lys377 is closer to the cation site and relatively far from the sugar-binding site.

      (6) Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilized protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      (7) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.
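
      The summary statistics requested in point (7) can be computed directly from a peptide list. The following is a minimal sketch, not the authors' pipeline: the function name `hdx_coverage_summary`, the toy peptide list, and the definition of redundancy (total peptide residues divided by covered residues) are assumptions chosen for illustration. Real HDX software typically also discounts prolines and the N-terminal residues of each peptide, which this sketch ignores.

```python
def hdx_coverage_summary(peptides, sequence_length):
    """Summarize an HDX peptide map.

    peptides: list of (start, end) residue numbers, 1-based and inclusive.
    sequence_length: number of residues in the full protein sequence.
    """
    if not peptides:
        raise ValueError("no peptides supplied")

    # depth[r] counts how many peptides cover residue r (index 0 is unused).
    depth = [0] * (sequence_length + 1)
    for start, end in peptides:
        if not 1 <= start <= end <= sequence_length:
            raise ValueError(f"peptide {start}-{end} lies outside the sequence")
        for residue in range(start, end + 1):
            depth[residue] += 1

    covered = sum(1 for d in depth[1:] if d > 0)
    total_length = sum(end - start + 1 for start, end in peptides)

    return {
        "n_peptides": len(peptides),
        "mean_peptide_length": total_length / len(peptides),
        "sequence_coverage": covered / sequence_length,
        # Assumed definition: average number of peptides per covered residue.
        "average_redundancy": total_length / covered,
    }


# Hypothetical peptide map, for illustration only.
toy_peptides = [(1, 10), (5, 14), (20, 29)]
summary = hdx_coverage_summary(toy_peptides, sequence_length=40)
```

      The same per-residue `depth` list is what a sequence coverage heat map, as suggested in point (10), would display.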

      (8) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (9) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, and this reviewer understands that working with dynamic transporters can lead to increased data variation, a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (10) The table (S3) and figure (S4) showing uncovered residues is an unclear interpretation of the data; this would be better given as a peptide sequence coverage heat map. This would also be more informative for the redundancy in covered regions, too. In this way, S3 and S4 can be combined.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (11) Residual plots in Figure 5 could be improved by a topological map to indicate how peptide number resembles the protein amino acid sequence.

      Thank you for the request. Because Figure 6 is already large, we have added a transmembrane topology plot colored with the HDX results in Fig. 8c.

      (12) The presentation of data in S5 could be clarified. Does the number of results given in the brackets indicate overlapping peptides? What are the lengths of each of these peptides? Classical HDX data presentation utilizes blue for protection and red for deprotection. The use of yellow ribbons to show protection in non-sugar binding residues takes some interpretation and could be clarified by also depicting in a different blue. I also don't see the need to include ribbon and cartoon representation when also using colors to depict protection and deprotection. The authors should change or clarify this choice.

      We have moved this figure into the current Fig. 6b, as suggested by Reviewer-3. To address the questions about the figure legend: the number of results shown in brackets does indeed indicate overlapping peptides. As for the lengths of these peptides, the sequence of each peptide is shown in Figures 7-8 and is also included in Supplemental Figure S5. Regarding the use of color, both blue and green were used to distinguish peptides protecting the substrate-binding site from those in other regions. The ribbon and cartoon representations are both provided for clarity, as the cartoon style hides many helices.

      (13) In Table S5, the difference between valid points and protection is unclear. And what is indicated by numbers in brackets or slashes? Additionally, it should be highlighted again here that single-residue information is inferred from peptide-level data. By value, are the authors referring to peptide-level differential data?

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (14) Line 316 states that a significant difference is seen in dynamics; how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity? The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there isn't a difference between the dynamics of each site.

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (15) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Please review our responses in the Public Reviews.

      (16) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      Please review our responses in the Public Reviews.

      (17) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      Please review our responses in the Public Reviews.

      (18) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter received less comment; could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). To make the data presentation clearer, it might be worth also analyzing the deuterium uptake difference between the conditions sodium ion+melibiose and melibiose alone. This would show the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter. Similarly, the deuterium uptake difference between sodium ion+melibiose and sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Please review our responses in the Public Reviews.

      (19) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226); I would recommend indicating more of them, in areas where deuterium changes are substantial.

      Please review our responses in the Public Reviews.

      (20) Figure 6, please indicate in the legend what the black and blue lines are (I assume black is for the apo?)

      We are sorry that we did not make this clear. Yes, black was used for the apo state and blue for all bound states.

      (21) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      Please review our responses in the Public Reviews.

      Addressing the following three points would strengthen the manuscript, but also involve a significant amount of additional experimental work. If the authors decide not to carry out the experiments described below, they can still improve the assessment by focusing on points (1-21) described above.

      (22) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Please review our responses in the Public Reviews.

      (23) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      Please review our responses in the Public Reviews.

      (24) Site-directed mutagenesis could help strengthen the conclusions. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121 and Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the authors' claims regarding allosteric communication between the two substrate-binding sites.

      Please review our responses in the Public Reviews.

    1. eLife Assessment

      This important study uses standard single-cell RNA-seq analyses combined with methods from the social sciences to reduce heterogeneity in gene expression in Drosophila imaginal wing disc cells treated with 4000 rads of ionizing radiation. The use of this methodology from social sciences is novel in Drosophila and allows them to identify a subpopulation of cells that is disproportionately responsible for much of the radiation-induced gene expression. Their compelling analyses reveal genes that are expressed regionally after irradiation, including ligands and transcription factors that have been associated with regeneration, as well as others whose roles in response to irradiation are unknown. This paper would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from Economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.
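
      For readers unfamiliar with the HHI, the index is the sum of squared shares, so it can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `hhi`, the cluster labels, and the numbers are hypothetical, and the quantity the manuscript feeds into the index for each cluster is not stated in this review, so a generic non-negative per-cluster expression value is assumed.

```python
from typing import Mapping


def hhi(expression_by_cluster: Mapping[str, float]) -> float:
    """Herfindahl-Hirschman index of one gene's expression across clusters.

    Each cluster's share is its value divided by the total; the index is the
    sum of squared shares. It ranges from 1/N (spread evenly over N clusters)
    to 1 (all expression concentrated in a single cluster).
    """
    values = list(expression_by_cluster.values())
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("gene has no expression in any cluster")
    return sum((v / total) ** 2 for v in values)


# Hypothetical per-cluster values, for illustration only.
uniform_gene = {"cluster_a": 10.0, "cluster_b": 10.0, "cluster_c": 10.0, "cluster_d": 10.0}
regional_gene = {"cluster_a": 37.0, "cluster_b": 1.0, "cluster_c": 1.0, "cluster_d": 1.0}

hhi_uniform = hhi(uniform_gene)    # evenly spread over 4 clusters: 4 * 0.25**2
hhi_regional = hhi(regional_gene)  # dominated by one cluster, so closer to 1
```

      Under this reading, a gene induced similarly everywhere scores near 1/N, whereas a regionally induced gene scores closer to 1.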

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Weaknesses

      The authors have addressed my concerns in the revised article.

    3. Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include identification of spatial heterogeneity in expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS related genes, certain ligands and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      The authors responded well to my suggestions for improvement, which were incorporated in the revised version of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Cruz and colleagues report a single cell RNA sequencing analysis of irradiated Drosophila larval wing discs. This is a pioneering study because prior analyses used bulk RNAseq analysis so differences at single cell resolution were not discernable. To quantify heterogeneity in gene expression, the authors make clever use of a metric used to study market concentration, the Herfindahl-Hirschman Index. They make several important observations including region-specific gene expression coupled with heterogeneity within each region and the identification of a cell population (high Trbl) that seems disproportionately responsible for radiation-induced gene expression.

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occurs in response to uniform induction of damage by X-rays in a single layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration and development.

      Weaknesses:

      The authors have addressed my concerns adequately with changes made in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comment:

      The reviewers felt that the study could be improved by (1) better integrating the results with the existing literature in the field

      (1) In the Introduction and Results section of the manuscript, we had made every attempt to cite the relevant literature. (Reviewer 1 stated that “The literature is appropriately cited”). We agree with the Reviewing Editor that rather than simply cite the relevant literature, we could have done a better job of integrating our findings with what has been previously discovered by others. We have attempted to do this in the revised manuscript. Also, we have included many additional citations in the Introduction and in the first section of the Results where work by others has provided a framework for interpreting our single-cell studies.

      and (2) manipulating Trib expression and analyzing the expression of 1-2 HIX genes.

      (2) We are grateful for this suggestion. As suggested by the Reviewing Editor we have attempted to increase and decrease trbl expression and assess the effect on expression of two genes, Swim and CG15784.

      We increased trbl levels in the wing pouch using rn-Gal4, tub-Gal80<sup>ts</sup> and UAS-trbl. By transferring larvae for 24 h from 18 °C to 31 °C, we were able to induce trbl expression in the wing pouch. When these larvae were irradiated at 4000 rad, we found reduced levels of apoptosis in the wing pouch of discs that overexpressed trbl (Figure 7-figure supplement 1). This indicated that upregulation of trbl is radioprotective. Consistent with our findings, others have previously shown that upregulation of trbl and stalling in the G2 phase of the cell cycle protects cells from JNK-induced apoptosis (Cosolo et al., 2019, PMID:30735120) and that downregulating the G2/M progression-promoting factor string protects cells from X-ray radiation-induced apoptosis (Ruiz-Losada et al., 2021, PMID:34824391).

      As suggested by the Reviewing Editor, we also examined the effect of trbl overexpression on the induction of two “highly induced by X-ray irradiation (HIX)” genes, Swim and CG15784. Increasing trbl expression had no effect on the induction of Swim and caused only a modest decrease in the induction of CG15784 (Figure 7-figure supplement 2). Thus, increasing trbl expression is, in itself, insufficient to promote HIX gene expression, indicating that other factors are necessary for HIX gene induction.

      We also attempted to reduce trbl expression, using three different RNAi lines. While some of these lines have been used previously by others to reduce trbl expression under unirradiated conditions (Cosolo et al., 2019, PMID:30735120), we nevertheless wanted to check if they reduced trbl induction following irradiation. For each of the three lines, we observed no obvious reduction in trbl RNA following irradiation when visualized using HCR (Author response image 1). Thus, any effects on gene expression that we observe could not be attributed to a decrease in trbl expression. We have therefore included the images showing a lack of knockdown in this Response to Reviews document but not included these experiments in the revised manuscript.

      Author response image 1.

      RNA in situ hybridizations using the hybridization chain reaction, performed with probes to trbl. In A-F, the RNAi is expressed using nubbin-Gal4. In G-I, the RNAi is expressed using rn-Gal4, tub-Gal80<sup>ts</sup>. white-RNAi was used as a control (A, B, G, H). Three different RNAi lines directed against trbl were tested: Vienna lines VDRC 106774 (C, D) and VDRC 22113 (E, F), and Bloomington line BL42523. In no case was a reduction in trbl RNA upregulation in the wing pouch following 4000 rad observed, except for one disc (n = 6) of VDRC 106774 crossed to nubbin-Gal4.

      Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from Economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands, and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      For each condition, at least 5 discs were imaged, and up to 15 in some cases. We tried to choose a representative disc for each condition after looking at all of them. All discs imaged under each condition are shown below; the disc chosen for the figure is indicated with an asterisk. All scale bars are 100 µm.

      Author response image 2.

      Images for discs shown in Manuscript Figure 1, panels B, C

      Author response image 3.

      Images for discs shown in Manuscript Figure 1, panels D, E

      Author response image 4.

      Images used in Manuscript Figure 1F, G

      Author response image 5.

      Images used in Manuscript Figure 1H, I

      Author response image 6.

      Images used in Manuscript Figure 1J, K

      Author response image 7.

      Images used in Manuscript Figure 1L, M

      (2) Some of the figures are unclear.

      It is unclear to us exactly which figures the Reviewer is referring to. Perhaps this is the same issue mentioned below in “Recommendations for the authors”. We address it below.

      Reviewer #1 (Recommendations for the authors):

      (1) Regarding Figure 1, what is stained in blue? Is it DAPI? If so, this should be added to the figure legend.

      Thank you for pointing out this omission. This has been addressed in the revised manuscript.

      It is very difficult to see blue on black, so could the authors please outline the discs?

      Alternatively, they could show DAPI in green and the markers (pH2Av, etc) in magenta.

      We used DAPI (blue) as a way of outlining the discs. While we appreciate the reviewer’s concern, after reviewing the images, we found that the blue is clearly visible when the document is viewed on a screen. It is less obvious if the document is printed on some kinds of printers. Since boosting this channel would make the signal from the other channels more difficult to see, we left the images as they were.

      (2) Figure 3, Figure Supplement 2, panel B. It is not possible to read the gene names in the panel's current form. Please break this up into 4 lines (as much as possible from the current 2).

      Thank you for this suggestion. We have done this in the revised manuscript.

      Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include the identification of spatial heterogeneity in the expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS-related genes, certain ligands, and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      Thank you for your assessment of the work.

      Reviewer #2 (Recommendations for the authors):

      I suggest two major points for improvement:

      (1) It is important to test whether manipulation of trbl levels (i.e., overexpression, knockdown, mutation) would result in measurable biological outcomes after IR, such as altered HIX gene expression, altered cell cycle progression, or both. This may help disentangle the question of whether high trbl expression and correlated HIX gene expression are a cause or consequence of G2/M stalling.

      We have described these experiments at the beginning of this Response to Reviews document when addressing the comments made by the Reviewing Editor. Please see Figure 7, figure supplements 1 and 2. These experiments suggest that upregulation of trbl offers some protection from radiation-induced death, yet it is itself insufficient to induce expression of two HIX genes tested. As we have also described earlier, three different RNAi lines tested did not reduce trbl upregulation after irradiation.

      (2) A more extensive characterization of the high-trbl cell state would also be appropriate, particularly in terms of their relationship to the cell cycle.

      We attempted to address this issue in two ways. First, we used the expression of a trbl-gfp transgene and RNA in-situ hybridization experiments to visualize the distribution of the high-trbl cells (shown in new manuscript figure, Figure 6-figure supplement 3). When examining trbl RNA in irradiated discs, there is no obvious demarcation between cells that express high levels of trbl and other cells. This is also apparent in the UMAP shown in Figure 6A and A’. Most cells seem to express trbl; cells in the “high trbl” cluster simply express more trbl than others. We observed cells expressing trbl and PCNA as well as cells expressing only one of those two genes at detectable levels. Thus, it was not possible to distinguish the “high trbl” cells from other cells by this approach.

      We decided instead to focus on examining the expression of other cell-cycle genes in the high-trbl cluster. We have added a paragraph in the Results section that details our findings. Many transcriptional changes are indeed consistent with stalling in G2, such as high levels of trbl and low levels of string (stg). Additionally, a G2 state is consistent with the reduced levels of genes that are normally expressed at other stages of the cell cycle: G1 genes such as E2f1 and Dp, S-phase genes such as several Mcm genes, PCNA and RnrS, and genes that encode mitotic proteins such as polo, Incenp and claspin. There are, however, several anomalies, such as slightly increased expression of the early-G1 cyclin, CycD, and the retinoblastoma ortholog Rbf. Thus, at least as assessed by the transcriptome, this cluster may not correspond to a cell state that is found under normal physiological conditions.

      (3) Minor: p. 12, line 3. Figure 5A is mentioned, but it seems that it should be 4A instead.

      Thank you for pointing this out. We have addressed this in our revisions.

      Reviewer #3 (Public review):

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. Generating single-cell RNA-seq data from irradiated cells is tricky because many cells have already died. Even cells that do not incorporate propidium iodide, and so made it through our FACS filters, are likely to be in early stages of apoptosis or physiologically unhealthy. Indeed, in irradiated samples up to 57% of sequenced cells were excluded from our analysis because their RNA content appeared to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues, of the time course of events, and of the fact that our analysis captures RNA levels at only a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Work by others (Ruiz-Losada et al., 2021, PMID: 34824391) has shown that almost 80% of cells have a 4C DNA content 4 h after 4,000 rad X-ray irradiation. The high-trbl cluster accounts for only 18% of cells and can therefore account for only a minority of the cells with a 4C DNA content.

      Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs.

      We expect that clusters 1 and 2 are largely comprised of cells in G2/M. Together, these clusters are marked by some genes previously found to be higher in FACS separated G2 cells compared to G1 cells (Liang et al., 2014, PMID: 24684830). These genes include Det, aurA, and ana1. Strangely, cluster 0 is not strongly marked by any of the 175 cell cycle genes used in our clustering (eff being the strongest marker) and has a lower-than-average expression of 165/175 cell cycle genes. Cluster 0 is however marked by the genes ac and sc, which are known to be expressed in proneuronal cell clusters interspersed throughout the disc that stall in G2 and form mitotically quiescent domains (Usui & Kimura 1992, Development, 116 (1992), pp. 601-610 (no PMID); Nègre et al., 2003, PMID: 12559497). Given these observations, we hypothesize that cluster 0 is largely comprised of stalled G2 cells like those found in ac/sc-expressing proneural clusters.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread as we now show in Figure 6-figure supplement 3.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways – by location in the disc and by cell-cycle state.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. Another possibility is that dysf upregulation might be acutely sensitive to the developmental stage of the disc. This would require experiments with very precisely-staged larvae. We have not investigated this further as it is not a central issue in our paper.
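      The sex assignment described above can be illustrated with a minimal sketch. It assumes that each cell carries raw transcript counts for lncRNA:roX1 and lncRNA:roX2, which are male-specific in Drosophila, and that any cell at or above a cutoff is called male. The function name `classify_sex` and the cutoff `min_counts` are hypothetical and are not taken from the authors' pipeline.

```python
def classify_sex(rox1_count, rox2_count, min_counts=1):
    """Assign a putative sex to one cell from its roX1/roX2 counts.

    min_counts is an illustrative cutoff, not a value from the paper.
    """
    if rox1_count < 0 or rox2_count < 0:
        raise ValueError("counts must be non-negative")
    # roX RNAs are expressed in males; their combined count is the signal.
    return "male" if rox1_count + rox2_count >= min_counts else "female"


# Hypothetical cells: (roX1 count, roX2 count)
cells = [(12, 3), (0, 0), (0, 2)]
labels = [classify_sex(r1, r2) for r1, r2 in cells]
```

      Because transcript dropout is common in single-cell data, a cell with zero roX counts is not necessarily female, so a comparison of this kind is best read at the population level.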

      Reviewer #3 (Recommendations for the authors):

      Please check the color-coding in Figure 1A. The region marked as pouch appears to include hinge folds that express Zfh2 (a hinge marker) in Figure 2A (even after accounting for low Zfh2 expression in part of the pouch).

      We have corrected this and have marked the pouch region based on the analysis of expression of different hinge and pouch markers by Ayala-Camargo et al. 2013 (PMID 2398534).

      The statement 'Furthermore, within tissues, stem cells are most sensitive while differentiated cells are relatively radioresistant' needs to be qualified, as there are differences in radiosensitivity of adult versus embryonic stem cells (e.g., PMID: 30588339)

      We thank the reviewer for bringing this point to our attention and for pointing us to an article that addresses this issue in detail. We appreciate that our statement was rather simplistic – we have modified it and added two additional references.

    1. eLife Assessment

      This important study, which tackles the challenge of analyzing genome integrity and instability in unicellular pathogens by introducing a novel single-cell genomics approach, presents compelling evidence that this new tool outperforms standard whole-genome amplification techniques. While thorough and rigorous, the work's impact would increase by providing scripts and data, as well as a description of the biological relevance that would make this method more appealing to the broad community studying genetic heterogeneity in diverse organisms.

    2. Reviewer #1 (Public review):

      Summary:

      Negreira, G. et al clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underlies the need for powerful approaches to perform single-cell DNA analyses suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

      In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

      Strengths:

      This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs - all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

      Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

      Weaknesses:

      The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Or else, when they discuss the results obtained at the level of nucleotide information and the relevance of being able to compare, in their case, the two strains, they could refer to the implications of this level of precision to those studying clonal strains or field isolates, drug resistance or virulence in a more detailed way.

    3. Reviewer #2 (Public review):

      Summary:

      Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, while also representing a major global disease with hundreds of thousands of cases annually, serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method that is capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill in this gap.

      The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could maintain levels of genomic heterogeneity represented in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPC) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains on the chromosome, gene copy, and individual base level. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

      Strengths:

      (1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, despite being a motile organism with a GC-rich genome.

      (2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

      (3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

      (4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

      (5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

      (6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

      (7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

      Weaknesses:

      (1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

      (2) In the methods, the STD protocol lists a 15-minute amplification at 45C whereas the PTA protocol involves 10h at 37C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

      (3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

      (4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

      (5) The paper presents limited biological relevance. Without this, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For some, the authors do not comment on the mixed karyotypes from the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, they are between 4 and 20 copies). Thus, the sensitivity of CNV detection from this data remains unclear; can this approach detect lower-level CNVs like duplications, or minor CNVs that do not show up in every cell?

      (6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion about how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 enzyme is very processive with circular molecules; does its presence lead to overamplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

      (7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

      (8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

      (9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

    4. Reviewer #3 (Public review):

      In this manuscript, Negreira et al. propose a new scDNAseq method, using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as determining GC bias, intra-chromosomal fluctuation (ICF, a metric to differentiate replicative and non-replicative cells) and intra-chromosomal coefficient of variation (ICCV, which captures the chromosome read distribution). The coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). Then, they showed that the method outperforms WGA and has similar accuracy to the discontinued 10X-scDNA (10X Genomics), further improving on short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions and SNP variations across cells. This is a timely and very relevant work that has a wide applicability in copy number variation assessment using single-cell data.

      I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

      (1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

      (2) SPC-PTA and SPC-STD cell count comparison: The authors have consistently proven that the SPC-PTA method was superior to SPC-STD. However, there are a few points that should be clarified regarding the SPC-PTA results. Is there an explanation for the lower proportion of SPC to true cells success in SPC-STD, which reflects the bimodal distribution for the reads per cell in SPC-PTA2 and a three-to-multimodal distribution in SPC-PTA1 in Figure 1B? Also, in Table 1, does the number of reads reflect the number of reads in all sequenced SPCs or only in the true cells? If it is in the SPCs, I suggest that the authors add a new column in the table with the "Number of reads in true cells" to account for this discrepancy.

      (3) The authors should evaluate the results with a higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that the SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

      (4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. The doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify strain-specific doublets. This is of minor importance for cloned and stable strains with few passages, such as BPK081, but might be more relevant in more heterogeneous strains, such as HU3. Strain-specific doublets might also be relevant in other scenarios, such as multiclonal infections with different populations from the same strain in the same geographic area. One positive point is that the "between strain doublet count" was low, so probably the within-strain doublet count should be low too. The manuscript would benefit from a discussion in this regard.

      (5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

      (5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most of the multiclonal infections in natural scenarios would be caused by parasite populations that diverge by fewer SNPs, and will be significantly harder to detect. Hence, I suggest that a short discussion about this is important.

      (5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 on Figure 5F left panel, most of the variation is observed in ~8 cells, which goes from position 0 to position ~28.000.

      Most of the other cells are in very short branches, from ~29.000 to 30.4000 (5F right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2000 SNPs. It is unexpected (but not impossible) that such ~8 divergent cells be maintained uniquely (or in very low counts) in the culture, unless this is a multiclonal infection. I would carefully investigate these cells. They might be doublets or have more missing data than other cells. I would also suggest that a quick discussion about this should be added to the manuscript.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Negreira, G. et al clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underlies the need for powerful approaches to perform single-cell DNA analyses suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

      In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

      Strengths:

      This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs - all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

      Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

      We thank the reviewer for the positive feedback and appreciation of the potential applications for the methodology we describe here.

      Weaknesses:

      The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Or else, when they discuss the results obtained at the level of nucleotide information and the relevance of being able to compare, in their case, the two strains, they could refer to the implications of this level of precision to those studying clonal strains or field isolates, drug resistance or virulence in a more detailed way.

      We thank the reviewer for the suggestions. Indeed, single-cell DNA sequencing has successfully revealed cell-to-cell variability in replication timing and fork progression in mammalian cells[1,2] and we believe that the SPC-PTA workflow could be used in similar studies in Leishmania to complement bulk-based observations[3,4]. Regarding nucleotide information, it is indeed of high relevance to detect minor circulating variants with potential virulence impact and/or effect on drug resistance which could be missed by bulk sequencing. This includes the ability to detect co-occurring variants with potential epistatic effects. These topics will be further developed in the revised version. Finally, we will explicitly discuss how this methodology can be applied beyond Leishmania, to investigate genome plasticity, adaptation, and evolutionary processes in other organisms.

      Reviewer #2 (Public review):

      Summary:

      Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, while also representing a major global disease with hundreds of thousands of cases annually, serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method that is capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill in this gap.

      The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could maintain levels of genomic heterogeneity represented in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPC) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains on the chromosome, gene copy, and individual base level. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

      Strengths:

      (1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, despite being a motile organism with a GC-rich genome.

      (2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

      (3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

      (4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

      (5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

      (6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

      (7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

      We deeply appreciate the positive and encouraging feedback on our manuscript.

      Weaknesses:

      (1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

      The methanol fixation prior to lysis is part of the original protocol described in the Single-Microbe Genome Barcoding Kit manual and was meant to facilitate lysis and DNA denaturation in bacterial cells (for which the kit was originally developed). However, in our preliminary tests with bulk samples – described in the supplementary material – we noticed a strong negative effect on lysis efficiency/DNA recovery when parasites were fixed with methanol. Thus, we decided to test the effect of skipping this step in the single-cell DNA workflow. We kept the SPC_STD1 sample to have a safe control where the full workflow described in the kit manual was followed.

      As we were unsure if the standard lysis (25 °C for 15 minutes) would work efficiently for Leishmania, we included the heat-lysis (99 °C for 15 minutes) as well as the longer incubation lysis (25 °C for 1 h). These modifications were listed as validated alternatives in the kit's manual.

      The 100k reads threshold was chosen based on the number of reads found in the 'true cell' with the lowest read count.

      Regarding variant calling, a variant was considered confidently called if it was covered, at single-cell level, by at least one deduplicated read with Phred quality above Q30 and mapping quality (MAPQ) also above 30.
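      The criterion above can be written as a small filter. This is a minimal sketch that assumes each read covering a variant site in a given cell has already been reduced to three values: the Phred base quality at the site, the mapping quality, and a duplicate flag. The `ReadSupport` record and the function name are hypothetical and do not come from the authors' scripts.

```python
from dataclasses import dataclass

MIN_BASE_QUAL = 30  # Phred base quality must be above Q30
MIN_MAPQ = 30       # mapping quality (MAPQ) must be above 30


@dataclass(frozen=True)
class ReadSupport:
    """One read covering a variant site in one cell (hypothetical record)."""
    base_quality: int
    mapq: int
    is_duplicate: bool


def is_confident_call(reads):
    """Return True if at least one deduplicated read passes both cutoffs."""
    return any(
        (not r.is_duplicate)
        and r.base_quality > MIN_BASE_QUAL
        and r.mapq > MIN_MAPQ
        for r in reads
    )


# A site covered by one duplicate and one good read is still confident.
site = [ReadSupport(38, 60, True), ReadSupport(35, 42, False)]
confident = is_confident_call(site)
```

      Both comparisons are strict, so a read at exactly Q30 or MAPQ 30 does not qualify under this reading of "above".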

      In the revised version, we will include these explanations and improve the explanation of the metrics used to estimate coverage quality.
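      For readers unfamiliar with the two evenness metrics named in the reviews, the following is a minimal sketch of how they can be computed from per-bin read counts in a single cell. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the pairwise-difference metric is applied to raw counts as described in the review, whereas some published variants apply it to log2-transformed ratios.

```python
from statistics import median


def gini_index(bin_counts):
    """Gini index of per-bin counts: 0 is perfectly even, values near 1 are highly uneven."""
    values = sorted(bin_counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with a non-zero count")
    # Standard rank-weighted formula on the sorted counts.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def median_abs_pairwise_difference(bin_counts):
    """Median of |count[i+1] - count[i]| over consecutive genomic bins."""
    if len(bin_counts) < 2:
        raise ValueError("need at least two bins")
    diffs = [abs(b - a) for a, b in zip(bin_counts, bin_counts[1:])]
    return median(diffs)


even_cell = [5, 5, 5, 5]
uneven_cell = [0, 0, 0, 4]
```

      The Gini index summarizes how unequally reads are spread across all bins regardless of position, while the pairwise difference captures local, bin-to-bin noise, which is one reason the two are often reported together.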

      (2) In the methods, the STD protocol lists a 15-minute amplification at 45C whereas the PTA protocol involves 10h at 37C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

      We agree with the reviewer that the longer incubation period of PTA might explain the higher read count seen in the PTA samples, although the differences in amplification kinetics (linear in PTA, exponential in STD) and potential differences in amplification saturation points make it difficult to compare them. For instance, an updated version of PTA (ResolveDNA V2) uses a shorter amplification time (2.5 h) and achieves amplification levels similar to those of the 10 h incubation, suggesting that PTA amplification saturates well before 10 h. In any case, all quality check metrics were computed with the cells subsampled to 100k reads to mitigate the effect of read count differences on data quality.
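      Subsampling every cell to a common depth before computing quality metrics can be sketched as follows. The example assumes that the reads of one cell are available as a list of identifiers and draws a fixed number without replacement; the function name, the seed handling, and the decision to reject cells below the target depth are assumptions made for illustration only.

```python
import random


def subsample_reads(read_ids, target=100_000, seed=0):
    """Randomly keep `target` reads from one cell, without replacement."""
    if len(read_ids) < target:
        raise ValueError("cell has fewer reads than the target depth")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(read_ids, target)


# Small demonstration with a hypothetical cell of 1,000 reads.
cell_reads = [f"read_{i}" for i in range(1000)]
kept = subsample_reads(cell_reads, target=100, seed=1)
```

      Choosing the target as the read count of the shallowest true cell, as described above, ensures that no true cell is excluded by this step.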

      (3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

      We agree with the reviewer, but during experimental execution SPCs were only assessed qualitatively via microscopy, following the Single-cell microbe DNA barcoding kit manual. No quantitative analysis was done and therefore we do not have these data. Doublet detection was done in silico, based on the detection of SPCs containing mixed genomes from the two strains used in the study, as described in the Materials and Methods. As pointed out by another reviewer, this only allows the detection of inter-strain doublets. In the revised version, we will explain this and add an estimation of total doublets based on the inter-strain doublet rate.
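      One common way to extrapolate from inter-strain doublets to total doublets, borrowed from mixed-sample experiments in single-cell sequencing, is sketched below. It assumes that cells pair at random, that a doublet contains exactly two cells, and that only two strains are present at known fractions. Under these assumptions a fraction 2pq of all doublets is inter-strain and therefore detectable. Whether the authors use this exact estimator is not stated; the function name is hypothetical.

```python
def estimate_total_doublet_rate(inter_strain_rate, frac_a=0.5):
    """Scale the observed inter-strain doublet rate to a total doublet rate.

    frac_a is the fraction of cells from strain A; strain B makes up the rest.
    """
    if not 0 < frac_a < 1:
        raise ValueError("frac_a must be strictly between 0 and 1")
    frac_b = 1 - frac_a
    # Probability that a random two-cell doublet mixes both strains.
    detectable_fraction = 2 * frac_a * frac_b
    return inter_strain_rate / detectable_fraction


# With a 1:1 mixture, half of all doublets are detectable, so the total
# rate is twice the observed inter-strain rate (2% is a made-up input).
total = estimate_total_doublet_rate(0.02)
```

      The estimate becomes less reliable as the mixture departs from 1:1, because a shrinking share of doublets is detectable and the correction factor grows accordingly.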

      (4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

      After exchange with the technical support team of the SPC generator kit, it was clarified that the heat lysis done in STD4 should have had a shorter incubation time (10 minutes instead of 15 minutes). We suspect that the longer incubation time, combined with the higher temperature and the harsh lysis condition with 0.8M KOH might have damaged SPCs and therefore DNA might have leaked out of them before WGA. In the microscopy images, SPCs in STD4 show a swollen aspect not seen in the other samples. In the revised version we will explain this more clearly.

      (5) The paper presents limited biological relevance. Without this, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For somy, the authors do not comment on the mixed karyotypes from the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, they are between 4 and 20 copies). Thus, the sensitivity of CNV detection from this data remains unclear; can this approach detect lower-level CNVs like duplications, or minor CNVs that do not show up in every cell?

      As described above we will include more discussion on potential biological relevance of the method in the revised version of the manuscript. In the revised version we will attempt to use dedicated bioinformatic tools to discover de novo CNVs, as per the suggestion of other reviewers. This might also allow us to determine the detection limit of the methodology for CNVs.

      (6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion about how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 enzyme is very processive with circular molecules; does its presence lead to overamplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

      We believe our data, which consist of short-read sequences, do not allow us to differentiate between intra-chromosomal CNVs and linear or circular episomal CNVs, so we cannot determine whether circular CNVs are over-amplified. Of note, we have previously demonstrated that the M-locus CNV in chromosome 36 is intrachromosomal, not circular (episomal)[5].

      (7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

      We believe the SPC-PTA workflow can be applied to organisms with larger genomes as PTA was developed specifically for mammalian cells[6], and also because, in our hands, it outperformed the 10X scDNA solution, which was developed for mammals.

      We believe direct comparison with other studies regarding coverage levels is elusive because other steps in the workflow apart from the WGA, such as the library preparation (PCR-based in our case), as well as genome features like GC content, size, and presence of repetitive regions, can also affect coverage levels and evenness. One strength of our approach was the use of a single sample (the 50/50 mix of two L. donovani strains) for all conditions, thus removing potential parasite-specific biases. In addition, the application of a multiplexing system during barcoding allowed us to combine all samples prior to library preparation, thus removing potential differences introduced by this step.

      Regarding the effect of GC content, we did notice a positive bias in all samples in regions with higher GC content, which had to be corrected in silico. This is the opposite of the negative bias observed in a previous study[7], likely due to differences in WGA and/or library preparation. In the revised version, we will include a supplementary figure showing the GC bias.
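      As an illustration of what an in silico GC correction can look like, the sketch below normalizes each genomic bin by the median count of bins with similar GC content. This is a generic stratified-median approach and an assumption on our part; the response does not specify which correction was applied, and the function name `gc_correct` and the 5% stratum width are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, step=0.05):
    """Rescale per-bin read counts so every GC stratum has the same median.

    counts       -- read count per genomic bin
    gc_fractions -- GC fraction (0 to 1) of each bin, same order as counts
    step         -- width of a GC stratum (hypothetical default of 5%)
    """
    # Group the counts of bins that fall in the same GC stratum.
    strata = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        strata[round(gc / step)].append(count)
    stratum_median = {key: median(vals) for key, vals in strata.items()}

    overall = median(counts)
    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[round(gc / step)]
        # Bins in a stratum with zero median carry no usable signal.
        corrected.append(count * overall / m if m > 0 else 0.0)
    return corrected


# Toy example with a positive GC bias: high-GC bins have twice the reads.
counts = [100, 100, 200, 200]
gc = [0.40, 0.40, 0.60, 0.60]
print(gc_correct(counts, gc))
```

      With real data the strata would contain many bins each, so a true copy number change in a few bins would not be erased by the correction.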

      (8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

      In the revised version we will provide precise cost estimates and the rationale for the estimation.

      (9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

      The full Zenodo link (https://doi.org/10.5281/zenodo.17094083) will be included in the revised version.

      Reviewer #3 (Public review):

      Summary

      In this manuscript, Negreira et al. propose a new scDNAseq method, using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as determining GC bias, Intra-Chromosomal fluctuation (ICF, a metric to differentiate replicative and non-replicative cells) and Intra-chromosomal coefficient of variation (ICCV, chromosome read distribution). The coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). Then, they showed that the method outperforms WGA and has similar accuracy to the discontinued 10X-scDNA (10X Genomics), further improving on short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions and SNP variations across cells. This is a timely and very relevant work that has a wide applicability in copy number variation assessment using single-cell data.
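      The two evenness metrics named in this summary, the Gini index and the median absolute pairwise difference between consecutive bins, can be sketched as follows. This is a minimal illustration on raw per-bin counts, as the summary describes; the manuscript's exact definitions (for example any normalization or log transformation before taking differences) may differ, and the function names are hypothetical.

```python
from statistics import median


def gini_index(counts):
    """Gini index of per-bin read counts: 0 means perfectly even coverage."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def median_abs_pairwise_diff(counts):
    """Median of |count[i+1] - count[i]| over consecutive bins."""
    diffs = [abs(b - a) for a, b in zip(counts, counts[1:])]
    return median(diffs)


even = [5, 5, 5, 5]
skewed = [0, 0, 0, 10]
print(gini_index(even), gini_index(skewed))
print(median_abs_pairwise_diff([10, 12, 9, 9, 14]))
```

      The Gini index summarizes how unequally reads are spread over the whole genome, while the pairwise difference captures local bin-to-bin noise, so the two metrics are complementary.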

      Strengths

      I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

      We thank the reviewer for the enthusiasm and positive feedback.

      Weaknesses

      (1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

      Both the Zenodo (https://doi.org/10.5281/zenodo.17094083) and the GitHub (https://github.com/gabrielnegreira/2025_scDNA_paper) repositories are now publicly available.

      (2) 2-SPC-PTA and SPC-STD cell count comparison: The authors have consistently proven that the SPC-PTA method was superior to SPC-STD. However, there are a few points that should be clarified regarding the SPC-PTA results. Is there an explanation for the lower proportion of SPC to true cells success in SPC-STD, which reflects the bimodal distribution for the reads per cell in SPC-PTA2 and a three-to-multimodal distribution in SPC-PTA1 in Figure 1B? Also, in Table 1, does the number of reads reflect the number of reads in all sequenced SPCs or only in the true cells? If it is in the SPCs, I suggest that the authors add a new column in the table with the "Number of reads in true cells" to account for this discrepancy.

      The reason for the higher presence of 'background' SPCs in the PTA samples is not clear, but we hypothesize that it could be due to PTA favoring amplification of small, free-floating DNA molecules that might have been trapped in cell-free SPCs, as PTA works with shorter amplicons. Also, the longer incubation time used in PTA (10 h) might have allowed enhanced amplification of low quantities of free-floating DNA to detectable levels. Regarding Table 1, indeed it only shows the total number of reads per sample. In the revised version we will add the suggested column to Table 1.

      (3) The authors should evaluate the results with a higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that the SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

      We believe the 100,000 cutoff is already high for aneuploidy analysis, as we have successfully reconstructed parasite karyotypes with 20,000 reads per cell[8], so a higher cutoff will likely not improve it. For CNV analysis, in the revised version, we will try to identify de novo CNVs using dedicated bioinformatic tools, as per other reviewers' suggestions. There, we will also test whether a higher CNV detection sensitivity is achieved using the suggested 400,000-read cutoff for the PTA samples.

      (4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. The doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify strain-specific doublets. This is of minor importance for cloned and stable strains with few passages, as BPK081, but might be more relevant in more heterogeneous strains, as HU3. Strain-specific doublets might also be relevant in other scenarios, as multiclonal infections with different populations from the same strain in the same geographic area. One positive point is that the "between strain doublet count" was low, so probably the within-strain doublet count should be low too. The manuscript would benefit from a discussion on this regard.

      We fully agree with the reviewer. We will make it clear in the revised version that we quantify inter-strain doublets only, and we will also provide an estimation of total doublets based on the inter-strain doublet rate.
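      One way such an estimate can be derived is sketched below, under the assumption that cells pair at random during encapsulation. If the two strains make up fractions p and 1 - p of the cells, a fraction 2p(1 - p) of all doublets is inter-strain and therefore detectable, so the total doublet rate is the observed inter-strain rate divided by 2p(1 - p). For a 50/50 mix this factor is 0.5, i.e. the total is twice the observed rate. The function name and the example rate are hypothetical and are not values from the study.

```python
def estimate_total_doublet_rate(inter_strain_rate, frac_a=0.5):
    """Estimate the total doublet rate from the inter-strain doublet rate.

    Assumes random pairing of cells, so that a fraction 2*p*(1-p) of all
    doublets contains one cell of each strain (p = frac_a).
    """
    frac_b = 1.0 - frac_a
    detectable = 2.0 * frac_a * frac_b  # share of doublets that are inter-strain
    if detectable <= 0:
        raise ValueError("both strains must be present to detect doublets")
    return inter_strain_rate / detectable


# Hypothetical example: 2% observed inter-strain doublets in a 50/50 mix.
print(estimate_total_doublet_rate(0.02))
```

      The estimate becomes less reliable as the mix departs from 50/50, because fewer doublets are detectable and the correction factor grows.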

      (5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

      (5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most of the multiclonal infections in natural scenarios would be caused by parasite populations that diverge by fewer SNPs, and will be significantly harder to detect. Hence, I suggest that a short discussion about this is important.

      We will add a short discussion clarifying the limitations, while noting that our data demonstrate the ability of the approach to resolve very closely related cells, as illustrated by the fine-scale genetic differences observed within the clonal BPK081 population and by the detection of rare variants at targeted loci. We will also emphasize that the sensitivity to detect closely related genotypes depends on sequencing depth and the genomic regions considered.

      (5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 tree in Figure 5F (left panel), most of the variation is observed in ~8 cells, which go from position 0 to position ~28,000. Most of the other cells are in very short branches, from ~29,000 to ~30,400 (5F right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2000 SNPs. It is unexpected (but not impossible) that such ~8 divergent cells be maintained uniquely (or in very low counts) in the culture, unless this is a multiclonal infection. I would carefully investigate these cells. They might be doublets or have more missing data than other cells. I would also suggest that a quick discussion about this should be added to the manuscript.

      In the revised version we will improve the description of the phylogenetic analysis. We will also investigate the 8 mentioned cells in more depth to determine whether confounding factors might have led to their discrepancy. The possibility of a multiclonal infection in HU3 is not excluded, as this strain was not cloned after isolation.

      References:

      (1) Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nat. Commun. 9, 427 (2018).

      (2) Miura, H. et al. Single-cell DNA replication profiling identifies spatiotemporal developmental dynamics of chromosome organization. Nat. Genet. 51, 1356–1368 (2019).

      (3) Marques, C. A. et al. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe. Genome Biol. 16, 230 (2015).

      (4) Damasceno, J. D. et al. Leishmania major chromosomes are replicated from a single high-efficiency locus supplemented by thousands of lower efficiency initiation events. Cell Rep. 44, 116094 (2025).

      (5) Imamura, H. et al. Evolutionary genomics of epidemic visceral leishmaniasis in the Indian subcontinent. eLife 5, e12613 (2016).

      (6) Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl. Acad. Sci. 118, e2024176118 (2021).

      (7) Imamura, H. et al. Evaluation of whole genome amplification and bioinformatic methods for the characterization of Leishmania genomes at a single cell level. Sci. Rep. 10, 15043 (2020).

      (8) Negreira, G. H. et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 50, 293–305 (2022).

    1. eLife Assessment

      This work reports the characterization of newly identified genetic variants of SLC4A1 in patients with distal renal tubular acidosis. Cell culture studies supplemented with histological analysis of a previously established disease mouse model provide convincing evidence that some of the variants increase intracellular pH, reduce ATP synthesis, and attenuate autophagic degradative flux. The study is valuable in establishing a mechanistic framework for future exploration of the link between intracellular pH and mutations in SLC4A1 in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line.

      Strengths:

      Experimental data are convincing and nicely done.

      The revised manuscript incorporates most of the reviewer recommendations and presents a more cohesive story that is easier to read and assess. The data are convincing, of suitable quality and nicely presented. Statistical evaluation is rigorous. The link between kAE1 mutants and cell metabolism and autophagy is novel and provides insights on pathological observations in dRTA.

    3. Reviewer #2 (Public review):

      Context and significance:

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations is unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease.

      Summary:

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore.

      Strengths:

      The authors corroborate their findings in cell culture with a well characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments. The data largely support the claims as stated. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the data in the manuscript. The authors provide discussion of potential reasons for these differences that future studies could explore.

      Weaknesses:

      The pH effects of their mutants are only explored in vitro, and the in vitro system has a number of differences from a living mouse kidney or ex vivo kidney slice.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. Overall, the authors provide evidence about how new kAE1 mutants may cause dRTA.

      Strengths:

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality and the authors do well to include multiple samples for all of their western blots.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line. 

      Strengths: 

      Experimental data are convincing and nicely done.

      Thank you

      Weaknesses: 

      Some data are lacking or not explained clearly. Mutations are not consistently evaluated throughout the study, which makes it difficult to draw meaningful conclusions.

      We have revised our manuscript to clarify some earlier explanations and provided rationale for focusing on specific variants throughout the study.

      Reviewer #2 (Public review):

      Context and significance: 

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations are unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease. 

      Summary: 

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H, which they only use in Figure 1) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore. 

      Strengths: 

      The authors corroborate their findings in cell culture with a well-characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments

      Thank you  

      Weaknesses: 

      The data largely support the claims as stated, with some minor suggestions for improving the clarity of the work. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the present manuscript, given that they propose pHi as the unifying mechanism.

      We have modified our manuscript to discuss the various strengths of the mutants and emphasize that alteration of cytosolic pH by kAE1 variants may not be the only mechanism leading to dRTA.  

      Reviewer #3 (Public review):

      Summary: 

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH, which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. While the manuscript does reveal some interesting new results about novel dRTA causing kAE1 mutations, the quality of the data to support the hypothesis that these mutations cause a reduction in autophagic flux can be improved. In particular, the precise method of how the western blots and the immunofluorescence data were quantified, with included controls, would enhance the quality of the data and offer more supportive evidence of the authors' conclusions. 

      Strengths: 

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality, and the authors do well to include multiple samples for all of their western blots.

      Thank you

      Weaknesses: 

      Inconsistent results are reported for some of the variants. For example, R295H causes intracellular alkalinization but also has no effect on intracellular pH when measured by BCECF. The authors also appear to have performed these in vitro studies on mIMCD cells that were not polarized, and therefore, the localization of kAE1 to the basolateral membrane seems unlikely, based upon images included in the manuscript. Additionally, there is no in vivo work to demonstrate that these kAE1 variants alter intracellular pH, including the R607H mouse, which is available to the authors. The western blots are of varying quality, and it is often unclear which of the bands are being quantified. For example, LAMP1 is reported at 100kDa, the authors show three bands, and it is unclear which one(s) are used to quantify protein abundance. Strikingly, the authors report a nonsensical value for their quantification of LC3B II in Figure 2, where the ratio of LC3B II to total LC3B (I + II) is greater than one. The control experiments with starvation and bafilomycin are not supportive and significantly reduce enthusiasm for the authors' findings regarding autophagy. There are labeling errors between the manuscript and the figures, which suggest a lack of vigilance in the drafting process.

      The R295H variant was identified in a dRTA patient and as such, it was important to report it. However, this is the first mutation located in the amino-terminus of the protein, which may be involved in protein-protein interactions, so other mechanisms may cause dRTA for this variant. We have therefore modified our manuscript to state that alteration of cytosolic pH may not be the only mechanism leading to dRTA. At this time, we are not able to measure cytosolic pH in vivo and hope to be able to do it in the future.

      In our revised manuscript, we also show cell surface biotinylation results supporting that plasma membrane abundance of the kAE1 S525F and R589H variants is not significantly different than WT in non-polarized mIMCD3 cells (Figure 3 A&B), in line with the predominant basolateral localization of the variants in polarized cells (Figure 1C). Therefore, these two mutant proteins are not mis-trafficked in non-polarized cells.  Finally, we have clarified which bands have been used for quantification and corrected quantifications (including ratio measurements).
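      The ratio issue raised by the reviewer comes down to simple arithmetic: the lipidation ratio LC3B II / (LC3B I + LC3B II) cannot exceed one when both band intensities are non-negative. The sketch below shows the calculation with a sanity check; the function name and the example densitometry values are hypothetical and are not taken from the manuscript.

```python
def lc3b_lipidation_ratio(band_i, band_ii):
    """Return LC3B II / (LC3B I + LC3B II) from two band intensities.

    Both inputs are background-subtracted densitometry values, so they
    must be non-negative; the result then always lies between 0 and 1.
    """
    if band_i < 0 or band_ii < 0:
        raise ValueError("band intensities must be non-negative")
    total = band_i + band_ii
    if total == 0:
        raise ValueError("no LC3B signal detected")
    return band_ii / total


# Hypothetical intensities in arbitrary units.
print(lc3b_lipidation_ratio(30, 70))
```

      A value above one therefore points to a problem upstream of the division, such as dividing by LC3B I alone or quantifying a different band as the denominator.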

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) R295H is recessively inherited, whereas Y413H is dominantly inherited: this is interesting and may be linked to their cellular expression and function. Is this information known for the other mutations examined in this study? 

      The S525F and R589H dRTA variants have both been reported to exhibit autosomal dominant inheritance. This information is now updated in lines 146 and 158-159.

      (2) R589H expression levels are evaluated in the Western blot of Figure 1, but localization and activity are not examined in Figure 2. However, R589H is included in autophagy experiments shown in later figures. Similarly, mutant R607H is the subject of several experiments further into the manuscript, but no initial analysis is provided for this variant. 

      Protein abundance and localization of the R589H mutant in mIMCD3 cells have been shown in our previous publication in Supplementary Fig 5D and Supplementary Fig 2J [1]. This is now indicated on lines 158-159. Our previous paper also presented a detailed study of the R607H dRTA mutant, the mouse model corresponding to the human R589H mutation. This is now indicated on lines 70, 118-119 and 180. The present study builds upon those published findings.

      (3) This inconsistency is confusing, detracts from the usefulness of the study, and makes the comparative analysis of mutations incomplete. It is difficult to extrapolate from published studies in MDCK1 cells, which show different results on trafficking. 

      The mIMCD3 cell line, which more closely resembles the physiology of the mouse collecting duct than MDCK cells, was selected for this study and our previous one [1]. Accordingly, the results obtained are better aligned with in vivo evidence. In contrast, differences in mutant protein expression and localization observed in other cell lines, like the MDCK cells, are likely attributable to differences in their cellular origin. 

      (4) In Figure 2, could the authors explain why total LC3B is graphed for the data shown in mouse lysates, whereas the ratio of bands is analysed for cell lysates? Both sets of data show the two LC3B bands.

      Total LC3B levels were significantly increased in the mutant compared to WT; however, no significant difference was observed in the lipidation ratio. For this reason, that graph is not shown in the main paper but has been included in the Supplementary Figure 1D. 

      (5) In Figure 3, representative fluorescence images should be shown for all cell lines.

      We have now included representative immunofluorescence images for all cell lines in Figure 3C.

      (6) pH effects: Suggest that steady state pHi (Figure 3E) and rate of alkalization (Figure 1F) would be more effective together in Figure 1. The authors should show data for the effect of nigericin on cytoplasmic pH in Figure 3. If the rate of alkalinization in the mutant cells is reduced, shouldn't the intracellular steady state pH be more acidic? A cartoon depicting the transporter activity in the cell and the expected changes in pHi would be helpful. Is there a way to activate/inhibit NHE1 and rescue the effect of the mutant kAE1? It is unclear if the link between the mutant kAE1 and mitochondrial ATP production is a consequence of the intracellular pH or an indirect effect.

      We opted to keep the effect of nigericin on pHi in Supplementary Fig1A given that Figure 3 already contains 11 panels. Also, in intercalated cells, the kAE1 protein physiologically exports 1 molecule of bicarbonate in exchange of 1 chloride ion import hence a reduced transport activity would result in a more alkaline intracellular pH. To clarify this point, we have included a diagram in Figure 1E as suggested. However, to calculate the rate of intracellular alkalinisation, the transporter is functioning in the opposite direction, i.e. extruding chloride and importing bicarbonate (see methods protocol for transport assay). Therefore, in this assay (Figure 1G), a defective chloride/bicarbonate activity results in a reduced rate of intracellular alkalinisation rate. This is now explained on lines 169-172.

      Disruption of NHE1 function would impair sodium homeostasis and, as such, potentially affect the activity of other proteins associated with acid-base balance and autophagy in collecting duct cells. Therefore, any resulting effects could not be confidently attributed specifically to the mutant kAE1. With nigericin, we aimed to alter pHi while affecting other ion concentrations as little as possible. Due to space considerations, Figure 1 has been reorganised to include the rate of alkalinisation and pHi (panels F and G).

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could improve the readability of this manuscript for a general audience by clarifying and summarizing the respective phenotype(s)/effect(s) of the different mutants in some kind of table in the main figures. It is hard to keep track of the different disease mutants alongside the KI mouse mutations, as the text frequently discusses multiple mutants at a time. 

      As requested, we added two tables (Supplementary Tables 1 & 2) in Supplementary files summarizing the data obtained in this study. We hope this will help the readership to keep track of each variant’s phenotype.

      (2) The subtitle of the results section of Figure 2 should be reworded to reflect that whole kidney lysates are used for the KI mice and not the other mutants.

      As requested, the title in the Results section has been modified (lines 178-179).

      (3) More discussion of why the different mutants cause different strengths of phenotypes should be included.

      Different variants induce different degrees of functional defects, as seen in Figure 1F & G. The kAE1 R295H variant, the only amino acid substitution in the amino-terminal cytosolic domain that causes dRTA, does not affect the transporter’s function or the cells’ pHi. Therefore, this variant may cause dRTA via a different pathway than the transport-defective S525F or partially inactive R589H variants, which both affect pHi. Our study does not exclude that dRTA may be caused by defects other than pHi alterations, including defective protein-protein interactions. This discussion is now included in the manuscript on lines 386-391.

      Reviewer #3 (Recommendations for the authors):

      In general, I found the subject matter of this manuscript interesting and of value to the scientific community. The interpretation of the data and how much it supports the conclusion that "kAE1 variants increases pHi which alters mitochondrial function and leads to reduced cellular energy levels that eventually attenuate energy-dependent autophagic pathways" is largely incomplete. There are significant concerns about the quantification of Western blot data. Additionally, including the R607H variant in the in vitro experiments would improve the interpretation and extrapolation of in vitro data to the kidney.

      We apologize for the confusion with R589H and R607H variants. The R607H mutant is the murine ortholog to the human R589H dRTA variation. To clarify this, we have added this information on line 180, in addition to lines 118-119 and line 70.

      Suggestions:

      (1) Can an anion replacement experiment be performed in the mIMCD cells (no Cl or no HCO3) to determine that bicarbonate transport through AE1 is responsible for the reduced ATP rates in Figure 5? Inclusion of WT +dox control would be helpful to convince the reader of the effects.

      Because Seahorse real-time cell metabolism ATP rate measurements require specific, patented buffers of unspecified composition, it was not possible to modify the Cl⁻ or HCO₃⁻ content during the ATP measurement assay. All cell lines, including empty vector (EV) cells, were treated with doxycycline; thus, WT + dox was already included. The empty vector cell line treated with doxycycline served as a control to exclude specific effects of doxycycline on mitochondrial activity. This is now clarified in the Figure 5 legend, lines 655-656.

      (2) Can the authors measure pHi in fresh kidney sections from the R607H mouse?

      Unfortunately, we are not currently able to measure pHi in fresh kidney sections. Although we recognize that this would greatly benefit our study, establishing a new collaboration to perform this measurement would significantly delay the publication of this work; therefore, these results will not be available for the present manuscript.

      (3) Does pH 7.0 media have any effect on autophagy, as shown in Figure 3? Why was pH 6.6 selected?

      The idea was to artificially acidify pHi in mutant cell lines (that have a steady state alkaline pHi) and assess whether this acidification corrects autophagy defects. We first determined that incubation in cell culture medium at pH 6.6 with 0.033 µM nigericin (final potassium concentration: 168 mM) for 2 hours provided optimal conditions, i.e. ensuring cell viability over the 2-hour period while effectively lowering intracellular pH to 6.9, as demonstrated in Supplementary Figure 1A-C.

      (4) In vitro experiments should be performed on polarized cells with kAE1 properly inserted in the basolateral membrane. Experiments on subconfluent, non-polarized cells do not support the hypothesis that transport functions of AE1 initiate the cascade of events attributed to these SLC4A1 mutations.

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3 A&B. As cell surface biotinylation provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with representative immunoblots from a cell surface biotinylation assay.

      Concerns:

      (1) No information about the B1 ATPase antibody used.

      Now provided in Supplementary Material, ATP6V1B1 Antibody from Bicell cat#20901.

      (2) No actin band in Figure 1E (as prepared).

      Actin bands are provided for each blot in Figure 1D.

      (3) Figures 1E and 1F are labelled wrong in the figure versus the results section. 

      Thank you for letting us know, this is now corrected.

      (4) The cortical sections shown in Figure 4 for the KI/KI do not appear to have the morphology of a CCD. The authors may want to consider including glomeruli to convince the reader of the localization of the tubules. Same concern with Figure 5G and I. The WT image in 5G does not have the morphology of a CCD. Principal cells should be predominant, and ICs should be dispersed.

      Both figures 4 and 5 have been updated with images showing glomeruli (light blue “G” on figure) with neighbour and dispersed IC staining.

      (5) The quantification of LAMP1 in Figure 4 is unclear. How did the authors determine the boundary of AICs, and how did they calculate the volume of lysosomes? If a zstack was used, how are the authors sure that their 10um section includes the entire AIC?

      The quantification of LAMP1 is detailed under “Image analysis”, then “Volocity” sections in Supplementary Material. The boundary of A-IC was manually detected in Volocity based on the presence of the H⁺-ATPase before Volocity analysis for lysosomal volume as described in the Methods.

      The 10 micron sections are expected to include full AIC as well as partial AIC, but the frequency of these events should be the same between WT and variants’ sections, therefore they were all included in the analysis if cells displayed H⁺-ATPase signal.

      (6) Figure 5: There is no description of how ATP rates are calculated from the provided traces.

      We used Agilent Seahorse XF ATP rate assay kit for this experiment. In this assay, the total ATP rate is the sum of ATP production rate from both glycolysis and oxidative phosphorylation. Glycolysis releases protons in a 1:1 ratio with ATP hence the glycolytic ATP rate is calculated from the glycolytic proton efflux rate (glycoPER). GlycoPER is determined by subtracting respiration linked proton efflux from total proton efflux by inhibiting complex I and III. This information is now added to Supplementary Material, in the “Metabolic Flux analysis” section.
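
      The arithmetic described above can be illustrated with a minimal sketch. It is not the kit's software: the function and parameter names are hypothetical, the CO₂ contribution factor and P/O ratio are passed in as placeholders rather than taken from the response, and the mitochondrial term (oligomycin-sensitive OCR × 2 × P/O) is an assumed common convention that the response itself does not spell out.

```python
def glyco_per(total_per, mito_ocr, co2_factor):
    """Glycolytic proton efflux: total PER minus respiration-linked PER.

    co2_factor converts mitochondrial OCR into the proton efflux it
    contributes; its value is an assumption supplied by the caller.
    """
    return total_per - mito_ocr * co2_factor


def atp_rates(total_per, basal_ocr, oligo_ocr, rot_aa_ocr, co2_factor, p_o_ratio):
    """Return glycolytic, mitochondrial and total ATP production rates."""
    # Mitochondrial OCR is what disappears when complexes I and III are inhibited.
    mito_ocr = basal_ocr - rot_aa_ocr
    # Glycolysis releases protons 1:1 with ATP, so glycoATP equals glycoPER.
    glyco_atp = glyco_per(total_per, mito_ocr, co2_factor)
    # Assumed convention: ATP-linked OCR x 2 oxygen atoms per O2 x P/O ratio.
    mito_atp = (basal_ocr - oligo_ocr) * 2 * p_o_ratio
    return {"glyco": glyco_atp, "mito": mito_atp, "total": glyco_atp + mito_atp}


# Hypothetical readings (pmol/min), chosen only to make the arithmetic visible.
rates = atp_rates(total_per=100.0, basal_ocr=120.0, oligo_ocr=40.0,
                  rot_aa_ocr=20.0, co2_factor=0.6, p_o_ratio=2.75)
```

      With these made-up inputs the glycolytic rate is the smaller term, which is why the total is reported as the sum of the two pathways and not inferred from either trace alone.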

      (7) Figure labels in Figure 5 are wrong. It seems 5H (as presented) should actually be labeled 5G. In 5H (G?), why did some cells not have any TOM20 pixel intensity for S525F and R589H variants?

      Confocal image acquisition in this experiment was kept under the same settings to allow comparison between samples. Therefore, some cells show dimmer fluorescence than others. In the Figure 5 panels, all cells showed TOM20 pixel intensity. The Figure 5H panel has been relabelled Figure 5G.

      (8) In Figure 2, the summary graphs show analysis of more samples than are visible on the included western blots. What is the rationale for this? Why does S525F have 9 samples in BafA1 while R295H only has 3 (2H)? Yet, R295H has 6 samples in 2I. In 2D, S525F has at least 9 samples. Explain.

      Figure 2A-C shows representative immunoblots from several independently conducted experiments. Therefore, the final number of samples is higher than shown in Figure 2. This is now indicated in the Figure 2 legend, line 603. It became clear quite early in our study that the recessive kAE1 R295H variant does not behave similarly to the other variants studied, maybe because it affects the cytosolic domain, so we did not perform as many replicates for this variant as we did for the others. However, we felt it was valuable to the research community to report the characterization of this variant and decided to keep it in our study.

      (9) In general, the actin loading does not appear to be equal between samples. And some figures show the same actin blot twice (2A, C) while some show independent actin bands for LC3B and p62. Equal loading seems a fairly significant control, considering the importance of quantification in the figures.

      In addition to performing protein assays, we systematically conduct immunoblots with an anti-β-actin antibody to control for loading variability. When possible, two or three proteins, including actin, are detected on the same blot, provided their molecular weights differ enough. This sometimes results in β-actin being used as a loading control for two different proteins, as seen in Figure 2A and 2C. This is now indicated on lines 605-606.

      (10) In the Supplemental Figure 2, which band is being quantified for mature CTSD at 33kDa? Same for intermediate CTSD. The quantification of V-ATPase seems questionable based on the actin variance shown in the blot. Surely the ratio of the fourth sample is greater than 1.

      Supplementary Figure 2 has been updated to include arrows indicating which band was selected for the quantification. After verifying the measurements of band intensities from “Image Lab” quantification software, we confirm the results, including that fourth KI/KI sample has a ratio of 0.78 (Adj Total Band Vol (Int), lanes 10). Screen shots of quantifications are attached below.

      Author response image 1.

      Author response image 2.

      (11) Why are the experiments performed on non-confluent IMCD cells? Figure 1D shows good basolateral localization of AE1, yet the other experiments in the manuscript appear to use IMCD cells in low confluent states, without proper localization of AE1. Figure 3A shows AE1 dispersed throughout the cytoplasm. Why have the authors decided to study the effects of an anion exchanger without it being properly localized to the basolateral membrane? Shouldn't all experiments be performed in polarized IMCDs? If AE1 isnt properly in the membrane, and the cells do not have defined apico-basolateral polarity, then what role can AE1-mediated intracellular pH change have on the results of the experiments? Were the pHi experiments in 3E performed on polarized cells? Or even 1F?

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3A & B. As it provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with a representative immunoblot from a cell surface biotinylation assay.

      (12) As mentioned in the public comments, how is the ratio A/(A+B) greater than 1? With A and B > 0. In Figure 3, the data is reasonable, but in Figure 2, the data is simply impossible. What is the explanation for this phenomenon? Why was this presentation of data approved? Is it supposedly a fold of WT, like 2K and 2L? Is the reader also to believe that total LC3B is 2-fold greater in KI/KI mice, as shown in 2K? My eyes, though not densitometry equipment, cannot confirm this. The actin bands are not equal. Yet again, there are 4 lanes of KI/KI mice, but the quantification shows 5 samples.

      The ratios in figure 2D, 2F, 2H and 2L have been re-calculated and corrected. As indicated above, immunoblots are representative and quantification of additional blots has been included in the graphs.

      (12) Spelling error Figure 4B: cels.

      Corrected

      References 

      (1) Mumtaz, R. et al. Intercalated Cell Depletion and Vacuolar H+-ATPase Mistargeting in an Ae1 R607H Knockin Model. Journal of the American Society of Nephrology 28, 1507–1520 (2017).

    1. eLife Assessment

      This important study reports convincing evidence of associations between 35 polygenic indices (PGIs) for social, behavioural, and psychological traits, as well as other health conditions (e.g., BMI) and all-cause mortality, based on data from Finnish population-based surveys and a twin cohort linked to administrative registers. PGIs for education, depression, alcohol use, smoking, BMI, and self-rated health showed the strongest associations with all-cause mortality, in the order of ~10% increment in risk per PGI standard deviation. Effect sizes from twin-difference analyses tended to be slightly larger than those from population cohorts, a pattern opposite that generally observed when testing PGI associations with their target phenotypes, and supporting the robustness of findings to confounding by population stratification.

    2. Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype.

      Comments on revised version:

      The authors answered my concerns well. I don't have any further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Comments on revised version:

      I am happy with the revision. No further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-value reported in the text will remain significant after multiple testing adjustment. Some of the large effects might be significant; for example, Figure 2C

      We have now added Benjamini-Hochberg multiple-testing adjusted p-values in the text each time we present nominal p-values. Additionally, supplementary tables S5 and S6 provide multiple-adjusted p-values for all analysed PGIs.
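
      For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned here, the step-up procedure can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values, not taken from the manuscript.
nominal = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(nominal)
```

      Each nominal p-value is scaled by the number of tests divided by its rank, so a nominally significant result can lose significance when many PGIs are tested together.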

      Although this was not always the case, many comparisons remained significant after multiple testing adjustments, especially in Figure 2C that the reviewer commented on. In the revised version, we have placed more emphasis on describing these HRs that have low p-values after multiple-test adjustment. The revised text for Figure 2C in the Results section now reads:

      Panel C analyses mortality in three age-specific follow-up periods. The PGIs were more predictive of death in younger age groups, although the difference between the 25–64 and 65–79 age groups was small, except for the PGI of ADHD (HR=1.14, 95% CI 1.08; 1.21 for 25–64-year-olds; HR=1.04, 95% CI 1.00; 1.08 for 65–79-year-olds; p=0.008 for difference, p=0.27 after multiple-testing adjustment). PGIs predicted death only negligibly among those aged 80+, and the largest differences between the age groups 25–64 and 80+ were for PGIs of self-rated health (HR 0.87, 95% CI 0.82; 0.93 for 25–64-year-olds, HR 1.00, 95% CI 0.94; 1.04 for 80+ year-olds, p=2×10⁻⁴ for difference, p=0.006 after multiple-testing adjustment), ADHD (HR 1.14, 95% CI 1.08; 1.21 for 25–64-year-olds, HR 0.99, 95% CI 0.95; 1.03 for 80+ year-olds, p=7×10⁻⁴ for difference, p=0.012 after multiple-testing adjustment) and depressive symptoms (HR 1.12, 95% CI 1.06; 1.18 for 25–64-year-olds, HR 1.00, 95% CI 0.96; 1.04 for 80+ year-olds, p=0.002 for difference, p=0.032 after multiple-testing adjustment). Additionally, the difference in HRs between these age groups achieved significance after multiple testing adjustment at the conventional 5% level for PGIs of cigarettes per day, educational attainment, and ever smoking.
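
      One way to read the "p for difference" values quoted above is as a Wald-type comparison of two log hazard ratios. The sketch below is an assumption about how such a comparison can be made, not a description of the authors' test: it recovers standard errors from the rounded 95% confidence intervals and treats the two estimates as independent. Function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def hr_difference_p(hr_a, ci_a, hr_b, ci_b):
    """Two-sided z-test for the difference between two log hazard ratios."""
    diff = math.log(hr_a) - math.log(hr_b)
    # Independence of the two estimates is assumed when combining variances.
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# ADHD PGI, ages 25-64 versus 65-79, using the rounded values quoted in the text.
z, p = hr_difference_p(1.14, (1.08, 1.21), 1.04, (1.00, 1.08))
```

      From the rounded figures this gives a p-value of roughly 0.009, in the neighbourhood of the reported nominal p=0.008; small discrepancies are expected because the inputs are rounded and the exact test used is not stated.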

      We have also included the recent study by Argentieri et al. (2025) in the literature review, which was missing from our previous version. We appreciate the reference. Other references mentioned were already included in the previous version of the manuscript.

      (note that the small prediction accuracy of PGI in older age groups has been extensively studied, see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      We would like to thank the reviewer for suggesting the relevant reference by Jiang et al. We have now expanded on the discussion of age-specific differences in the discussion section and included this reference.

      (2) The authors might check if PGI+Phenotype has improved performance over Phenotype only. This is similar to Model 2 in Table 1, but slightly different.

      The reviewer raises an interesting angle from which to approach the analysis. We have now added an analysis assessing the information criteria and the significance of improvement between nested models in Supplementary table S8. All the tested PGI+phenotype models show an improvement over the phenotype-only model that is statistically significant at all conventional levels when tested by likelihood-ratio tests between nested models. Additionally, improvement was found when using Akaike and Bayesian (Schwarz) information criteria (albeit sometimes modest in size). We have added a passage in the results section briefly summarising this analysis:

      Supplementary table S8 presents information criteria and significance tests on corresponding models. Models with PGI+phenotype (Models 2a–f) showed improvement over models with the phenotype only (Models 1a, 1c, 1e, 1g, 1i, 1k, with a p=0.0006 or lower) in terms of both Akaike information criterion (AIC) as well as Bayesian (Schwarz) information criterion (BIC) with a p=0.0006 or lower in all comparisons. The full Model 4 again showed improvement over the model with all PGIs jointly (Model 3b, with a p=0.0002 or p=0.00002, depending on continuous/categorical phenotype measurement), which had a lower AIC but not BIC.
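
      The nested-model comparison summarised above rests on three standard quantities, sketched here for the case where the larger model adds a single parameter (for example, one PGI). The log-likelihoods, parameter counts, and sample size below are hypothetical and are not values from Supplementary table S8; using the number of events as the sample size in BIC is also an assumption of this sketch.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion with effective sample size n."""
    return k * math.log(n) - 2 * loglik


def lr_test_1df(ll_restricted, ll_full):
    """Likelihood-ratio test for nested models differing by one parameter."""
    stat = max(2 * (ll_full - ll_restricted), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical phenotype-only model (3 parameters) versus PGI+phenotype (4).
ll_pheno, ll_both, n_events = -10250.0, -10243.0, 2000
stat, p = lr_test_1df(ll_pheno, ll_both)
aic_pheno, aic_both = aic(ll_pheno, 3), aic(ll_both, 4)
bic_pheno, bic_both = bic(ll_pheno, 3, n_events), bic(ll_both, 4, n_events)
```

      Because BIC penalises each extra parameter by log(n) instead of 2, a model can improve on AIC without improving on BIC, which is the pattern reported for the full Model 4.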

      Reviewer #2 (Public review): 

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      To our reading, this comment is closely related to the “recommendations for the author” number 3 by reviewer 2, and we thus address them together. 

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

      To our reading, this comment is closely related to the “recommendations for the author” 4 by reviewer 2, and we thus address them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) Cited reference 1 also investigated the PRS association with life span; cited reference 8 explains PRS association with healthy lifespan. Can authors be clearer about what is new in the context of these references? Specifically, what are the PGIs studied here that were not analyzed in the cited analyses?

      Although some previous studies on the topic do exist, our analysis arguably has novelty in touching upon several unstudied or scarcely studied themes. These include:

      A set of PGIs focusing on social, psychological, and behavioural phenotypes or PGIs for typically non-fatal health conditions.

      An assessment of direct genetic effects/ confounding with a within-sibship design.

      An assessment of potential heterogeneous effects by several socio-demographic characteristics.

      An analysis of external causes of deaths (which can be hypothesised to be particularly relevant here, given the choice of our PGIs not focusing directly on typical causes of death).

      A detailed assessment of the interplay of the most predictive PGIs with their corresponding phenotypes.

      We have substantially revised the Introduction section focusing on making these novel contributions more explicit.

      (2) In the Methods section, it is not very clear why the authors specifically study the "within-sibship" samples. Is this for avoiding nurturing effects from parental genotypes or for controlling assortative mating? The authors should clarify the rationale behind the design.

      The substance-related rationale behind this approach was briefly discussed in the Introduction section while in the Methods section, we focused more on the technical description of our analyses. However, it is certainly worthwhile to clarify to the reader why within-sibship methods have been used. The revised passage in the methods section now states:

      “In addition to this population sample, we used a within-sibship analysis sample to assess the extent of direct and indirect genetic associations captured by the PGIs, as discussed in the introduction.”

      (3) "Residual correlations of PGIs were no more than 0.050..." As a minor comment, since the PGI is a noisy variable, the correlation would be low; however, I don't think there are better ways to evaluate Cox assumptions, and in many cases, this assumption is not correct for strong predictors.

      Yes, these points are true. Overall, it is often implausible that empirical distributions exactly match distributional assumptions in statistical models. For example, it may not be realistic to expect that the mortality hazards across categories of independent variables stay exactly proportional during long mortality-follow-ups; some deviations from constant proportions are almost inevitable. However, there are reasonable grounds to argue that in case of moderate violations of the proportional hazards assumption, the estimates still remain interpretable for practical uses. They can be read as approximating average relative hazards over the study period (for discussion, see pages 42–47 in Allison P. 2014. Event history and survival analysis: Regression for longitudinal event data (second edition). Thousand Oaks: SAGE).

      (4) "PGI of ADHD (HR=1.08 95%CI 1.04;1.11 among men; HR=1.01 95%CI 0.97;1.05 among women; p=0.012 for difference)." Is this difference significant after multiple testing correction?

      We have presented multiple-testing adjusted p-values together with nominal ones in this and in all other instances where they are mentioned in the text. Additionally, Supplementary tables S5–S6 present multiple-testing adjusted p-values for each PGI studied.

      (5) "Panel D displays that most PGIs had stronger associations with external (accidents, violent, suicide, and alcohol related deaths) than natural causes of death." Similar to the comment above, are there any results that are significantly different between internal and external?

      We have added the p-values of those variables that had larger differences in the revised text. Quoting from the revised article: “The HR differences between external and natural causes of death were nominally significant at the conventional 5% level for cannabis use (p=0.016), drinks per week (p=0.028), left out of social activity (p=0.029), ADHD (p=0.031), BMI (p=0.035) and height (p=0.049), but none of these differences remained significant after adjusting for 35 multiple tests.”

      (6) Table 1: The effect of the phenotype is stronger than the PGI; this is expected as PGI is a weak predictor and can be considered as "noised" measurement of true genetic value (Becker 2021 Nature Human behavior). Is there a way to adjust for the impact of noise in PGI at tagging genetic value and compare if the PGI effect is different from the phenotype effect?

      PGIs are certainly imperfect measures that contain a lot of noise. However, extracting new information from what is unknown is an extremely demanding exercise. It is further complicated, for example, by the fact that we do not know the exact benchmark of total genetic effect we should be aiming at. Different methods of heritability estimation, for instance, often give dramatically differing results, for reasons that are still under scrutiny.

      We are thus not familiar with a method that could provide a satisfactory answer to this challenging task.

      Reviewer #2 (Recommendations for the authors):

      (3) Justification and Selection of PGIs:

      For several traits, such as BMI, multiple polygenic indices (PGIs) are currently available. The criteria used to select specific PGIs for this study are not clearly described. A more systematic and reproducible approach-for example, leveraging metadata from the PGS Catalog-could strengthen the justification for PGI selection and enhance the study's generalizability.

      There are numerous PGIs developed in the extensive GWAS literature, but a finite set of PGIs always needs to be chosen for any analysis. The rationale behind our decision to include every PGI from the repository of Becker et al. 2021 (full reference in the manuscript, see also https://www.thessgac.org/pgi-repository) that was available for the Finnish data (including the possibility to exclude overlapping samples, see our response to the next comment for more discussion) was to provide a rigorous analysis by limiting the researchers' degrees of freedom in arbitrarily choosing PGIs. Although it would have been tempting to omit some PGIs that were not expected to substantially correlate with mortality, we believe that our conservative strategy increases the credibility of the reported p-values; in particular, the multiple-testing adjustment should now work as intended.

      We now also mention this rationale when discussing the chosen PGIs in the methods section: “As the independent variables of main interest, we used 35 different PGIs in the Polygenic Index repository by Becker et al., which were mainly based on GWASes using UK Biobank and 23andMe, Inc. data samples, but also other data collections. They were tailored for the Finnish data, i.e., excluding overlapping individuals between the original GWAS and our analysis and performing linkage-disequilibrium adjustment. We used every single-trait PGI defined in the repository (except for subjective well-being, for which we were unable to obtain a meta-analysis version that excluded the overlapping samples). By limiting the researchers’ freedom in selecting the measures, this conservative strategy should increase the validity of our estimates, particularly with regards to multiple-testing adjusted p-values.”

      (4) Overlap Between PGI Training Data and Study Sample:

      The authors should describe any overlap between the data used to develop the PGIs and the current study sample. If such overlap exists, it may lead to overestimation of effect sizes due to "double-dipping." A discussion of this issue and its potential implications is warranted, as similar concerns have been raised in studies using UK Biobank data.

      This is, fortunately, not a concern of our analysis. Overlapping samples were excluded in creating the PGIs that we used. We have now described this more clearly in the revised methods section.

      (1) Clarify the Methodology for Family-Based Cox Analysis:

      It is unclear what specific method was used to perform Cox regression in the family-based analysis. Please provide additional methodological details.

      We have described the method further and added an additional reference in the revision. The text now stands:

      “We compared these models to the corresponding within-sibship models, using the sibship identifier as the strata variable. This method employs a sibship-specific (instead of a whole-sample-wide baseline hazard in the population models) baseline hazard, and corresponds to a fixed-effects model in some other regression frameworks (e.g., linear model with sibship-specific intercepts)”
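
      To make the idea of a sibship-specific baseline hazard concrete, the following sketch evaluates a stratified Cox partial log-likelihood for a single covariate, with ties handled in the Breslow manner. It is an illustration under simplifying assumptions, not the authors' implementation; the function name, record layout, and data are hypothetical.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(beta, records):
    """Stratified Cox partial log-likelihood for one covariate.

    records: iterable of (stratum, time, event, x), where stratum is the
    sibship identifier, event is 1 for death and 0 for censoring, and x is
    the covariate (for example, a standardised PGI).
    """
    by_stratum = defaultdict(list)
    for stratum, time, event, x in records:
        by_stratum[stratum].append((time, event, x))

    total = 0.0
    for members in by_stratum.values():
        for time_i, event_i, x_i in members:
            if not event_i:
                continue
            # Risk set: siblings from the same sibship still under follow-up.
            risk = sum(math.exp(beta * x_j)
                       for time_j, _, x_j in members if time_j >= time_i)
            total += beta * x_i - math.log(risk)
    return total


# Hypothetical sibling pairs: (sibship, follow-up years, died, PGI in SD units).
siblings = [
    ("s1", 5.0, 1, 1.2), ("s1", 9.0, 0, -0.3),
    ("s2", 7.0, 1, 0.4), ("s2", 8.0, 1, -0.5),
    ("s3", 6.0, 0, 0.9), ("s3", 10.0, 0, 0.1),
]
ll_null = stratified_partial_loglik(0.0, siblings)
ll_without_s3 = stratified_partial_loglik(0.0, siblings[:4])
```

      Because each risk set contains only siblings, anything shared within a sibship cancels out of the comparison, and sibships without any death (such as the hypothetical "s3") contribute nothing to the likelihood.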

      (2) Clarify Timing of Measured Risk Factors Relative to Follow-Up:

      The main text should provide more detailed information regarding the timing of data collection for directly measured risk factors. Specifically, it should be clarified whether the measurements used correspond to the first available data for each individual after the start of follow-up, or if a different criterion was applied.

      BMI, self-rated health, alcohol consumption and smoking status were measured at the baseline survey of each dataset. Education was registered as the highest completed degree up to the end of 2019. Depression was a composite of survey self-report (at the time of the baseline survey), as well as depression-related medicine purchases and hospitalizations over a two-year period before the start of the individual’s follow-up.

      We have added more comprehensive information on the measurement of the phenotypes of interest in Supplementary table 2, including the timing of the measurement.

    1. eLife Assessment

      This work significantly advances our understanding of chromatin organization within regions of repetitive sequences in the parasitic protozoan Trypanosoma brucei. Using cutting edge interdisciplinary tools, the authors provide compelling evidence for two discrete types of repetitive DNA element-associated proteins- one set involved in essential centromere function; and, the other involved in glycoprotein antigenic variation via homologous recombination. Thus, these fundamental findings have implications for this parasite's biology, and for therapeutic targeting in kinetoplastid diseases. This work will be exciting to those in the centromere/mitosis and parasite immunity fields.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Strengths and Weaknesses:

      The manuscript was previously reviewed through Review Commons. As noted there, the experiments are well controlled, the claims are well supported, and the methods are clearly described. The conclusions are convincing. All concerns I raised have been addressed except one (minor point #8):

      "The way the authors mapped the ChIP-seq data is potentially problematic when analyzing the same repeat type in different genomic regions. Reads with multiple equally good mapping positions were assigned randomly. This is fine when analyzing repeats by type, independent of genomic position, which is what the authors do to reach their main conclusions. However, several figures (Fig. 3B, Fig. 4B, Fig. 5B, Fig. 7) show the same repeat type at specific genomic locations." Due to the random assignment, all of these regions merely show the average signal for the given repeat. I find it misleading that this average is plotted out at "specific" genomic regions.

      Initially, I suggested a workaround, but the authors clarified why the workaround was not feasible, and their explanation is reasonable to me. That said, the figures still show a signal at positions where they can't be sure it actually exists. If this cannot be corrected analytically, it should at least be noted in the figure legends, Results, or Discussion.

      Importantly, the authors' conclusions do not hinge on this point; they are appropriately cautious, and their interpretations remain valid regardless.
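
      The averaging effect described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' mapping pipeline: the function name, the three copy labels and the read count are all hypothetical, and every read is assumed to map equally well to every copy.

```python
import random
from collections import Counter

def assign_multimappers(n_reads, copy_names, seed=0):
    """Assign each multi-mapping read to one equally good locus at random."""
    rng = random.Random(seed)
    return Counter(rng.choice(copy_names) for _ in range(n_reads))

# Hypothetical scenario: all 9,000 reads truly originate from copy_A,
# but the three repeat copies are identical in sequence.
copies = ["copy_A", "copy_B", "copy_C"]
counts = assign_multimappers(9000, copies)
for name in copies:
    print(name, counts[name])
```

      Each copy ends up with roughly a third of the reads, so the per-locus profile reflects the repeat-type average rather than where the protein was actually bound.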

      Significance:

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Comments on revised version:

      All my recommendations have been addressed.

    3. Reviewer #2 (Public review):

      The Trypanosoma brucei genome, like that of other eukaryotes, contains diverse repetitive elements. Yet, the chromatin-associated proteome of these regions remains largely unexplored. This study represents a very important conceptual and technical advancement by employing synthetic TALE DNA-binding proteins fused to YFP to selectively capture proteins associated with specific repetitive sequences in T. brucei chromatin. The data presented here are convincing, supported by appropriate controls and a well-validated methodology, aligned with current state-of-the-art approaches.

      The authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the T. brucei genome, including centromere- and telomere-associated repeats and those of a transposon element, in order to identify specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and a protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      This study represents a significant conceptual and technical advancement. To the best of our knowledge, it is the first report of employing TALE-YFP for affinity-based detection of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of the organization of these important regions of the trypanosomal chromatin and provides the foundation for investigating the functional roles of associated proteins in parasite biology. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as to scientists investigating the roles of repetitive genomic elements in chromatin structure and their functional role in higher eukaryotes.

      Importantly, any essential or unique interacting partners identified using the approach employed here could serve as potential targets for therapeutic intervention in severe tropical diseases caused by kinetoplastids.

    4. Author response:

      Point-by-point description of the revisions:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element, in order to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments:

      Are the key conclusions convincing?

      The authors reported that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claimed that this study has provided new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research, and the study basically merits publication, provided that the major concerns raised below are addressed before acceptance for publication.

      (1) The authors used the TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins that bound to the synthetic proteins by affinity purification and LC-MS analysis, and concluded that these proteins bind to different repetitive regions of the genome. Two control proteins, TRF-YFP and KKT2-YFP, were used to confirm the interactions. However, there is no experiment confirming that the analysis gives insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins in these repetitive regions, their role is as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.

      The main request from Referee 1 is for individual evaluation of protein-DNA interaction for a few candidates identified in our TALE-YFP affinity purifications, particularly using EMSA to identify binding to the DNA repeats used for the TALE selection. In our opinion, such an approach would not actually provide the validation anticipated by the reviewer. The power of TALE-YFP affinity selection is that it enriches for protein complexes that associate with the chromatin that coats the target DNA repetitive elements rather than only identifying individual proteins or components of a complex that directly bind to DNA assembled in chromatin.

      The referee suggests we express recombinant proteins and perform EMSA for selected candidates, but many of the identified proteins are unlikely to directly bind to DNA – they are more likely to associate with a combination of features present in DNA and/or chromatin (e.g. specific histone variants or histone post-translational modifications). Of course, a positive result would provide some validation but only IF the tested protein can bind DNA in isolation – thus, a negative result would be uninformative.

      In fact, our finding that KKT proteins are enriched using the 177R-TALE (minichromosome repeat sequence) identifies components of the trypanosome kinetochore known (KKT2) or predicted (KKT3) to directly bind DNA (Marciano et al., 2021; PMID: 34081090), and likewise the TelR-TALE identifies the TRF component that is known to directly associate with telomeric (TTAGGG)n repeats (Reis et al 2018; PMID: 29385523). This provides reassurance on the specificity of the selection, as does the lack of cross selectivity between different TALEs used (see later point 3 below). The enrichment of the respective DNA repeats quantitated in Figure 2B (originally Figure S1) also provides strong evidence for TALE selectivity.

      It is very likely that most of the components enriched on the repetitive elements targeted by our TALE-YFP proteins do not bind repetitive DNA directly. The TRF telomere binding protein is an exception – but it is the only obvious DNA binding protein amongst the many proteins identified as being enriched in our TelR-TALE-YFP and TRF-YFP affinity selections.

      The referee also suggests that follow up experiments using knockdown of the identified proteins found to be enriched on repetitive DNA elements would be informative. In our opinion, this manuscript presents the development of a new methodology previously not applied to trypanosomes, and referee 2 highlights the value of this methodological development which will be relevant for a large community of kinetoplastid researchers. In-depth follow-up analyses would be beyond the scope of this current study but of course will be pursued in future. To be meaningful such knockdown analyses would need to be comprehensive in terms of their phenotypic characterisation (e.g. quantitative effects on chromosome biology and cell cycle progression, rates and mechanism of recombination underlying antigenic variation, etc) – simple RNAi knockdowns would provide information on fitness but little more. This information is already publicly available from genome-wide RNAi screens (www.tritrypDB.org), with further information on protein location available from the genome-wide protein localisation resource (Tryptag.org). Hence basic information is available on all targets selected by the TALEs after RNAi knock down but in-depth follow-up functional analysis of several proteins would require specific targeted assays beyond the scope of this study.

      (2) NonR-TALE-YFP does not have a binding site in the genome, but YFP protein should still be expressed by T. brucei clones with NLS. The authors have to explain why there is no signal detected in the nucleus, while a prominent signal was detected near kDNA (see Fig.2). Why is the expression of YFP in NonR-TALE almost not shown compared to other TALE clones?

      The NonR-TALE-YFP immunolocalisation signal indeed is apparently located close to the kDNA and away from the nucleus. We are not sure why this is so, but the construct is sequence validated and correct. However, we note that artefactual localisation of proteins fused to a globular eGFP tag, compared to a short linear epitope V5 tag, near to the kinetoplast has been previously reported (Pyrih et al, 2023; PMID: 37669165).

      The expression of NonR-TALE-YFP is shown in Supplementary Fig. S2 in comparison to other TALE proteins. Although it is evident that NonR-TALE-YFP is expressed at lower levels than other TALEs (the different TALEs have different expression levels), it is likely that in each case the TALE proteins would be in relative excess.

      It is possible that the absence of a target sequence for the NonR-TALE-YFP in the nucleus affects its stability and cellular location. Understanding these differences is tangential to the aim of this study.

      However, importantly, NonR-TALE-YFP is not the only control used for specificity in our affinity purifications. Instead, the lack of cross-selection of the same proteins by different TALEs (e.g. TelR-TALE-YFP, 177R-TALE-YFP) and the lack of enrichment of any proteins of interest by the well expressed ingiR-TALE-YFP or 147R-TALE-YFP proteins each provide strong evidence for the specificity of the selection using TALEs, as does the enrichment of similar protein sets following affinity purification of the TelR-TALE-YFP and TRF-YFP proteins which both bind telomeric (TTAGGG)n repeats. Moreover, control affinity purifications to assess background were performed using cells that completely lack an expressed YFP protein which further support specificity (Figure 6).

      We have added text to highlight these important points in the revised manuscript:

      Page 8:

      “However, the expression level of NonR-TALE-YFP was lower than other TALE-YFP proteins; this may relate to the lack of DNA binding sites for NonR-TALE-YFP in the nucleus.”

      Page 8:

      “NonR-TALE-YFP displayed a diffuse nuclear and cytoplasmic signal; unexpectedly the cytoplasmic signal appeared to be in the vicinity of the kDNA of the kinetoplast (mitochondria). We note that artefactual localisation of some proteins fused to an eGFP tag has previously been observed in T. brucei (Pyrih et al, 2023).”

      Page 10:

      Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4). Thus, the most enriched proteins are specific to TelR-TALE-YFP-associated chromatin rather than to the TALE-YFP synthetic protein module or other chromatin.

      (3) As a proof of concept, the author showed that the TALE method determined the same interacting partners enrichment in TelR-TALE as compared to TRF-YFP. And they show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. It may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) as compared to other TALE clones and the TRF-YFP clone. To address this concern, there should be a control included, with proper YFP expression.

      See response to point 2, but we reiterate that the ingiR-TALE-YFP and 147R-TALE-YFP proteins are well expressed (western blot; originally Fig. S3, now Fig. S2) but few proteins are detected as being enriched or correspond to those enriched in TelR-TALE-YFP or TRF-YFP affinity purifications (see Fig. S9). Therefore, the ingiR-TALE-YFP and 147R-TALE-YFP proteins provide good additional negative controls for specificity as requested. To further reassure the referee we have also included additional volcano plots which compare TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP to the ingiR-TALE-YFP affinity selection (new Figure S8). As with No-YFP or NonR-TALE-YFP controls, the use of ingiR-TALE-YFP as a negative control demonstrates that known telomere-associated proteins are enriched in TelR-TALE-YFP affinity purification, RPA subunits are enriched with 70R-TALE-YFP and kinetochore KKT proteins are enriched with 177R-TALE-YFP. These analyses demonstrate specificity in the proteins enriched following affinity purification of our different TALE-YFPs and provide support to strengthen our original findings.

      We now refer to use of No-YFP, NonR-TALE-YFP, and ingiR-TALE-YFP as controls for comparison to TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP in several places:

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4).”

      Page 11:

      “Thus, the nuclear ingiR-TALE-YFP provides an additional chromatin-associated negative control for affinity purifications with the TelR-TALE-YFP, 70R-TALE-YFP and 177R-TALE-YFP proteins (Fig. S8).”

      “Proteins identified as being enriched with 70R-TALE-YFP (Figure 6D) were similar in comparisons with either the No-YFP, NonR-TALE-YFP or ingiR-TALE-YFP as negative controls.”

      Top Page 12:

      “The same kinetochore proteins were enriched regardless of whether the 177R-TALE proteomics data was compared with No-YFP, NonR-TALE or ingiR-TALE-YFP controls.”

      Discussion Page 13:

      “Regardless, the 147R-TALE and ingiR-TALE proteins were well expressed in T. brucei cells, but their affinity selection did not significantly enrich for any relevant proteins. Thus, 147R-TALE and ingiR-TALE provide reassurance for the overall specificity for proteins enriched TelR-TALE, 70R-TALE and 177R-TALE affinity purifications.”

      (4) After the artificial expression of the five repetitive-sequence-binding TALE proteins, the question is whether there is any competition between the TALE proteins and the corresponding endogenous proteins. Is there any effect on parasite survival or health, compared to the control, after the expression of these five TALE-YFP proteins? It is recommended to add parasite growth curves for all the cultures expressing TALE proteins.

      Growth curves for cells expressing TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP are now included (New Fig S3A). No deficit in growth was evident while passaging 70R-TALE-YFP, 147R-TALE-YFP, NonR-TALE-YFP cell lines (indeed they grew slightly better than controls).

      The following text has been added page 8:

      “Cell lines expressing representative TALE-YFP proteins displayed no fitness deficit (Fig. S3A).”

      (5) Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      In our experimental design, we confirmed bioinformatically that the repeat sequences targeted were not represented elsewhere in the nuclear or mitochondrial genome (kDNA). The absence of subcellular fractionation could result in some cytoplasmic protein selection, but this is unlikely since each TALE targets a specific DNA sequence but is otherwise identical such that cross-selection of the same contaminating protein set would be anticipated if there was significant non-specific binding. We have previously successfully affinity selected 15 chromatin modifiers and identified associated proteins without major issues concerning cytoplasmic protein contamination (Staneva et al 2021 and 2022; PMID: 34407985 and 36169304). Of course, the possibility that some proteins are contaminants will need to be borne in mind in any future follow-up analysis of proteins of interest that we identified as being enriched on specific types of repetitive element in T. brucei. Proteins that are also detected in negative control affinity selections such as No-YFP, NonR-TALE-YFP, ingiR-TALE or 147R-TALE must be disregarded.

      (6) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As mentioned earlier, the authors claimed that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But there are no experiments that provide a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to repetitive region(s), and their role in that region of interaction.

      As highlighted in response to point 1 the suggested validation and follow up experiments may well not be informative and are beyond the scope of the methodological development presented in this manuscript. Referee 2 describes the study in its current form as “a significant conceptual and technical advancement” and “This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology.”

      The Referee’s phrase ‘validate their direct binding to repetitive region(s)’ here may also mean to test if any of the additional proteins that we identified as being enriched with a specific TALE protein actually display enrichment over the repeat regions when examined by an orthogonal method. A key unexpected finding was that kinetochore proteins including KKT2 are enriched in our affinity purifications of the 177R-TALE-YFP that targets 177bp repeats (Figure 6F). By conducting ChIP-seq for the kinetochore specific protein KKT2 using YFP-KKT2 we confirmed that KKT2 is indeed enriched on 177bp repeat DNA but not flanking DNA (Figure 7). Moreover, several known telomere-associated proteins are detected in our affinity selections of TelR-TALE-YFP (Figure 6B, Fig. S6; see also Reis et al, 2018 Nuc. Acids Res. PMID: 29385523; Weisert et al, 2024 Sci. Reports PMID: 39681615).

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as the proof of concept discussed above. However, if the authors would like to claim that the study achieved new functional insights for these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      See our response to point 1 and the point we labelled ‘6’ above.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      I think that they are realistic. If the authors decide to check the capacity of a small sample of proteins (not previously known as repetitive-region-binding proteins) to interact directly with the repeated sequence, it will add substantially to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to check the function of at least one such newly detected protein (e.g., by KD), I estimate this will take 3-6 months.

      As highlighted previously the proposed EMSA experiment may well be uninformative for protein complex components identified in our study or for isolated proteins that directly bind DNA in the context of a complex and chromatin. RNAi knockdown data and cell location data (as well as developmental expression and orthology data) are already available through tritrypDB.org and tryptag.org.

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated, and statistical analysis adequate?

      The authors did not mention replicates. There is no statistical analysis mentioned.

      The figure legends indicate that all volcano plots of TALE affinity selections were derived from three biological replicates. Cutoffs used for significance: P < 0.05 (Student's t-test).
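
      As an illustration of how one point on such a volcano plot could be derived, the sketch below computes a log2 fold change and a two-sample Student's t statistic from three replicates per condition. It is a minimal sketch under stated assumptions, not the authors' analysis code: the function name and intensity values are hypothetical, inputs are assumed to be log2-transformed intensities, and equal variances are assumed. The threshold 2.776 is the standard two-sided critical value for P = 0.05 with 4 degrees of freedom, so the comparison only applies to a 3-versus-3 design.

```python
import math
from statistics import mean, variance

# Two-sided critical value of Student's t for alpha = 0.05 and 4 degrees of
# freedom (3 + 3 replicates); a standard table value.
T_CRIT_DF4 = 2.776

def volcano_point(sample, control):
    """Return (log2 fold change, t statistic) from log2 intensities."""
    n1, n2 = len(sample), len(control)
    diff = mean(sample) - mean(control)
    # Pooled variance, as used by the equal-variance Student's t-test.
    pooled = ((n1 - 1) * variance(sample) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return diff, t

# Hypothetical log2 intensities for one protein, three replicates each.
tale = [24.1, 23.8, 24.4]
ctrl = [20.2, 20.9, 20.5]
log2fc, t = volcano_point(tale, ctrl)
significant = abs(t) > T_CRIT_DF4  # equivalent to P < 0.05 when df = 4
print(round(log2fc, 2), round(t, 2), significant)
```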

      For ChIP-seq two biological replicates were analysed for each cell line expressing the specific YFP tagged protein of interest (TALE or KKT2). This is now stated in the relevant figure legends – apologies for this oversight. The resulting data are available for scrutiny at GEO: GSE295698.

      Minor comments:

      Specific experimental issues that are easily addressable.

      The following suggestions can be incorporated:

      (1) Page 18: in the Materials and Methods section, the authors mention four drugs: Blasticidin, Phleomycin, G418 and Hygromycin. It is recommended to state the purpose of using these selective drugs for the parasite. If clonal selection was done, this should also be mentioned.

      We erroneously added information on several drugs used for selection in our laboratory. In fact all TALE-YFP constructs carry the Bleomycin resistance gene, which we select for using Phleomycin. Also, clones were derived by limiting dilution immediately after transfection. We have amended the text accordingly:

      Page 17/18:

      “Cell cultures were maintained below 3 × 10^6 cells/ml. Phleomycin 2.5 µg/ml was used to select transformants containing the TALE construct BleoR gene.”

      “Electroporated bloodstream cells were added to 30 ml HMI-9 medium and two 10-fold serial dilutions were performed in order to isolate clonal Phleomycin resistant populations from the transfection. 1 ml of transfected cells were plated per well on 24-well plates (1 plate per serial dilution) and incubated at 37°C and 5% CO2 for a minimum of 6 h before adding 1 ml media containing 2X concentration Phleomycin (5 µg/ml) per well.”
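
      As a quick consistency check on the concentrations quoted above, adding 1 ml of 2X medium to 1 ml of plated cells gives the stated working concentration. The symbol names below are chosen here for illustration only.

```latex
C_{\text{final}}
  = \frac{C_{2\times}\,V_{\text{medium}}}{V_{\text{cells}} + V_{\text{medium}}}
  = \frac{5\ \mu\text{g/ml} \times 1\ \text{ml}}{1\ \text{ml} + 1\ \text{ml}}
  = 2.5\ \mu\text{g/ml}
```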

      (2) In the method section the authors mentioned that there is only one site for binding of NonR-TALE in the parasite genome. But in Fig. 1C, the authors showed zero binding site. So, there is one binding site for NonR-TALE-YFP in the genome or zero?

      We thank the reviewer for pointing out this discrepancy. We have checked the latest Tb427v12 genome assembly for predicted NonR-TALE binding sites and there are no exact matches. We have corrected the text accordingly.

      Page 7:

      “A control NonR-TALE protein was also designed which was predicted to have no target sequence in the T. brucei genome.”

      Page 17:

      “A control NonR-TALE predicted to have no recognised target in the T. brucei genome was designed as follows: BLAST searches were used to identify exact matches in the TREU927 reference genome. Candidate sequences with one or more match were discarded.”
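
      The filtering logic described in this passage can be sketched as follows. Note that this is a simplified stand-in and not the authors' procedure: it replaces the BLAST search with a plain exact substring search on both strands, and the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_exact_match(candidate, genome):
    """True if the candidate occurs on either strand of the genome string."""
    return candidate in genome or reverse_complement(candidate) in genome

def keep_targetless(candidates, genome):
    """Discard every candidate with one or more exact matches."""
    return [c for c in candidates if not has_exact_match(c, genome)]

# Toy data, purely illustrative.
toy_genome = "TTAGGGTTAGGGACGTACGTCCCTAA"
toy_candidates = ["TTAGGG", "CCCTAA", "GATTACA"]
print(keep_targetless(toy_candidates, toy_genome))
```

      Checking both strands matters because a TALE target on the reverse strand is still a genomic binding site.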

      (3) The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?

      We have found that only some anti-GFP antibodies are effective for affinity selection of associated proteins, whereas others are better suited for immunolocalisation. The respective suppliers’ antibodies were optimised for each application.

      (4) Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?

      This value corresponds to the number reported by Cosentino et al. 2021 (PMID: 34541528) for subtelomeric VSGs, which is similar to the value reported by Muller et al 2018 (PMID: 30333624) (2486), both in the same strain of trypanosomes as used by us. Based on the earlier analysis by Cross et al (PMID: 24992042), 80% of the identified VSGs in their study (2584) are pseudogenes. This approximates to the estimation by Cosentino of 346/2634 (13%) being fully functional VSG genes at subtelomeres, or 17% when considering VSGs at all genomic locations (433/2872).

      (5) I found several typos throughout the manuscript.

      Thank you for raising this. We have read through the manuscript several times and hopefully corrected all outstanding typos.

      (6) Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Corrected- thank you.

      - Are prior studies referenced appropriately? Yes

      - Are the text and figures clear and accurate? Yes

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Suggested above

      Reviewer #1 (Significance):

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field:

      This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      - Place the work in the context of the existing literature (provide references, where appropriate). I agree with the information that has already been described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      - State what audience might be interested in and influenced by the reported findings. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      - (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes

      - (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. None

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Thank you for these positive comments.

      Minor Comments

      (1) Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.

      The images presented represent reproducible analyses and were independently verified by two of the authors. Although wider field-of-view images do not provide the resolution to be informative on cell location, as requested we have provided uncropped images in new Fig. S4 for all the cell lines shown in Figure 2A.

      In addition, we have included as supplementary images (Fig. S3B) additional images of TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP localisation to provide additional support for their observed locations presented in Figure 1. The sets of cells and images presented in Figure 2A and in Fig. S3B were prepared and obtained by different authors, independently and reproducibly validating the location of the tagged protein.

      (2) I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.

      We are grateful for the reviewer’s positive view of original Fig. S1 and appreciate the suggestion. We have now moved these analyses to part B of main Figure 2 in the revised manuscript – now Figure 2B. We have also provided additional details in the Methods section on the approaches used to assess background enrichment.

      Page 19:

      “Background enrichment calculation

      The genome was divided into 50 bp sliding windows, and each window was annotated based on overlapping genomic features, including CIR147, 177 bp repeats, 70 bp repeats, and telomeric (TTAGGG)n repeats. Windows that did not overlap with any of these annotated repeat elements were defined as "background" regions and used to establish the baseline ChIP-seq signal. Enrichment for each window was calculated using bamCompare, as log₂(IP/Input). To adjust for background signal amongst all samples, enrichment values for each sample were further normalized against the corresponding No-YFP ChIP-seq dataset.”
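As an illustration only, the window annotation and normalization described in this Methods text can be sketched in Python. This is not the script used in the study: the interval input format, the use of non-overlapping 50 bp bins, the pseudocount, and the choice to normalize by subtracting the No-YFP log2 ratio are all assumptions made for the example, and the function names are hypothetical.

```python
import math

WINDOW = 50  # bp, window size taken from the Methods text


def label_windows(chrom_len, repeats, window=WINDOW):
    """Label each window with the repeat class it overlaps, else 'background'.

    repeats: list of (start, end, name) half-open intervals
    (hypothetical input format chosen for this sketch).
    """
    labels = []
    for start in range(0, chrom_len, window):
        end = min(start + window, chrom_len)
        label = "background"
        for r_start, r_end, name in repeats:
            # Two half-open intervals overlap if each starts before the other ends.
            if r_start < end and start < r_end:
                label = name
                break
        labels.append(label)
    return labels


def log2_ratio(ip, inp, pseudocount=1.0):
    """log2(IP/Input) for one window; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((ip + pseudocount) / (inp + pseudocount))


def normalised_enrichment(ip, inp, ctrl_ip, ctrl_inp):
    """Per-window log2(IP/Input) minus the same quantity for the No-YFP control.

    Subtraction in log space is one possible reading of "normalized against
    the No-YFP dataset"; the actual procedure may differ.
    """
    return [
        log2_ratio(a, b) - log2_ratio(c, d)
        for a, b, c, d in zip(ip, inp, ctrl_ip, ctrl_inp)
    ]
```

Under these assumptions, windows labelled "background" would supply the baseline distribution against which the repeat-labelled windows are compared.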

      Note: While revising the manuscript we also noticed that the script had a normalization error. We have therefore included a corrected version of these analyses as Figure 2B (old Fig. S1).

      (3) Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.

      Our ChIP-seq enrichment is calculated by bamCompare. The resulting enrichment values are indeed log2 (IP/Input). We have made this clear in the updated figures/legends.

      (4) Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.

      We thank the reviewer for this suggestion; we have elected to provide a Split-Violin plot instead. This improves the presentation of the data for each centromere. The original violin plot in Figure 4C has been replaced with this Split-Violin plot (still Figure 4C).

      (5) Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed).

      Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?

      In fact, to save space the X axis was labelled inside each volcano plot, but we neglected to indicate that values are on a log2 scale indicating enrichment. This has been rectified – see Figure 6, and Fig. S7, S8 and S9.

      (6) Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)

      The CIR147 sequence is found exclusively on megabase-sized chromosomes, while the 177 bp repeats are located on intermediate- and mini-sized chromosomes. Due to limitations in the current genome assembly, it is not possible to reliably classify all chromosomes into intermediate- or mini- sized categories based on their length. Therefore, original Supplementary Fig. S1 presented the YFP-KKT2 enrichment over CIR147 and 177 bp repeats as a representative comparison between megabase chromosomes and the remaining chromosomes (corrected version now presented as main Figure 2B). Additionally, to allow direct comparison of YFP-KKT2 enrichment on CIR147 and 177 bp repeats we have included a new plot in Figure 7C which shows the relative enrichment of YFP-KKT2 on these two repeat types.

      We have added the following text, page 12:

      “Taking into account the relative number of CIR147 and 177 bp repeats in the current T. brucei genome (Cosentino et al., 2021; Rabuffo et al., 2024), comparative analyses demonstrated that YFP-KKT2 is enriched on both CIR147 and 177 bp repeats (Figure 7C).”

      (7) Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?

      Thanks for spotting this. It has been corrected.

      (8) The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly.

      This is perfectly fine when analysing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work.

      However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions.

      Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      We thank the reviewer for their thoughtful comments. Repetitive sequences are indeed challenging to analyze accurately, particularly in the context of short-read ChIP-seq data. In our study, we aimed to address YFP-KKT2 enrichment not only over CIR147 repeats but also on 177 bp repeats, using both ChIP-seq and proteomics with synthetic TALE proteins targeted to the different repeat types. We appreciate the referee's suggestion to consider uniquely mapped reads; however, in the updated genome assembly, the 177 bp repeats are frequently immediately followed by long stretches of 70 bp repeats which can span several kilobases. The size and repetitive nature of these regions exceed the resolution limits of ChIP-seq. It is therefore difficult to precisely quantify enrichment across all chromosomes.

      Additionally, the repeat sequences are highly similar, and relying solely on uniquely mapped reads would result in the exclusion of most reads originating from these regions, significantly underestimating the relative signals. To address this, we used Bowtie2 with settings that allow multi-mapping, assigning reads randomly among equivalent mapping positions, but ensuring each read is counted only once. This approach is designed to evenly distribute signal across all repetitive regions and preserve a meaningful average.
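The assignment strategy described above (each multi-mapping read placed at one of its equally good positions, chosen at random, and counted once) can be illustrated with a small Python sketch. It is a toy model of the idea, not the aligner's implementation: the input format, the function name, and the seeded random generator are assumptions made for the example.

```python
import random
from collections import Counter


def assign_reads(read_hits, seed=0):
    """Assign each read to exactly one of its best-scoring positions.

    read_hits: dict mapping read_id -> list of (position, alignment_score)
    (hypothetical format; a higher score means a better alignment).
    Returns a Counter of reads per position; every read is counted once.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    counts = Counter()
    for read_id, hits in read_hits.items():
        if not hits:
            continue  # unaligned read, nothing to assign
        best = max(score for _, score in hits)
        # Keep only positions tied for the best score, then pick one at random.
        candidates = [pos for pos, score in hits if score == best]
        counts[rng.choice(candidates)] += 1
    return counts
```

With many reads that align equally well to two copies of a repeat, the counts approach an even split between the copies. This is why the approach preserves a meaningful average over a repeat class, and also why signal from one copy cannot be distinguished from signal at another, which is the reviewer's concern.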

      Single molecule methods such as DiMeLo (Altemose et al. 2022; PMID: 35396487) will need to be developed for T. brucei to allow more accurate and chromosome specific mapping of kinetochore or telomere protein occupancy at repeat-unique sequence boundaries on individual chromosomes.

      Reviewer #2 (Significance):

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with minichromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Thank you for supporting the novelty and broad interest of our manuscript.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    1. eLife Assessment

      This important study examines the role of map3k1, a MAP3K family member that has both kinase and ubiquitin ligase domains, in the differentiation of progenitors in the flatworm Planaria. The convincing analyses demonstrate that map3k1 acts within progenitors to restrict their premature differentiation and to prevent formation of teratomas. This work would be of interest to researchers in the fields of regeneration, developmental biology, and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors, ectopic neurons and glands along the AP axis and pharynx in ectopic anterior positions. The rest of the study shows that positional information is largely unaffected by loss of map3k1. However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route. They also show that "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas. In short, this study convincingly demonstrates that in planaria, map3k1 maintains progenitor cells in an undifferentiated state, preventing premature fate commitment until they encounter the appropriate signals, either positional cues within a designated region or contact-dependent inputs from surrounding tissues.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work is high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not other invertebrates.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Weaknesses:

      The authors have satisfactorily addressed our previous concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors (Fig. 1), ectopic neurons and glands along the AP axis (Fig. 2) and pharynx in ectopic anterior positions (Fig. 3). The rest of the study shows that positional information is largely unaffected by loss of map3k1 (Fig. 4,5). However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route (Fig. 6). They also show that an ill-defined "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work appears to be high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not in other invertebrates.

      Weaknesses:

      (1) The paper is largely descriptive with no mechanistic insights. 

      The mechanistic insights we aim to address are primarily at the cellular systems level – how adult progenitor cells produce pattern. Specifically, we uncovered strong evidence that regulation of differentiation is an active process occurring in migratory progenitors and that this regulation is a major component of pattern formation during the adult processes of tissue turnover and regeneration. The map3k1 phenotype provided a tool used to reveal the existence of this regulation, and to understand the patterning abnormalities prevented by this regulatory mechanism. We updated the text in several places to make clearer some of this emphasis. For example, in the Discussion: "We suggest that differentiation is restricted during migratory targeting as an essential component of pattern formation, with the map3k1 RNAi phenotype indicating the existence and purpose of this element of patterning." 

      Naturally, identifying a particular molecule involved in this process is of interest for understanding molecular mechanism; this would allow for comparison to other cellular systems in other organisms and would focus future molecular inquiry. Future molecular studies into the mechanism of Map3k1 regulation and its downstream signaling will be fascinating as next steps towards understanding the process at the molecular level more deeply. We also added some discussion considering the types of upstream activation cues that could potentially be associated with Map3k1 regulation to suppress differentiation. 

      (2) Given the severe phenotypes of long-term depletion of map3k1, it is important that this exact timepoint is provided in the methods, figures, figure legends and results. 

      We removed the term “long-term” and instead added the timepoints used to all figure legends. We also added a summary of the timepoints used to the methods section and included RNAi timepoint labels in figures where a phenotype progression over time is relevant to interpretation. For timecourses, we also added suitable time information to the text in the results.

      (3) Figure 1C, the ectopic eyes are difficult to see, please add arrows. 

      To improve visualization, we replaced the example animal in the original Figure 1C with one that has a stronger phenotype, including arrows pointing to every ectopic event. Additionally, we included magnified images of optic cup cells and photoreceptor neurons in the trunk and tail region. This is now Figure 1B.

      (4) line 217 - why does the n=2/12 animals not match the values in Figure 3B, which is 11/12 and 12/12. The numbers don't add up. Please correct/explain. 

      Figure 3B in the submitted version (3/18 had cells in the tail) included more animals than the text (2/12 had cells in the tail): the figure counted 6 additional animals from a replicate experiment, of which 1/6 showed cells in the tail, and these had not been added to the text during writing. The results are the same; only the sample sizes noted in those two locations differed, and we have fixed this issue. In the updated Figure 3, the order of presentation has shifted (e.g., prior 3B is now in 3C and Figure 3_figure supplement 1). We made sure to add numbers to all figure panels.

      (5) Figure panels do not match what is written in the results section. There is no Figure 6E. Please correct.

      Thank you for catching this. We have gone through figures and text after editing to make sure that text callouts are appropriately matched to the figures. 

      Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Thank you to the reviewer for the positive feedback.

      Weaknesses:

      The article presents a provocative idea related to the importance of positional control for organs and cells, which is at least in part regulated by map3k1. Nonetheless, the role of map3k1 or its potential interaction with regulators of the anterior-posterior, mediolateral axes, and PCGs is somewhat superficial. The authors could elaborate or even speculate more in the discussion section and the different scenarios incorporating these axial modulators into the map3k1 model presented in Figure 8 

      First, to strengthen the support for our finding that positional information is largely unaffected in map3k1 RNAi animals, we added data regarding the expression of additional relevant position control genes (PCGs) –ndl-4, ptk7, sp5, and wnt11-1 – to the PCG panel in Figure 5. The expression domain of ndl-4, an FGF receptor-like protein family member that contributes to head patterning and anterior pole maintenance, was normal in map3k1 RNAi. wnt11-1, a PCG with expression concentrated in the posterior end of the animal and with expression dependent on general Wnt activity, was also normal in map3k1 RNAi animals. ptk7, RNAi of which can result in supernumerary pharynges, also showed normal expression in map3k1 RNAi animals. Finally, sp5, a Wnt-activated gene with expression in the tail, also showed normal expression in map3k1 RNAi animals. 

      Second, to further support the conclusion that cells are not suitably responding to positional information after map3k1 RNAi, which we argue normally dictates where differentiation should occur, we added examples of differentiated cell types that are ectopically positioned within an atypical PCG expression domain for that cell type (Figure 5C). This underscores that following map3k1 RNAi the PCG expression domains do not change, but the pattern of differentiated cell types relative to these domains does shift. We also added data showing that regenerating tails had a proper wntP-2 gradient, but an anterior regenerating pharynx appeared outside of this wntP-2<sup>+</sup> zone and inside of an ndl-5<sup>+</sup> zone (Figure 5- figure supplement 1E). We added some discussion of these new data in the Figure 5 results section. We also noted, regarding independent recent map3k1 work (Lo, 2025), some evidence exists that a minor posterior shift in ndl-5 expression can occur after map3k1 RNAi.

      Next, we added a new element to the model figure (Figure 8B) depicting that PCG expression domains remain normal after map3k1 RNAi, with ectopic differentiation occurring in an incorrect positional information environment. We refer to this new panel in the discussion: "We suggest that map3k1 is not required for the spatial distribution of progenitor-extrinsic differentiation-promoting cues themselves, but for progenitors to be restricted from differentiating until these cues are received (Figure 8B)."; we then follow this statement with a summary in the Discussion of six pieces of evidence that support this model.

      Finally, we added some additional text to the discussion section about candidate mechanisms by which extrinsic cues could potentially regulate Map3k1, pointing to potential future inquiry directions. We suggest that map3k1 is not involved in regulating PCG activity domains themselves, but instead acts as a brake on differentiation within migratory progenitors through active signaling. This brake is then lifted when the progenitors hit their correct PCG-based migratory target, or when they hit their target tissue. How that occurs mechanistically is unknown. One scenario is that each progenitor is tuned to respond to a particular PCG-regulated environment (such as a particular ECM or signaling environment) to generate a molecular change that inactivates Map3K1 signaling, such as by inactivating or disengaging an RTK signal. Alternatively, the migratory process in progenitors could engage the Map3K1 signal, enabling signal cessation with arrival at a target location. When Map3K1 is active it could result in a transcriptional state that prevents full expression of differentiated factors required for maturation, tissue incorporation, and cessation of migration. These considerations are now added to the discussion.

      The article can be improved by addressing inconsistencies and adding details to the results, including the main figures and supplements. This represents one of the most significant weaknesses of this otherwise intriguing manuscript. Below are some examples of a few figures, but the authors are expected to pay close attention to the remaining figures in the paper.

      Details associated with the number of animals per experiment, statistical methods used, and detailed descriptions of figures appear inconsistent or lacking in almost all figures. In some instances, the percentage of animals affected by the phenotype is shown without detailing the number of animals in the experiment or the number of repeats. Figures and their legends throughout the paper lack details on what is represented and sometimes are mislabeled or unrelated. 

      We endeavored to ensure that these noted details are present throughout the legends and figures for all figure panels.

      Specifically, the arrows in Figure 1A are different colors. Still, no reasoning is given for this, and in the exact figure, the top side (1A) shows the percentages and the number of animals below. 

      The only reason for the different colored arrows was visibility. To avoid confusion, we now use white arrows for all FISH images in Figure 1, and wherever else possible. We also replaced the percentages with n numbers in the bottom left corner of the live images in Figure 1A.

      Conversely, in Figures 1B, C, and D, no details on the number of animals or percentages are shown, nor an explanation of why opsin was used in Figure 1A but not 1B. 

      The original Figure 1B represented a few different examples of ectopic eye/eye cell patterns in the map3k1 RNAi animals to demonstrate the variable and disorganized nature of the phenotype. To address this, we added further explanation in the legend. We also merged 1A and 1B for simplicity of interpretation. opsin was used in Figure 1A to label cell bodies of photoreceptors. anti-Arrestin was used in the example FISH images to see if these cells were interconnected via projections, which we now clarify in the legend and in the text. 

      Is Figure 1B missing an image for the respective control? Figure 1C needs details regarding what the two smaller boxes underneath are. 

      The control for Figure 1B was in Figure 1A; the merger of Figures 1A/B should address this. Boxes in Figure 1C were labelled with numbers corresponding to the image above them.

      Figure 1C could use an AP labeling map in 10 days (e.g., AP6 has one optic cup present). Figure 1C and F counts do not match. 

      We added a cartoon to 1C to show the region imaged. Note that the 36d trunk image (now Fig. 1B) has now been replaced with a full animal image and magnified boxes that show locations of example ectopic cells. That cell in 1C was categorized as in AP5. Note that additional animals were analyzed and data added to the distribution (now Fig. 1D). 

      In Figure 1C, we do not know the number of animals tested, controls used, the scale bar sizes in the first two images, nor the degree of magnification used despite the pharynx region appearing magnified in the second image.  Figure 1C is also shown out of chronological order; 36 days post RNAi is shown before 10 days post RNAi. Moreover, the legends for Figures 1C and 1D are swapped.

      We have endeavored to ensure sample numbers, control images, and appropriate scale bar notation in legends are present for all images. Figure 1C has now been split into two panels: Figure 1B and Figure 1C. It does not follow a chronological order in presentation for the following logic flow: we uncover and describe the phenotype; then, with knowledge of the defect, we walk back to see how early the phenotype starts after RNAi and what the pattern of ectopic cell distribution is when the phenotype starts to emerge (using the knowledge of which cells are affected from the overt phenotype described in 1A/B). 

      Additionally, Figure 1F and many other figures throughout the paper lack overall statistical considerations. Furthermore, Figure 1F has three components, but only one is labeled. Labeling each of them individually and describing them in the corresponding figure legend may be more appropriate.

      The main point of the graphs in 1F (now 1D) was the overt overall pattern difference from the wild type, which never has ectopic eye cells in the midbody or tail, and that the ectopic eye cells progress throughout the entire AP axis. However, we concur that a statistical test is a reasonable thing to show here, and one is now included in the legend. The 3 components (in Figure 1F, now Figure 1D) were kept together with one figure label (D) for simplicity, but were rearranged (top and bottom) with a cartoon to the side and with modified labeling for extra clarity. 

      Figure 2C shows images of gene expression for two genes, but the counts are shown for only one in Figure 2D. It is challenging to follow the author's conclusions without apparent reasoning and by only displaying quantitative considerations for one case but not the other. These inconsistencies are also observed in different figures. 

      In Figure 2C, FISH images of cintillo+ and dd_17258+ neurons are shown to display the specificity of this effect to some neurons and not others. Because cintillo+ cells did not expand at all (n=24/24 animals), the counts for them would all be zero values. We only counted data for dd_17258 cells because it was the neuron that expanded compared to the control animals. We have now added a note in the legend explaining this.

      In Figure 2D, 24/24 animals were reported to show the phenotype, but only eight were counted (is there a reason for this?).

      8 animals were used to quantitatively characterize the spread of cells along the AP axis, as it was deemed an adequate sample size to capture the change in distribution of 17258+ cells from being head restricted to being present throughout the body. Through multiple cohorts of animals in replicates, a total of 24/24 examined animals showed this expansion phenotype. Double FISH experiments were additionally carried out using dd_17258 and various PCGs; these data are now included in Figure 5C, and these animals were added to the total counts regarding quantitative analysis of the phenotype in Figure 2D. 

      In Figure 2E, the expression for three genes is shown, with some displaying anterior and posterior regions while others only show the anterior picture. Is there a particular reason for this? 

      The original first panel in Figure 2E showed an example of a non-expanding gland cell type, dd_9223, which is very restricted to the head in both control and map3k1 RNAi animals. Because we did not observe a phenotype for this cell type (no cells in all control and map3k1 RNAi animal tails), we only included tail images of cell types that showed an abnormal phenotype with clear expansion to the posterior (dd_8476 and dd_7131). However, we have now included tail images of dd_9223 cells and added data for dd_9223 to the graph in Figure 2E. 

      Also, in Figure 2F, the counts are shown for only the posterior region of two genes out of the three displayed in Figure 2E. It is unclear why the authors do not show counts for the anterior areas considered in Figure 2E. Furthermore, the legend for Figure 2D is missing, and the legend for 2F is mislabeled as a description for Figure 2D.

      We now include tail images for dd_9223 in Figure 2E to show that there are no ectopic cells in tails. We did not originally include counts of dd_9223 because there was no phenotype observed. dd_7131 and dd_8476 cell types appeared in the posterior of even control animals at a low frequency, unlike dd_9223 cells. However, we did now add counts for dd_9223 tail regions in the graph. We did not count the anterior regions of the animal because our goal was to show data for the visible phenotype (ectopic cells in the tail) not only with an example image, but also by showing the number of cells in the tail with a graph and statistical test. Legends have been updated with correct details.

      Supplement Figure 1 B reports data up to 6 weeks, but no text in the manuscript or supplement mentions any experiment going up to 6 weeks. There are no statistics for data in Supplement Figure 1E. Any significance between groups is unclear.

      More details about the RNAi feeding schedules have been added in the methods section. All RNAi timepoints are now specified in the legends. The legends for Figure 1F and Figure 1- figure supplement 1E (additional data: ovo<sup>+</sup>; smedwi-1<sup>-</sup> cell counts) now mention the statistical tests performed, and annotations (ns, not significant) or p values have been added to the graphs. For simplicity, we decided to include all smedwi-1+ counts together rather than splitting them into low and high smedwi-1+ cells, because we were not making any claims about low and high cells. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It would be good to acknowledge in the discussion the recent paper from the Petersen lab on map3k1, published in PLoS Genet 2025, especially if the results differ between the two labs.

      We added reference/discussion regarding the recent PLoS Genetics Lo, 2025 map3k1 paper at several suitable points in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Please pay close attention to the description of experimental details and the consistency throughout the paper. It seems like the reader has to assume or come across information that is not readily available from the text or the legends in the paper. This is an interesting paper with intriguing findings. However, the version presented here appears rushed or put together on the fly.

      Thank you for your thorough feedback. We have endeavored to ensure all appropriate details are present in figures and/or figure legends.

    1. eLife Assessment

      This important study employs a closed-loop, theta-phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats and reports that disrupting theta-timescale coordination impairs performance of challenging aspects of spatial behaviors, while sparing hippocampal replay and spatial coding in hippocampal place cells. The findings are expected to advance theoretical understanding of learning and memory operations and to provide practical implications for the application of similar optogenetic approaches. The experiments were viewed as technically rigorous, but the strength of evidence provided in the current version of the manuscript was viewed as incomplete, mostly due to limited analyses and the descriptions of some of the experimental protocols.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Joshi and colleagues demonstrates that the precise theta-phase timing of spikes is causal for CA1 hippocampal theta sequences during locomotion on a linear track and is necessary for learning the cognitively demanding outbound component of a hippocampus-dependent alternation task (W-maze), independently of replay during immobility. To reach these conclusions, the authors developed a theta-phase-specific, closed-loop manipulation that used optogenetic activation of medial septal parvalbumin (PV) interneurons at the ascending phase of theta during locomotion. This protocol preserved immobility periods, allowing a clean and elegant dissociation from SWR-associated replay.

      The manuscript is well written and was a pleasure to read. The work described is of high quality and introduces several notable advances to the field:

      (a) It extends prior studies that manipulated theta oscillations by examining precise temporal structure (specifically theta sequences) rather than only LFP features.

      (b) The closed-loop manipulation enabled dissociation between deficits in theta sequences during a behavioural task and SWR-associated replay activity.

      (c) As controls, the authors included rats with suboptimal viral transduction or optic-fibre placement, and, within subjects, both stimulation-on (stim-on) and stimulation-off (stim-off) trials. Notably, sequence disruption persisted into stim-off periods within the same session.

      Overall, this is a strong manuscript that will provide valuable insights to the field. I have only minor comments:

      (1) As the authors note, it is striking that both behavioural performance and spike patterns are altered during stim-off trials. They propose that "disruption of theta sequences during the initial experience in an environment is sufficient to have lasting effects," implying that rapid, experience-dependent plasticity is driven by sequential firing. Does this imply that if rats were previously trained on the task, subsequent stim-on and stim-off trials would yield different outcomes, with stim-off trials showing improved performance and intact theta sequences? For example, if the sequence of one-third stim-on, one-third stim-off, one-third stim-on were inverted to off-on-off, would theta sequences be expected to emerge, disappear, and potentially re-emerge? While I am not asking for additional experiments, I think the discussion could be extended in this aspect.

      Alternatively, could the number of stim-off trials (one third of the total) be insufficient to support learning/induce plasticity? In the controls, ~50-100 trials appear necessary to achieve high performance.

      (2) In line with the point above, the authors characterise the behavioural changes induced by MS optogenetic stimulation specifically as a "learning deficit," as rats failed to improve across 300 trials in an initially novel environment (W-maze). While they present this as complementary to prior demonstrations of impaired performance on previously learned tasks (Zutshi et al., 2018; Quirk et al., 2021; Etter et al., 2023; Petersen et al., 2020), an alternative interpretation is a working-memory deficit. This would produce the same behavioural pattern, with reference memory (the less cognitively demanding trials) remaining intact despite stimulation and concomitant changes in theta sequences. This interpretation would also be consistent with work in certain disease models, where reduced synaptic plasticity and working-memory deficits co-occur with preserved place coding despite impaired theta sequences (e.g., Viana da Silva et al., 2024; Donahue et al., 2025).

      (3) It was not immediately clear whether SWR-associated activity was derived from the interleaved ~15-min rest sessions in a rest box, or from periods of immobility or reward consumption in the maze (aSWR, as in Jadhav et al 2012). Regardless, it would be informative to compare aSWR events within the maze to rest-box SWRs that may occur during more prolonged slow-wave episodes (even if not full sleep). This contrasts with Liu et al. (2024), who analysed replay during ~1.5-h sleep sessions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this study developed a closed-loop optogenetic stimulation system with high temporal precision in rats to examine the effect of medial septum (MS) stimulation on the disruption of hippocampal activity at both behavioral and compressed time scales. They found that this manipulation preserved hippocampus single-cell-level spatial coding but affected theta sequences and performance during a spatial alternation task. The performance deficits were observed during the more cognitively demanding component of the task and even persisted after the stimulation was turned off. However, the effects of this disruption were confined to locomotor periods and did not impact waking rest replay, even during the early phase of stimulation-on. Their conclusion is consistent with previous findings from the Pastalkova lab, where MS disruption (using different methods) affected theta sequences and task performance but spared replay (Wang et al., 2015; Wang et al., 2016). However, it differs from a recent study in which optogenetic disruption of EC inputs during running affected both theta sequences and replay (Liu et al., 2023).

      Strengths:

      The experiments were well designed and controlled, and the results were generally well presented.

      Weaknesses:

      Major concerns are primarily technical but also conceptual. To further increase the impact of this study by contrasting findings from different disruptions, it is necessary to better align the analysis and detection methods.

      Major concerns:

      (1) To show that MS disruption does not affect spatial tuning, the authors computed the KL divergence of tuning curves between stimulation-on and stimulation-off conditions. I have two main questions about this analysis:

      (1.1) The authors seem to impose stringent inclusion criteria requiring a large number of spikes and a strong concentration of tuning curves. These criteria may have selected strongly spatially tuned cells, which are typically more stable and potentially less vulnerable to perturbations. Based on the Figure 2 caption, it seems that fewer than 10% of cells were included in the KL divergence analysis, which is lower than the usual proportion of place cells reported in the literature. What is the rationale for using such strict inclusion criteria? What happens to the cells that are not as strongly tuned but are still identified as significant place cells?

      (1.2) The KL divergence was computed between stimulation-on and stimulation-off conditions within the same animal group. However, the authors also showed that MS stimulation had lasting effects on theta sequences and performance even during stimulation-off periods. Would that lasting effect also influence spatial tuning? Based on these questions, the authors should perform additional analyses that directly measure spatial tuning quality and compare results across control and experimental groups - for example, spatial information of spikes (Skaggs et al., 1996), tuning stability, field length, and decoding error during running.

      (2) The authors compared their results with those from Liu et al. (2023) and proposed that the different outcomes could be explained by different sites of disruption. However, the detection and quantification methods for theta sequences and replay differ substantially between the two studies, emphasizing different aspects of the phenomenon. I am not suggesting that either method is superior, but providing additional analyses using aligned detection methods would better support the authors' interpretations and benefit the field by enabling clearer comparisons across studies. In the current analysis, the power spectrum of the decoded ahead/behind distance only indicates that there is a rhythmic pattern, without specifying the decoding features at different theta phases. Moreover, the continuous non-local representations during ripples could include stationary representations of a location or zigzag representations that do not exhibit a linear sequential trace. Given that, the authors should show averaged decoding results corrected by the animal's actual position within theta cycles and compute a quadrant ratio. For replay analysis, they could use a linear fit (as in Liu et al., 2023) and report the proportion of significant replay events.

      (3) The finding that theta sequences and performance were impaired even during stimulation-off periods is particularly interesting and warrants deeper exploration. In the Discussion, the authors claim that this may arise from "the rapid plasticity engaged during early learning." However, this explanation does not fully account for the observation. Previous studies have shown that theta sequences can develop very rapidly (Feng et al., Foster lab, 2015; Zhou et al., Dragoi lab, 2025). If the authors hypothesize that rapid plasticity during early stimulation-on disrupts the theta sequence, then the plasticity window must also be short and terminate during the subsequent stimulation-off period. Otherwise, why can't animals redevelop theta sequences during stimulation-off? The authors should conduct additional analyses during the stimulation-off periods of the W-maze task. For example:

      (3.1) What is the spike-theta phase relationship? Do the phases return to normal or remain altered as during stimulation-on?

      (3.2) Is there a significant place-field remapping from stimulation-on to stimulation-off? (Supplementary Figure 3F includes only a small subset of cells; what if population vector correlations are computed across all cells, or Bayesian decoding of stimulation-on spikes is performed using stimulation-off tuning curves?)

      (3.3) The authors should also discuss why the stimulation-off epochs were not sufficient to support learning, and if the stimulation-off place cell sequences could have supported replay.

      (4) Citations and/or discussion of key studies relevant to the current work are missing: Wang et al. in Pastalkova lab 2015-2016 studies for disruption of theta sequence (but not place cell sequence) disrupting learning but not replay, Drieu et al. in Zugaro lab 2018 study on disruption of theta sequence affecting sleep replay, Farooq and Dragoi 2019 for association between a lack of theta sequence and presence of waking rest replay during postnatal development, etc. The authors should discuss what the conceptually new findings in the current study are, given the findings of the previous literature above.

      (5) The assessment of theta sequence is not state-of-the-art:

      (5.1) Detecting the peak of cross-correlograms between neurons (CCG) relates to behavioral timescale CCG, not the theta sequence one; for the theta sequence, the closest to zero local peak should be used instead.

      (5.2) How were other methods of detecting theta sequences performing on the stimulation-on/stimulation-off data: Bayesian decoding, firing sequences?

      (5.3) How was phase precession during stimulation-on/stimulation-off?

      (6) It would be important to calculate additional variables in the replay part of the study to compare the quality of replay across the 2 groups:

      (6.1) Proportion of significant replay events out of the detected multiunit events.

      (6.2) The average extent of trajectory depicted by the significant replay events in the targeted compared to the control, stimulation-on/stimulation-off.
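
      As a concrete reference for the spatial-information measure suggested in point (1.2), the following is a minimal sketch of the Skaggs information per spike, assuming a tuning curve expressed as one firing rate per spatial bin and an occupancy time per bin. The function name and the example values are hypothetical and are not taken from the manuscript or its repository.

```python
import math

def spatial_information_bits_per_spike(rates, occupancy):
    """Skaggs information: sum_i p_i * (r_i / r_mean) * log2(r_i / r_mean).

    rates: firing rate (Hz) in each spatial bin.
    occupancy: time (s) spent in each spatial bin.
    """
    total_time = sum(occupancy)
    p = [t / total_time for t in occupancy]            # occupancy probability per bin
    mean_rate = sum(pi * ri for pi, ri in zip(p, rates))
    if mean_rate == 0:
        return 0.0                                      # silent cell carries no information
    info = 0.0
    for pi, ri in zip(p, rates):
        if pi > 0 and ri > 0:                           # bins with zero rate contribute 0
            ratio = ri / mean_rate
            info += pi * ratio * math.log2(ratio)
    return info

# Hypothetical cell that fires in one of four equally visited bins.
print(spatial_information_bits_per_spike([8.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]))
```

      A uniformly firing cell yields zero bits per spike under this definition, so the measure can be compared across control and targeted groups without the inclusion criteria that the KL divergence requires.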

    4. Reviewer #3 (Public review):

      Joshi et al. present an elegant and technically rigorous study examining how the temporal structure of hippocampal spiking during locomotion contributes to spatial learning. Using a closed-loop, theta phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats, the authors demonstrate that disrupting theta-timescale coordination impairs performance on the cognitively demanding outbound component of a spatial alternation task, while sparing hippocampal replay, place coding, and the simpler inbound learning. The work aims to dissociate the role of theta-associated temporal organization during navigation from sharp-wave ripple-associated replay during subsequent rest periods, providing a mechanistic link between theta sequences and learning. The findings have important implications for models of septo-hippocampal coordination and the functional segregation between online (theta) and offline (SWR) network states. That said, there are a few conceptual and methodological issues that need to be addressed.

      One concern is the overall novelty of this work; the dissociation between online temporal sequences and offline replay events following memory deficits has previously been shown by Wang et al., 2016 (eLife). While the authors discuss Liu et al., 2023, which demonstrates that MEC activation of inhibitory neurons at gamma frequencies during locomotion disrupts theta sequences, subsequent replay, and learning (line 65-66), they do not reference Wang et al., 2016, who performed a very similar study with MS pharmacological inactivation and reported large decreases in theta power and attenuated theta frequencies together with behavioural deficits, while SWR replay persisted. Given the strong similarities in the manipulation and findings, this study should be discussed.

      Along the same lines, it should be noted that Brandon et al. (2014, Neuron) demonstrated that hippocampal place codes can still form in novel environments despite MS inactivation and loss of theta, indicating that spatial representations can emerge without intact septal drive. Referencing this study would strengthen the discussion of how temporal coordination, rather than spatial coding per se, underlies the learning deficits observed here.

      The conclusion that disrupting "theta microstructure" impairs learning relies on the assumption that the observed behavioral deficits arise from altered temporal coding from within hippocampal CA1 only. However, optogenetic modulation of medial septal PV neurons influences multiple downstream regions (entorhinal cortex, retrosplenial cortex) via widespread GABAergic projections. While the authors do touch on this, their discussion should expand to include the network-level consequences of entorhinal grid-cell disruption and how this could affect temporal coding both online and offline.

      The finding that replay content, rate, and duration are unchanged is critical to the paper's claim of dissociation. However, the analysis is restricted to immobility on the track. Given evidence for distinct awake vs. sleep replay, confirming that off-track rest and post-session sleep replays are similarly unaffected would confirm the conclusions of the paper. If these data are unavailable, the limitation should be acknowledged explicitly. Moreover, statistical power for detecting subtle differences in replay organization or spatial bias should be added to the supplement (n of events per animal, variability across sessions).

      The exact protocol for optogenetic stimulation is a bit confusing. For the task, the first and final third (66%) of trials were disrupted and were only stimulated when away from the reward well and only when the animal was moving. What proportion of time within "stimulated" trials remained unstimulated? Why were only 66% of trials stimulated?

    5. Author response:

      We thank all reviewers for their overall assessment, thoughtful comments, and suggestions. We are working to address each reviewer’s comment in detail. In this provisional response, we provide clarifications regarding our experimental approach and the novelty of our work, and include additional analyses that we have performed since the submission of the manuscript. We are also happy to report that we have now shared the raw data, intermediate analysis files, and the complete repository to facilitate replication of the analysis and figures.

      Code repo: github.com/LorenFrankLab/ms_stim_analysis

      Data repo: dandiarchive.org/dandiset/001634

      Docker containers (see GitHub repo for use instructions):

      Database: https://hub.docker.com/r/samuelbray32/spyglass-db-ms_stim_analysis

      Python notebooks: https://hub.docker.com/r/samuelbray32/spyglass-hub-ms_stim_analysis

      (1) Novelty and contrast with earlier manipulations:

      We thank the reviewers for suggesting that we explicitly contrast our results with prior pharmacological (Wang et al., 2016; Wang et al., 2015; Koenig et al., 2011; Brandon et al., 2014), systemic (Robbe & Buzsaki 2009; Petersen and Buzsáki 2020), and behavioral (Drieu et al., 2018) manipulations that also assessed some of the physiological features we evaluated. We will add a discussion of these studies, which will help us emphasize both the insights and discrepancies observed using these prior approaches. We will also more clearly explain the novelty and importance of our specific approach for temporally and physiologically precise manipulation. Specifically, our approach (closed-loop theta-phase stimulation during locomotion) provides a level of physiological specificity that made it possible to dissociate theta-state dynamics from other hippocampal processes. This in turn allowed us to address a question that has remained unresolved across prior studies: Are hippocampal spatial sequences during locomotion (i.e., theta sequences) necessary to learn a novel hippocampal-dependent task?

      (2) Additional analysis on SWRs during rest:

      Since submitting the manuscript, we have conducted additional analysis on the rate and length of SWRs in the rest box and found that both are also indistinguishable between targeted and control animals (effect of manipulation between control and targeted animals; rSWR rate: p=0.45; rSWR length: p=0.94, mixed effect model). We also find evidence for sequential neural representations in the rest box, when the encoding was performed in the behavioral arena. Example trajectories are shown below. These results are consistent with our observations on SWR rate, length, and content in the behavioral arena. Additionally, we are in the process of evaluating and quantifying the results of decoding the rSWRs and will include those in the next version of the manuscript.

      Author response image 1.

      Sequential replay events observed in the rest box

      (3) Theta sequence measurement in the absence of theta:

      In the next version of the manuscript, we will explicitly explain why our manipulation makes it more appropriate to measure sequential hippocampal representations during locomotion (i.e., theta sequences) without using the theta oscillation or an epoch-averaged, relatively large sliding window as a reference. The key insight here is that our manipulation suppresses theta and thus makes it difficult or impossible to accurately identify theta phase. We understand that theta-phase based approaches were used in prior work; however, these prior analyses may have confounded the absence of hippocampal theta sequences during locomotion with the inability to detect theta oscillatory phase reliably. We will show that our method of using clusterless Bayesian decoding, in which we estimate the decoded position at every 2 ms timestep, is indeed able to capture endogenous hippocampal sequences even without imposing any requirements of aligning to theta oscillations, thus providing an unbiased estimate of the rhythmicity of hippocampal spatial representations.
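
      To illustrate how rhythmicity can be read from a decoded signal without reference to theta phase, the following is a minimal sketch that computes spectral power of a decoded ahead/behind distance sampled every 2 ms. It is not the analysis in the linked repository: the function names, the synthetic 8 Hz signal, and the 1 to 20 Hz frequency grid are all assumptions chosen for illustration.

```python
import cmath
import math

def power_at(signal, freq_hz, dt):
    """Power of the mean-subtracted signal at one frequency (direct Fourier sum)."""
    mean = sum(signal) / len(signal)
    total = sum((x - mean) * cmath.exp(-2j * math.pi * freq_hz * k * dt)
                for k, x in enumerate(signal))
    return abs(total) ** 2 / len(signal)

def peak_frequency(signal, dt, freqs):
    """Frequency from the supplied grid with the highest power."""
    return max(freqs, key=lambda f: power_at(signal, f, dt))

dt = 0.002  # 2 ms decoding timestep, as stated in the text
# Synthetic ahead/behind distance (cm) oscillating at 8 Hz for 2 s; illustrative only.
distance = [10.0 * math.sin(2 * math.pi * 8.0 * k * dt) for k in range(1000)]
freqs = [f / 2 for f in range(2, 41)]  # assumed grid: 1 to 20 Hz in 0.5 Hz steps
print(peak_frequency(distance, dt, freqs))
```

      Because the spectrum is computed on the decoded position itself, a peak in the theta range would indicate rhythmic sweeps even when the LFP theta phase cannot be estimated reliably.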

      (4) Additional analysis on place cell stability and tuning:

      We thank the reviewer for this question. For the KL divergence analysis, we imposed a spike-count criterion (100 spikes for each interval type: stimulation-off, stimulation-on, and the stimulus sub-interval) and a coverage criterion (50% HPD of the units’ spatial firing distribution was contained within 40cm on the linear track and 100cm on the w-track). These criteria were chosen to ensure that spatial tuning curves were sufficiently well sampled and localized to allow reliable estimation of KL divergence, which is particularly sensitive to noise arising from low spike counts or diffuse firing. Based on the reviewer’s suggestion, we have relaxed both the spike-count and the spatial-coverage criteria to include more weakly tuned place cells and replicated our results (p=.146). Further, we have also evaluated the stability of place field order between stimulation-on and stimulation-off conditions using more standard methods (as in Wang et al., 2015; Spearman correlation of place field order, control vs targeted, p = .920, t-test). These results are consistent with our observations about place field stability during stimulation-off and stimulation-on conditions (Fig. 2F).
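
      The KL divergence comparison described above can be sketched as follows. This is a minimal illustration, assuming tuning curves given as firing rates on a shared set of spatial bins; the function names, the small constant added to empty bins, and the example rates are hypothetical, and only the 100-spike threshold comes from the text.

```python
import math

def normalise(rates, eps=1e-10):
    """Turn a tuning curve (rate per spatial bin) into a probability distribution."""
    shifted = [r + eps for r in rates]   # eps avoids log(0) in empty bins
    total = sum(shifted)
    return [s / total for s in shifted]

def kl_divergence_bits(rates_p, rates_q, eps=1e-10):
    """KL(P || Q) in bits between two tuning curves on the same spatial bins."""
    if len(rates_p) != len(rates_q):
        raise ValueError("tuning curves must share the same bins")
    p = normalise(rates_p, eps)
    q = normalise(rates_q, eps)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

def passes_spike_criterion(spike_counts, min_spikes=100):
    """True if every interval type has at least min_spikes spikes."""
    return all(n >= min_spikes for n in spike_counts)

stim_off = [0.0, 2.0, 8.0, 2.0, 0.0]   # made-up rates (Hz) per bin
stim_on = [0.0, 2.5, 7.0, 2.5, 0.0]
if passes_spike_criterion([240, 180, 130]):   # made-up spike counts per interval type
    print(kl_divergence_bits(stim_off, stim_on))
```

      The sketch also shows why the inclusion criteria matter: with few spikes, many bins are empty and the ratio inside the logarithm is dominated by the smoothing constant rather than by the data.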

      Author response image 2.

      Spearman correlation of place field order during stimulation-on and stimulation-off conditions.
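
      For readers who want to reproduce the idea behind this measure, the following is a minimal sketch of a Spearman correlation of place field order between two conditions. It assumes each cell's field location is taken as the bin of peak firing rate, which may differ from the definition used in the manuscript; all names and example tuning curves are hypothetical.

```python
def peak_bin(rates):
    """Index of the spatial bin with the highest firing rate."""
    return max(range(len(rates)), key=lambda i: rates[i])

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def place_field_order_correlation(curves_a, curves_b):
    """Spearman correlation of peak locations across cells between two conditions."""
    peaks_a = [peak_bin(c) for c in curves_a]
    peaks_b = [peak_bin(c) for c in curves_b]
    return pearson(average_ranks(peaks_a), average_ranks(peaks_b))

# Made-up tuning curves for three cells (rows) over four spatial bins.
curves_off = [[5, 1, 0, 0], [0, 4, 1, 0], [0, 0, 1, 6]]
curves_on = [[4, 2, 0, 0], [0, 1, 5, 0], [0, 0, 2, 5]]
print(place_field_order_correlation(curves_off, curves_on))
```

      A value near 1 means the cells keep the same spatial ordering across conditions even if individual fields shift slightly, which is the sense of stability reported in the response.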

    1. eLife Assessment

      This is a useful study that investigates the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). The authors generate Dreg1-/- mice and show a reduction of group 2 innate lymphoid cells (ILC2). However, the strength of evidence supporting the impact of Dreg1 on Gata3 expression, a transcription factor required for ILC2 cell fate decisions, and the cell-intrinsic requirement of Dreg1 for ILC2 remain incomplete. This study will be of interest to immunologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines the role of the long non-coding RNA Dreg1 in regulating Gata3 expression and ILC2 development. Using Dreg1-deficient mice, the authors show a selective loss of ILC2s but not T or NK cells, suggesting a lineage-specific requirement for Dreg1. By integrating public chromatin and TF-binding datasets, they propose a Tcf1-Dreg1-Gata3 regulatory axis. The topic is relevant for understanding epigenetic regulation of ILC differentiation.

      Strengths:

      (1) Clear in vivo evidence for a lineage-specific role of Dreg1.

      (2) Comprehensive integration of genomic datasets.

      (3) Cross-species comparison linking mouse and human regulatory regions.

      Weaknesses:

      (1) Mechanistic conclusions remain correlative, relying on public data.

      (2) Lack of direct chromatin or transcriptional validation of Tcf1-mediated regulation.

      (3) Human enhancer function is not experimentally confirmed.

      (4) Insufficient methodological detail and limited mechanistic discussion.

    3. Reviewer #2 (Public review):

      The authors investigate the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). Dreg1 is encoded close to the Gata3 locus, a transcription factor implicated in the differentiation of T cells and ILC, and in particular of type 2 immune cells (i.e., Th2 cells and ILC2). The center of the paper is the generation of a Dreg1-deficient mouse. While Dreg1-/- mice did not show any profound ab T or gd T cell, ILC1, ILC3, and NK cell phenotypes, ILC2 frequencies were reduced in various organs tested (small intestine, lung, visceral adipose tissue). In the bone marrow, immature ILC2 or ILC2 progenitors were reduced, whereas a common ILC progenitor was overrepresented, suggesting a differentiation block. Using ATAC-seq, the authors find that the promoter of Dreg1 is open in early lymphoid progenitors, and the acquisition of chromatin accessibility downstream correlates with increased Dreg1 expression in ILC2 progenitors. Examining publicly available Tcf1 CUT&Run data, they find that Tcf1 was specifically bound to the accessible sites of the Dreg1 locus in early innate lymphoid progenitors. Finally, the syntenic region in the human genome contains two non-coding RNA genes with an expression pattern resembling mouse Dreg1.

      The topic of the manuscript is interesting. However, there are various limitations that are summarized below.

      (1) The authors generated a new mouse model. The strategy should be better described, including the genetic background of the initially microinjected material. How many generations was the targeted offspring backcrossed to C57BL/6J?

      (2) The data is obtained from mice in which the Dreg1 gene is deleted in all cells. A cell-intrinsic role of Dreg1 in ILC2 has not been demonstrated. It should be shown that Dreg1 is required in ILC2 and their progenitors.

      (3) How Dreg1 contributes to the differentiation and/or maintenance of ILC2 is not addressed definitively. Does Dreg1 affect Gata3 expression, mRNA stability, or turnover in ILC2? Previous work of the authors indicated that knockdown of Dreg1 does not affect Gata3 expression (PMID: 32970351).

      (4) How Dreg1 exactly affects ILC2 differentiation remains unclear.

    1. eLife Assessment

      This study presents a platform to implement closed-loop experiments in mice based on auditory feedback. The authors provide convincing evidence that their platform enables a variety of closed-loop experiments using neural or movement signals, indicating that it will be a valuable resource to the neuroscience community. The paper could be strengthened by the addition of additional tutorials, such as on how to run an experiment.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a resource to the systems neuroscience community by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively. The revised preprint has improved substantially upon the previous submission.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab, and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex, and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper, they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      Many of the original weaknesses have been addressed in the revised preprint.

      While the dataset contains an impressive amount of animals and cortical regions for the neurofeedback experiment, my excitement for these experiments is tempered by the relative incompleteness of the dataset.

      Additionally, adoption of the platform may be hindered by the absence of a tutorial on how to run a session.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed are described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates that, thanks to readily available experimental building blocks, body movement tracking and calcium imaging can be carried out using off-the-shelf components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers who may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2020 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally, the paper shows weaknesses regarding several points:

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. Still, they definitely have value as a starting point for laboratories interested in implementing such approaches.

      Throughout the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      The analysis of the closed-loop neuronal data lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs. This is not really analyzed, although the finding does not match previous reports (Clancy et al. 2020) and would be important to examine further.

    4. Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, which the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. The study also shows that, using the closed-loop system, mice performed better, learned arbitrary tasks, and could adapt to changes in the rules. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems design. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary: 

      The authors provide a resource to the systems neuroscience community, by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      There are several weaknesses in the paper that diminish the impact of its strengths. First, the value of the CLoPy platform is not clearly articulated to the systems neuroscience community. Similarly, the resource could be better positioned within the context of the broader open-source neuroscience community. For an example of how to better frame this resource in these contexts, I recommend consulting the pyControl paper. Improving this framing will likely increase the accessibility and interest of this paper to a less technical neuroscience audience, for instance by highlighting the types of experimental questions CLoPy can enable.

      We appreciate the editor’s feedback regarding the clarity of the CLoPy platform's value and its positioning within the broader neuroscience community. We agree and understand the importance of effectively communicating the utility of CLoPy to both the systems neuroscience field and the wider open-source neuroscience community.

      To address this, we have revised the introduction and discussion sections of the manuscript to more clearly articulate the unique contributions of the CLoPy platform. Specifically:

      (1) We have emphasized how CLoPy can address experimental questions in systems neuroscience by highlighting its ability to enable real-time closed-loop experiments, such as investigating neural dynamics during behavior or studying adaptive cortical reorganization after injury. These examples are aimed at demonstrating its practical utility to the neuroscience audience.

      (2) We have positioned CLoPy within the broader open-source neuroscience ecosystem, drawing comparisons to similar resources like pyControl. We describe how CLoPy complements existing tools by focusing on real-time optical feedback and integration with genetically encoded indicators, which are becoming increasingly popular in systems neuroscience. We also emphasize its modularity and ease of adoption in experimental settings with limited resources.

      (3) To make the manuscript more accessible to a less technically inclined audience, we have restructured certain sections to focus on the types of experiments CLoPy enables, rather than the technical details of the implementation.

      We have consulted the pyControl paper, as suggested, and have used it as a reference point to improve the framing of our resource. We believe these changes will increase the accessibility and appeal of the paper to a broader neuroscience audience.

      While the dataset contains an impressive amount of animals and cortical regions for the neurofeedback experiment, and an analysis of the movement-feedback experiments, my excitement for these experiments is tempered by the relative incompleteness of the dataset, as well as its description and analysis in the text. For instance, in the neurofeedback experiment, many of these regions only have data from a single mouse, limiting the conclusions that can be drawn. Additionally, there is a lack of reporting of the quantitative results in the text of the document, which is needed to better understand the degree of the results. Finally, the writing of the results section could use some work, as it currently reads more like a methods section.

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate the time and effort you took to review our work and provide detailed suggestions for improvement. Below, we address the key points raised in your review:

      (1) Dataset Completeness: We acknowledge that the neurofeedback experiments include data from only a single mouse for some cortical regions, while for other cortical regions there are several animals. This was due to practical constraints during the study, and we understand the limitations this poses for drawing broad conclusions. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future. To address this, we have revised the text to explicitly acknowledge these limitations and clarify that the results for some regions are exploratory in nature. We believe our flexible tool will provide a means for our lab and others to include more animals representing additional cortical regions in future studies. Importantly, we have included all raw and processed data as well as code for future analysis.

      (2) Quantitative Results: We recognize the importance of reporting quantitative results in the text for better clarity and interpretation. In response, we have added a more detailed description of the quantitative findings from both the neurofeedback and movement-feedback experiments. These include effect sizes, statistical measures, and key numerical results to provide a clearer understanding of the degree and significance of the observed effects.

      (3) Results Section Writing: We appreciate your observation that parts of the results section read more like a methods section. To improve clarity and focus, we have restructured the results section to present the findings in a more concise and interpretative manner, while moving overly detailed descriptions of experimental procedures to the methods section.

      Suggestions for improved or additional experiments, data or analyses:

      Not necessary for this paper, but it would be interesting to see if the CLNF group could learn without auditory feedback.

      This is a great suggestion and certainly something that could be done in the future.

      There are no quantitative results in the results section. I would add important results to help the reader better interpret the data. For example, in: "Our results indicated that both training paradigms were able to lead mice to obtain a significantly larger number of rewards over time," You could show a number, with an appropriate comparison or statistical test, to demonstrate that learning was observed.

      Thank you for pointing this out. We now report quantification values in the Results, in addition to the figure legends, and quote the relevant sentences here. “A ΔF/F0 threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, N=23, n=60 and CLNF Rule-change, N=17, n=60) were able to discover the task rule and perform above 80% over ten days of training (Figure 4A, RM ANOVA p=2.83e-5), and Rule-change mice even learned a change in ROIs or rule reversal (Figure 4A, RM ANOVA p=8.3e-10, Table 5 for different rule changes). There were no significant differences between male and female mice (Supplementary Figure 3A).”
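
      One way to picture the baseline-derived threshold quoted above is as a quantile of per-trial peak ΔF/F0 values from the day-0 session. The sketch below is hypothetical and not taken from the CLoPy repository: it assumes one peak value per baseline trial and that a trial counts as a success when its peak exceeds the threshold.

```python
import numpy as np

def baseline_threshold(trial_peaks, target_success=0.25):
    """Threshold that `target_success` of baseline trials would exceed.

    `trial_peaks` is an assumed array of per-trial peak dF/F0 values
    from the baseline (day 0) session.
    """
    peaks = np.asarray(trial_peaks, dtype=float)
    return float(np.quantile(peaks, 1.0 - target_success))

def success_rate(trial_peaks, threshold):
    """Fraction of trials whose peak exceeds the threshold."""
    peaks = np.asarray(trial_peaks, dtype=float)
    return float(np.mean(peaks > threshold))

# Synthetic baseline: 100 trials with peaks 1..100 (arbitrary units).
baseline_peaks = np.arange(1, 101)
threshold = baseline_threshold(baseline_peaks)
baseline_performance = success_rate(baseline_peaks, threshold)
```

      Fixing the threshold this way anchors day-1 performance near the chosen baseline rate, so later gains can be read as learning rather than as a consequence of an easy criterion.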

      For: "Performing this analysis indicated that the Raspberry Pi system could provide reliable graded feedback within ~63 ± 15 ms for CLNF experiments." The LED test shows the sending of the signal, but the actual delay for the audio generation might be longer. This is also longer than the 50 ms mentioned in the abstract.

      We appreciate the reviewer’s insightful comment. The latency reported (~63ms) was measured using the LED test, which captures the time from signal detection to output triggering on the Raspberry Pi GPIO. We agree that the total delay for auditory feedback generation could include an additional latency component related to the digital-to-analog conversion and speaker response. In our setup, we employ a fast Audiostream library written in C to generate the audio signal and expect the delay contribution to be negligible compared to the GPIO latency. Though we did not do this, it can be confirmed by an oscilloscope-based pilot measurement (for additional delay calculation). We have updated the manuscript to clarify that the 63 ± 15 ms value reflects the GPIO-triggered output latency, and we have revised the abstract to accurately state the delay as “~63 ms” rather than 50 ms. This ensures consistency and avoids underestimation of the latency. We have corrected the LED latency for CLNF and CLMF experiments in the abstract as well.

      It could be helpful to visualize an individual trial for each experiment type, for instance how the audio frequency changes as movement speed / calcium activity changes.

      We have added Supplementary Figure 8 that contains this data where you can see the target cortical activity trace, target paw speed, rewards, along with the audio frequency generated.

      The sample sizes are small (n=1) for a few groups. I am excited by the variety of regions recorded, so it could be beneficial for the authors to collect a few more animals to beef up the sample sizes.

      We've acknowledged that some of the sample sizes are small. Importantly, we have included raw and processed data as well as code for future analysis. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future.

      I am curious as to why 60 trials sessions were used. Was it mostly for the convenience of a 30 min session, or were the animals getting satiated? If the former, would learning have occurred more rapidly with longer sessions?

      This is a great observation, and the answer is that it was mostly for logistical reasons. We tried not to keep animals head-fixed for more than 45 minutes in each session, as they become less engaged during long head-fixed sessions. After head-fixing, it takes about 15 minutes to get the experiment going, and therefore 30-40 minute recorded sessions seemed appropriate before the animals stopped being engaged or became satiated in the task. We provided supplemental water after the sessions and observed that the animals consumed it, so they were not fully satiated during the sessions even when they performed well in the task and received the maximum number of rewards. We also had inter-trial rest periods of 10 s that lengthened the sessions. We think it would be interesting to explore the relationship between session duration (number of trials) and the progression of task learning over days in a separate study.

      Figure 4E is interesting, it seems like the changes in the distribution of deltaF was in both positive and negative directions, instead of just positive. I'd be curious as to the author's thoughts as to why this is the case. Relatedly, I don't see Figure 4E, and a few other subplots, mentioned in the text. As a general comment, I would address each subplot in the text.

      We have split Figure 4 into two to keep the figures more readable. Previous Figure 4E-H are now Figure 5A-D in the revised manuscript. The online real-time CLNF sessions used a moving window average to calculate ΔF/F0, whereas the figures were generated by averaging over the whole recorded sessions. We have added text in Methods under the “Online ΔF/F0 calculation” and “Offline ΔF/F0 calculation” sections clarifying how we perform our ΔF/F0 normalization based on average fluorescence over the entire session. Using this method of normalization does increase the baseline so that some peaks appear to be below zero. Additionally, it is unclear what strategy animals are employing to achieve the rule-specific target activity. The task did not constrain them to a specific strategy for cortical activation: they were rewarded as long as they crossed the threshold in the target ROI(s). For example, in 2-ROI experiments, to increase ROI1-ROI2 target activity, they could increase the activity of ROI1 relative to ROI2 or decrease the activity of ROI2 relative to ROI1; both would have led to a reward as long as the result crossed the threshold.
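
      The difference between the two normalizations, and the two routes to a reward under an ROI1-ROI2 rule, can be sketched as follows. This is a minimal sketch under assumptions, not the CLoPy implementation: the causal window that includes the current sample, the window length, and all function names are hypothetical.

```python
import numpy as np

def dff_online(f, window):
    """Causal dF/F0: F0 is the mean of the last `window` samples.

    Including the current sample in the window is an assumption.
    """
    f = np.asarray(f, dtype=float)
    out = np.empty_like(f)
    for i in range(len(f)):
        f0 = f[max(0, i - window + 1): i + 1].mean()
        out[i] = (f[i] - f0) / f0
    return out

def dff_offline(f):
    """dF/F0 with F0 taken as the mean fluorescence of the whole session."""
    f = np.asarray(f, dtype=float)
    f0 = f.mean()
    return (f - f0) / f0

def rule_signal(dff_roi1, dff_roi2):
    """Target signal for a hypothetical ROI1-ROI2 rule."""
    return np.asarray(dff_roi1, dtype=float) - np.asarray(dff_roi2, dtype=float)

# A flat trace with one transient (arbitrary fluorescence units).
trace = [100, 100, 100, 120, 100, 100]
offline = dff_offline(trace)   # session mean exceeds the resting level
online = dff_online(trace, window=3)
```

      With the session mean as F0, the transient pulls the baseline upward, so resting samples come out slightly negative, which matches the explanation given above. Likewise, `rule_signal([0.2], [0.1])` and `rule_signal([0.0], [-0.1])` give the same value, so raising ROI1 and lowering ROI2 are indistinguishable to a threshold on the difference.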

      We have now addressed and added reference to the figures in the text in Results under “Mice can explore and learn an arbitrary task, rule, and target conditions” and “Mice can rapidly adapt to changes in the task rule” sections - thanks for pointing this out.

      For: "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time," I would provide a visual summary showing the learning curves for the different types of regions.

      We have rewritten this section to emphasize that these conclusions were based on pooled data from multiple regions of interest. The sample sizes for each type of region are different and some are missing. We believe it would be incomplete and not comparable to present this as a regular analysis since the sample sizes were not balanced. We would be happy to dive deeper into this and point to the raw and processed dataset if anyone would like to explore this further by GitHub or other queries.

      Relatedly, I would further explain the fast vs slow learners, and if they mapped onto certain regions.

      Mice were categorized into fast or slow learners based on the slope of learning over days (reward progression over the days) as shown in Supplementary Figure 3C,D. Our initial aim was not to probe cortical regions that led to fast vs slow learning but this was a grouping we did afterwards. Based on the analysis we did, the fast learners included the sensory (V1), somatosensory (BC, HL), and motor (M1, M2) areas, while the slow learners included the motor (M1, M2), and higher order (TR, RL) cortical areas. Testing all dorsal cortical areas would be prudent to establish their role in fast or slow learning and it is an interesting future direction.
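
      The slope-based grouping described above could look roughly like the sketch below. It is a hypothetical illustration: the response does not state the cutoff used to separate the groups, so the median split, the mouse labels, and the reward counts here are all assumptions.

```python
import numpy as np

def learning_slope(daily_rewards):
    """Least-squares slope of rewards per session against training day."""
    days = np.arange(1, len(daily_rewards) + 1)
    slope, _intercept = np.polyfit(days, np.asarray(daily_rewards, dtype=float), 1)
    return float(slope)

def split_by_slope(slopes_by_mouse, cutoff=None):
    """Label each mouse 'fast' or 'slow' by its learning slope.

    The default cutoff (median slope across mice) is an assumption.
    """
    if cutoff is None:
        cutoff = float(np.median(list(slopes_by_mouse.values())))
    return {mouse: ("fast" if slope > cutoff else "slow")
            for mouse, slope in slopes_by_mouse.items()}

# Synthetic reward counts over four training days (illustrative only).
rewards = {"m1": [10, 20, 30, 40], "m2": [10, 12, 14, 16], "m3": [10, 15, 20, 25]}
slopes = {mouse: learning_slope(r) for mouse, r in rewards.items()}
groups = split_by_slope(slopes)
```

      A post hoc split of this kind describes the data but does not by itself show that a cortical region causes fast or slow learning, which is consistent with the caution expressed in the response.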

      Also I would make the labels for these plots (e.g. Supp Fig3) more intuitive, versus the acronyms currently used.

      We have made more expressive labels and explained the acronyms below the Supplementary Figure 3.

      The CLMF animals showed a decrease in latency across learning, what about the CLNF animals? There is currently no mention in the text or figures.

      We have now incorporated the CLNF task latency data into both the Results text and Figure 4C. Briefly, task latency decreased as performance improved, increased following a rule change, and then decreased again as the animals relearned the task. The previous Figure 4C has been updated to Figure 4D, and the former Figure 4D has been moved to Supplementary Figure 4E.

      Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed are described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates that, thanks to readily available experimental building blocks, body movement tracking and calcium imaging can be carried out using off-the-shelf components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers who may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2020 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally the paper tends to offer overstatements regarding several points:

      In contrast to the introductory statements of the article, closed-loop physiology in rodents is a well-established research topic. Beyond auditory feedback, this includes optogenetic feedback (O'Connor et al. 2013, Abbasi et al. 2018, 2023), electrical feedback in hippocampus (Girardeau et al. 2009), and much more.

      We have included and referenced these papers in our introduction section (quoted below) and rephrased the part where our previous text indicated there are fewer studies involving closed-loop physiology.

      “Some related studies have demonstrated the feasibility of closed-loop feedback in rodents, including hippocampal electrical feedback to disrupt memory consolidation (Girardeau et al. 2009), optogenetic perturbations of somatosensory circuits during behavior (O'Connor et al. 2013), and more recent advances employing targeted optogenetic interventions to guide behavior (Abbasi et al. 2023).”

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. In particular, the closed-loop latency that they achieve (>60 ms) may be perceived by the mice. This is in contrast with other available closed-loop setups.

      We thank the reviewer for this thoughtful comment and fully agree that our closed-loop latency is larger than that achieved in some other contemporary setups. Our primary aim in presenting this work, however, is not to compete with the lowest possible latencies, but to provide an open-source, accessible, and flexible platform that can be readily adopted by a broad range of laboratories. By building on widely available and lower-cost components, our design lowers the barrier of entry for groups that wish to implement closed-loop imaging and behavioral experiments, while still achieving latencies well within the range that can support many biologically meaningful applications.

      For example, our latency (~60 ms) remains compatible with experimental paradigms such as:

      Motor learning and skill acquisition, where sensorimotor feedback on the scale of tens to hundreds of milliseconds is sufficient to modulate performance.

      Operant conditioning and reward-based learning, in which reinforcement timing windows are typically broader and not critically dependent on sub-20 ms latencies.

      Cortical state-dependent modulation, where feedback linked to slower fluctuations in brain activity (hundreds of milliseconds to seconds) can provide valuable insight.

      Studies of perception and decision-making, in which stimulus-response associations often unfold on behavioral timescales longer than tens of milliseconds.

      We believe that emphasizing openness, affordability, and flexibility will encourage widespread adoption and adaptation of our setup across laboratories with different research foci. In this way, our contribution complements rather than competes with ultra-low-latency closed-loop systems, providing a practical option for diverse experimental needs.

      Through the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      We fully agree that such a control would provide valuable insight into the contribution of feedback to learning in the CLNF paradigm. In designing our initial experiments, we envisioned multiple potential control conditions, including No-feedback and Random-feedback. However, our first and primary objective was to establish whether mice could indeed learn to modulate cortical ROI activation through auditory feedback, and to further investigate this across multiple cortical regions. For this reason, we focused on implementing the CLNF paradigm directly, without the inclusion of these additional control groups. To broaden the applicability of the system, we subsequently adapted the platform to the CLMF experiments, where we did incorporate a No-feedback group. These results, as the reviewer notes, strengthen the evidence for the role of feedback in shaping task performance. We agree that the inclusion of a No-feedback control group in the CLNF paradigm will be crucial in future studies to further dissect the specific contribution of feedback to cortical conditioning.

      The analysis of the closed-loop neuronal data lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not clearly analyzed (for instance, by looking at the timing of the calcium signal modulation across the two ROIs). It seems that overall ROIs 1 and 2 covary, in contrast to Clancy et al. 2020. How can this be explained?

      We agree that the possibility of increased performance being driven by modulation of a single ROI is an important consideration. Our study indeed began with 1-ROI closed-loop experiments. In those early experiments, while we did observe animals improving performance across days, we realized that daily variability in ongoing cortical GCaMP activity could lead to fluctuations in threshold-crossing events. The 2-ROI design was subsequently introduced to reduce this variability, as the target activity was defined as the relative activity between the two ROIs (e.g., ROI1 – ROI2). This approach offered a more stable signal by normalizing ongoing fluctuations. In our analysis of the early 2-ROI experiments, we observed that animals adopted diverging strategies to achieve threshold crossings. Specifically, some animals increased activity in ROI1 relative to ROI2, while others decreased activity in ROI2 to accomplish the same effect. Once an animal had discovered a strategy, it consistently adhered to that strategy throughout subsequent training sessions. This was an early and intriguing observation, but as the experiments were not originally designed to systematically test this effect, we limited our presentation to the analysis of a small number of animals (shown in Figure 11). We have added details about this observation in our Results section as well, quoted below:

      “In the 2-ROI experiment where the task rule required “ROI1 - ROI2” activity to cross a threshold for reward delivery, mice displayed divergent strategies. Some animals predominantly increased ROI1 activity, whereas others reduced ROI2 activity, both approaches leading to successful threshold crossing (Figure 11)”.

      We hope this clarifies how the use of two ROIs helps explain the apparent covariation of the signals, and why some divergence from the observations of Clancy et al. (2020) may be expected.
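
      The two-ROI rule described above can be illustrated with a minimal sketch. It is not the CLoPy implementation: the function names, the example values, and the use of ">=" for the comparison are assumptions made here for illustration. It shows that raising ROI1 or lowering ROI2 both satisfy the rule, while a fluctuation shared by both ROIs cancels out.

```python
def target_signal(dff_roi1, dff_roi2):
    """Relative activity used as the feedback target: ROI1 minus ROI2."""
    return dff_roi1 - dff_roi2


def crossed(dff_roi1, dff_roi2, threshold):
    """True when the ROI1 - ROI2 difference reaches the threshold.

    Using '>=' rather than '>' is an assumption of this sketch.
    """
    return target_signal(dff_roi1, dff_roi2) >= threshold


if __name__ == "__main__":
    threshold = 0.05  # hypothetical value, in dF/F0 units
    # Strategy 1: ROI1 activity goes up while ROI2 stays near baseline.
    print(crossed(0.08, 0.01, threshold))
    # Strategy 2: ROI1 stays near baseline while ROI2 activity goes down.
    print(crossed(0.01, -0.06, threshold))
    # A fluctuation shared by both ROIs cancels in the difference.
    print(crossed(0.08, 0.08, threshold))
```

      Because only the difference is compared with the threshold, the rule by itself cannot tell the two strategies apart; the individual ROI traces have to be inspected for that.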

      Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, which the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols needed to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. The study also shows that, using the closed-loop system, mice performed better, learned an arbitrary task, and could adapt to a change in the rule. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems designed. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

      We appreciate the reviewer’s comment and agree that latency is an important factor in our setup. The latency arises partly from the inherent slow kinetics of calcium signaling and GCaMP6s, and partly from the imaging rate of 15 FPS (one frame roughly every 67 ms). These limitations can be addressed in several ways: for example, using faster calcium indicators such as GCaMP8f, or adapting the system to electrophysiological signals, which would require additional processing capacity. In our implementation, image acquisition was fixed at 15 FPS to enable real-time frame processing (256 × 256 resolution) on Raspberry Pi 4B devices. With newer hardware, such as the Raspberry Pi 5, substantially higher acquisition and processing rates are feasible (although we have not yet benchmarked this extensively). More powerful platforms such as Nvidia Jetson or conventional PCs would further support much faster data acquisition and processing.

      Major comments:

      (1) Page 5 paragraph 1: "We tested our CLNF system on Raspberry Pi for its compactness, general-purpose input/output (GPIO) programmability, and wide community support, while the CLMF system was tested on an Nvidia Jetson GPU device." Can these programs and hardware be integrated with a Windows-based system and a microcontroller (Arduino/Teensy)? For broad adaptability, that is what a lot of labs would already have (please comment/discuss).

      While we tested our CLNF system on a Raspberry Pi (chosen for its compactness, GPIO programmability, and large user community) and our CLMF system on an Nvidia Jetson GPU device (to leverage real-time GPU-based inference), the underlying software is fully written in Python. This design choice makes the system broadly adaptable: it can be run on any device capable of executing Python scripts, including Windows-based PCs, Linux machines, and macOS systems. For hardware integration, we have confirmed that the framework works seamlessly with microcontrollers such as Arduino or Teensy, requiring only minor modifications to the main script to enable sending and receiving of GPIO signals through those boards. In fact, we are already using the same system in an in-house project on a Linux-based PC where an Arduino is connected to the computer to provide GPIO functionality. Furthermore, the system is not limited to Raspberry Pi or Arduino boards; it can be interfaced with any GPIO-capable devices, including those from Adafruit and other microcontroller platforms, depending on what is readily available in individual labs. Since many neuroscience and engineering laboratories already possess such hardware, we believe this design ensures broad accessibility and ease of integration across diverse experimental setups.
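
      One way to picture the hardware independence described above is a small output interface with interchangeable backends. The sketch below is purely illustrative and does not reproduce the CLoPy code: `DigitalOut`, `RecordingOut`, `deliver_reward`, the pin number, and the pulse duration are all hypothetical names and values. A Raspberry Pi backend would wrap the board's GPIO library, and an Arduino or Teensy backend would forward the same calls over a serial link; neither is shown here.

```python
import time
from typing import List, Protocol, Tuple


class DigitalOut(Protocol):
    """Anything that can drive a numbered digital line high or low."""

    def write(self, pin: int, high: bool) -> None: ...


class RecordingOut:
    """Hypothetical stand-in backend: records writes, touches no hardware."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, bool]] = []

    def write(self, pin: int, high: bool) -> None:
        self.events.append((pin, high))


def deliver_reward(out: DigitalOut, valve_pin: int, open_s: float = 0.05) -> None:
    """Pulse a (hypothetical) water-valve line high for open_s seconds."""
    out.write(valve_pin, True)
    time.sleep(open_s)
    out.write(valve_pin, False)


if __name__ == "__main__":
    out = RecordingOut()
    deliver_reward(out, valve_pin=17, open_s=0.0)
    print(out.events)
```

      With this kind of separation, only the backend class would need to change when moving between boards, which is in the spirit of the "minor modifications to the main script" mentioned above.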

      (2) Hardware Constraints: The reliance on Raspberry Pi and Nvidia Jetson (which is expensive) for real-time processing could introduce latency issues (~63 ms for CLNF and ~67 ms for CLMF). This latency might limit precision for faster or more complex behaviors, which the authors should address in the discussion section.

      In our system, we measured latencies of approximately 63 ms for CLNF and 67 ms for CLMF. While such latencies indeed limit applications requiring millisecond precision, such as fast whisker movements, saccades, or fine reaching kinematics, we emphasize that many relevant behaviors, including postural adjustments, limb movements, locomotion, and sustained cortical state changes, occur on timescales that are well within the capture range of our system. Thus, our platform is appropriate for a range of mesoscale behavioral studies, a point that probably needs to be discussed more. It is also important to note that these latencies are not solely dictated by hardware constraints. A significant component arises from the inherent biological dynamics of the calcium indicator (GCaMP6s) and calcium signaling itself, which introduce slower temporal kinetics independent of processing delays. Newer variants, such as GCaMP8f, offer faster response times and could further reduce effective biological latency in future implementations.

      With respect to hardware, we acknowledge that Raspberry Pi provides a low-cost solution but contributes to modest computational delays, while Nvidia Jetson offers faster inference at higher cost. Our choice reflects a balance between accessibility, cost-effectiveness, and performance, making the system deployable in many laboratories. Importantly, the modular and open-source design means the pipeline can readily be adapted to higher-performance GPUs or integrated with electrophysiological recordings, which provide higher temporal resolution. Finally, we agree with the reviewer that the issue of latency highlights deeper and interesting questions regarding the temporal requirements of behavior classification. Specifically, how much data (in time) is required to reliably identify a behavior, and what is the minimum feedback delay necessary to alter neural or behavioral trajectories? These are critical questions for the design of future closed-loop systems and ones that our work helps frame.

      We have added a slightly modified version of our response above in the discussion section under “Experimental applications and implications”.

      (3) Neurofeedback Specificity: The task focuses on mesoscale imaging and ignores finer spatiotemporal details. Sub-second events might be significant in more nuanced behaviors. Can this be discussed in the discussion section?

      This is a great point and we have added the following to the discussion section. “In the case of CLNF we have focused on regional cortical GCaMP signals that are relatively slow in kinetics. While such changes are well suited for transcranial mesoscale imaging assessment, it is possible that cellular 2-photon imaging (Yu et al. 2021) or preparations that employ cleared crystal skulls (Kim et al. 2016) could resolve more localized and higher frequency kinetic signatures.”

      (4) The activity over 6s is being averaged to determine if the threshold is being crossed before the reward is delivered. This is a rather long duration of time during which the mice may be exhibiting stereotyped behaviors that may result in the changes in DFF that are being observed. It would be interesting for the authors to compare (if data is available) the behavior of the mice in trials where they successfully crossed the threshold for reward delivery and in those trials where the threshold was not breached. How is this different from spontaneous behavior and behaviors exhibited when they are performing the test with CLNF? 

      We would like to emphasize that we are not directly averaging activity over 6 s to compare against the reward threshold. Instead, the preceding 6 s of activity is used solely to compute a dynamic baseline for ΔF/F0 (ΔF/F0 = (F - F0)/F0). Here, F0 is calculated as the mean fluorescence intensity over the prior 6 s window and is updated continuously throughout the session. This baseline is then subtracted from the instantaneous fluorescence signal to detect relative changes in activity. The reward threshold is therefore evaluated against these baseline-corrected ΔF/F0 values at the current time point, not against an average over 6 s. This moving-window baseline correction is a standard approach in calcium imaging analyses, as it helps control for slow drifts in signal intensity, bleaching effects, or ongoing fluctuations unrelated to the behavior of interest. Thus, the 6-s window is not introducing a temporal lag in reward assignment but is instead providing a reference to detect rapid increases in cortical activity. We have added the term dynamic baseline to the Methods to clarify.
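
      The moving-baseline calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the class name `DynamicBaseline` is hypothetical, the current frame is assumed to be excluded from its own baseline, and the value returned before any history exists (0.0) is an arbitrary choice. The 15 FPS rate and the 6 s window are taken from the text and give a 90-frame history.

```python
from collections import deque


class DynamicBaseline:
    """Running dF/F0 where F0 is the mean of the preceding window."""

    def __init__(self, fps=15, window_s=6.0):
        # 15 frames/s * 6 s = 90 frames of history; older frames drop out.
        self.history = deque(maxlen=int(round(fps * window_s)))

    def update(self, f):
        """Return dF/F0 for the new sample f, then add f to the history.

        Assumes fluorescence values are positive, so F0 is never zero.
        """
        if not self.history:
            dff = 0.0  # no baseline yet (arbitrary choice for this sketch)
        else:
            f0 = sum(self.history) / len(self.history)
            dff = (f - f0) / f0
        self.history.append(f)
        return dff


if __name__ == "__main__":
    baseline = DynamicBaseline(fps=15, window_s=6.0)
    for _ in range(90):           # 6 s of steady fluorescence
        baseline.update(100.0)
    print(baseline.update(110.0)) # a 10% rise relative to the prior 6 s
```

      Each new frame is compared with the mean of the frames before it, so the threshold test applies to the current frame only and the window adds no delay of its own.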

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      Additional suggestions for improved or additional experiments, data or analyses.

      For: "Looking closely at their reward rate on day 5 (day of rule change), they had a higher reward rate in the second half of the session as compared to the first half, indicating they were adapting to the rule change within one session." It would be helpful to see this data, and it would be good to see within-session learning on the rule-change day.

      Thank you for pointing this out. We had missed referencing the figure in the text, and have now added a citation to Supplementary Figure 4A, which shows the cumulative rewards for each day of training. As seen in the plot for day 5, the cumulative rewards are comparable to those on day 1, with most rewards occurring during the second half of the session.
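
      The first-half versus second-half comparison mentioned above amounts to a simple count over reward timestamps. The sketch below is illustrative only: the function name and the example timestamps are hypothetical, and it assumes that every reward time lies within the session.

```python
def half_session_counts(reward_times_s, session_length_s):
    """Count rewards in the first and second half of a session.

    Assumes all reward times fall between 0 and session_length_s.
    """
    half = session_length_s / 2
    first = sum(1 for t in reward_times_s if t < half)
    second = len(reward_times_s) - first
    return first, second


if __name__ == "__main__":
    # Made-up reward times (seconds) for a hypothetical 600 s session.
    times = [10, 200, 400, 500, 550]
    print(half_session_counts(times, 600))
```

      A larger second-half count is what a cumulative-reward curve that steepens late in the session would look like in tabular form.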

      For: "These results suggest that motor learning led to less cortical activation across multiple regions, which may reflect more efficient processing of movement-related activity," it could also be the case that the behaviour became more stereotyped over learning, which would lead to more concentrated, correlated activity. To test this, it would be good to look at the limb variability across sessions. Similarly, if it is movement-related, there should be good decoding of limb kinematics.

      Indeed, we observed that behavior became more stereotyped over the course of learning, as shown in Supplementary Figure 4C and 4D. One plausible explanation for the reduction in cortical activation across multiple regions is that behavior itself became more stereotyped, a possibility we have explored in the manuscript. Specifically, forelimb movements during the trial became increasingly correlated as mice improved on the task, particularly in the groups that received auditory feedback (Rule-change and No-rule-change groups; Figure 8). As movements became more correlated, overall body movements during trials decreased and aligned more closely with the task rule (Figure 9D). This suggests that reduced cortical activity may in part reflect changes in behavior. Importantly, however, in the Rule-change group, we observed that on the day of the rule switch (day 5), when the target shifted from the left to the right forelimb, cortical activity increased bilaterally (Figure 9A–C). This finding highlights our central point: groups that received feedback (Rule-change and No-rule-change) were able to identify the task rule more effectively, and both their behavior and cortical activity became more specifically aligned with the rule compared to the No-feedback group. We agree with the reviewer that additional analyses along these lines would be valuable future directions. To facilitate this, we have included the movement data for readers who may wish to pursue further analyses; details can be found under “Data and code availability” in the Methods section. However, given the limited sample sizes in our dataset and the need to keep the manuscript focused on the central message, we felt that including these additional analyses here would risk obscuring the main findings.

      For: "We believe the decrease in ΔF/F0peak is unlikely to be driven by changes in movement, as movement amplitudes did not decrease significantly during these periods (Figure 7D CLMF Rule-change)." I would formally compare the two conditions. This is an important control. Also, another way to see if the change in deltaF is related to movement would be to see if you can predict movement from the deltaF.

      Figure 7D in the previous version is Figure 9D in the current revision of the manuscript. We have assessed this for the examples shown by graphing the movement data. Unfortunately, there is not enough of that data to do a group analysis of movement magnitude. We suggest that this would be an excellent future direction that would take advantage of the flexible, open-source nature of our tool.

      Recommendations for improving the writing and presentation.

      In the abstract there is no mention of the rationale for the project, or the resulting significance. I would modify this to increase readership by the behavioral neuroscience community. Similarly, the introduction also doesn't highlight the value of this resource for the field. Again, I think the pyControl paper does a good job of this. For readability, I would add more subheadings earlier in the results, to separate the different technical aspects of the system.

      We have revised the introduction to include the rationale for the project, its potential implications, and its relevance for translational research. We have also framed the work within the broader context of the behavioral and systems neuroscience community. We greatly appreciate this suggestion, as we believe it enhances the clarity and accessibility of the manuscript for the community.

      For: "While brain activity can be controlled through feedback, other variables such as movements have been less studied, in part because their analysis in real time is more challenging." I would highlight research that has studied the control of behavior through feedback, such as the Mathis paper where mice learn to pull a joystick to a virtual box, and adapt this motion to a force perturbation.

      We have added a citation to the Mathis paper and describe this as an additional form of feedback. The text is quoted below:

      “Opportunities also exist in extending real time pose classification (Forys et al. 2020; Kane et al. 2020) and movement perturbation (Mathis et al. 2017) to shape aspects of an animal’s motor repertoire.”

      Some of the results content would be better suited for the methods, one example: "A previous version of the CLNF system was found to have non-linear audio generation above 10 kHz, partly due to problems in the audio generation library and partly due to the consumer-grade speaker hardware we were employing. This was fixed by switching to the Audiostream (https://github.com/kivy/audiostream) library for audio generation and testing the speakers to make sure they could output the commanded frequencies"

      This is now moved to the Methods section.

      For: "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19), supporting the idea that neural efficiency could improve with learning," not sure I agree with this, the studies on cortical plasticity are usually to show a neural basis for the learning observed, efficiency is separate from this.

      We have modified this statement to remove the concept of efficiency: “There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19).”

      The paragraph that opens "Distinct task- and reward-related cortical dynamics" that describes the experiment should appear in the previous section, as the data is introduced there.

      We have moved the mentioned paragraphs to the previous section, where we present the data and other experimental details. This makes the text more readable and contextual.

      I would present the different ROI rules with better descriptors and visualization to improve the readability.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments.

      Minor corrections to the text and figures.

      Figure 1 is a little crowded, combining the CLNF and CLMF experiments, I would turn this into a 2 panel figure, one for each, similar to how you did figure 2.

      We have revised Figure 1 to include two panels, one for CLNF and one for CLMF. The colored components indicate elements specific to each setup, while the uncolored components represent elements shared between CLNF and CLMF. Relevant text in the manuscript is updated to refer to these figures.

      For Figure 2, the organization of the CLMF section is not intuitive for the reader. I would reorder it so it has a similar flow as the CLNF experiment.

      We have revised the figure by updating the layout of panel B (CLMF) to align with panel A (CLNF), thereby creating a more intuitive and consistent flow between the panels. We appreciate this helpful suggestion, which we believe has substantially improved the clarity of the figure. The corresponding text in the manuscript has also been updated to reflect these changes.

      For Figure 3, highlight that C and E are examples. They also seem a little out of place, so they could even be removed.

      We have now explicitly labeled Figures 3C and 3E as representative examples (figure legend and on figure itself). We believe including these panels provides helpful context for readers: Figure 3C illustrates how the ROIs align on the dorsal cortical brain map with segmented cortical regions, while Figure 3E shows example paw trajectories in three dimensions, allowing visualization of the movement patterns observed during the trials.

      In the plots, I would add sample sizes, for instance, in CLNF learning curve in Figure 4A, how many animals are in each group? 

      We have labeled Figure 4 with number of animals used in CLNF (No-rule-change, N=23; Rule-change, N=17), and CLMF (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4).

      Also, Figure 7 for example, which figures are single-sessions, versus across animals? For Figure 7c, what time bin is the data taken from?

      We have clarified this now and mentioned it in all the figures. Figure 7 in the previous version is Figure 9 in the current updated manuscript. Figure 9A is from individual sessions on different days from the same mouse. Figure 9B is the group average reward-centered ΔF/F0 activity in different cortical regions (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4). Figure 9C shows average ΔF/F0 peak values obtained within -1 s to +1 s centered around the reward point (N=8).

      It says "punish" in Figure 3, but there is no punishment?

      Yes, the task did not involve punishment. Each trial resulted in either a success, which is followed by a reward, or a failure, which is followed by a buzzer sound. To better reflect these outcomes, we have updated Figure 3 and replaced the labels “Reward” with “Success” and “Punish” with “Failure.”

      The regression on 5c doesn't look quite right, also this panel is not mentioned in the text.

      The figure referred to by the reviewer as Figure 5 is now presented as Figure 6 in the revised manuscript. Regarding the reviewer’s observation about the regression line in the left panel of Figure 5C, the apparent misalignment arises because the majority of the data points are densely clustered at the center of the scatter plot, where they overlap substantially. The regression line accurately reflects this concentration of overlapping data. To improve clarity, we have updated the figure and ensured that it is now appropriately referenced in the Results section.

      Reviewer #2 (Recommendations for the authors):

      (1) There would be many interesting observations and links between the peripheral and cortical studies if there was a body video available during the cortical study. Is there any such data available?

      We agree that a detailed analysis of behavior during the CLNF task would be necessary to explore any behavioral correlates of success in the task. Unfortunately, we do not have sufficient video of the whole body to perform such an analysis.

      (2) The text (p. 24) states: [intracortical GCAMP transients measured over days became more stereotyped in kinetics and were more correlated (to each other) as the task performance increased over the sessions (Figure 7E).] But I cannot find this quantification in the figures or text?

      Figure 7 in the previous version of the manuscript now appears as Figure 9. In this figure, we present cortical activity across selected regions during trials, and in Figure 9E we highlight that this activity becomes more correlated. Since we did not formally quantify variability, we have removed the previous claim that the activity became stereotyped and revised the text in the updated manuscript accordingly.

      Typos:

      10-serest c (page 13)

      Inverted color codes in figure 4E vs F

      Reviewer #3 (Recommendations for the authors):

      We have mostly attempted to limit the feedback to suggestions and posed a few questions that might be interesting to explore given the dataset the authors have collected.

      Comments:

      In closed-loop systems, latency is a primary concern, and the authors have successfully tested the latency of the system (Delay): the time from detection of an event to the reaction was less than 67 ms.

      We have commented on the issues and limitations caused by latency, and potential future directions to overcome these challenges in responses to some of the previous comments.

      Additional major comments:

      "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time (Figure 4A, Animation 1)." Fig 4A is merely showing change in task performance over time and does not have information regarding the changes observed specific to CLNF for each ROI.

      We acknowledge that the sample size for individual ROI rules was not sufficient for meaningful comparisons. To address this limitation, we pooled the data across all the rules tested. The manuscript includes a detailed list of the rules along with their corresponding sample sizes for transparency.

      A ΔF/F0 threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, n=28 and CLNF Rule-change, n=13). It is unclear what the replicates here are. Trials or mice? The corresponding Figure legend has a much smaller n value.

      Thank you for pointing this out. We realized that we had not indicated the sample replicates in the figure, and the use of n instead of N for the number of animals may have been misleading. We have now corrected the notation and clarified this information in the figure to resolve the discrepancy.

      What were the replicates for each ROI pairs evaluated?

      Each ROI rule and number of mice and trials are listed in Table 5 and Table 6.

      Our analysis revealed that certain ROI rules (see description in methods) lead to a greater increase in success rate over time than others (Supplementary Figure 3D). The Supplementary figures 3C and 3D are blurry and could use higher resolution images. 

      We have increased the font size of the text that was previously difficult to read and re-exported the figure at a higher resolution (300 DPI). We believe these changes will resolve the issue.

      Also, it will help the reader if a visual representation of the ROI pairs is provided, instead of the text view. One interesting question is whether there are anatomical biases to fast vs slow learning pairs (directionality - anterior/posterior, distance between the selected ROIs, etc.). This could be interesting to tease apart.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments. While a detailed investigation of the anatomical basis of fast versus slow learning cortical ROIs is beyond the scope of the present study, we agree that this represents an exciting future direction for further research.

      How distant should the ROIs be to achieve increased task performance?

      We appreciate this insightful question. We did not specifically test this scenario. In our study, we selected 0.3 × 0.3 mm ROIs centered on the standard AIBS mouse brain atlas (CCF). At this resolution, ROIs do not overlap, regardless of their placement in a two-ROI experiment. Furthermore, because our threshold calculations are based on baseline recordings, we expect the system would function for any combination of ROI placements. Nonetheless, exploring this systematically would be an interesting avenue for future experiments.
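
      As a rough illustration of a baseline-derived threshold, the sketch below picks the value that a chosen fraction of baseline measurements would have reached. The exact procedure used in the study is not described in this response, so everything here is an assumption: the function names, the idea of one summary value per baseline epoch, and the 25% target (taken from the reviewer's quotation of the manuscript).

```python
import math


def threshold_for_success_rate(baseline_values, target_rate=0.25):
    """Return the value reached by about target_rate of baseline epochs.

    baseline_values holds one summary value per baseline epoch (for
    example a peak dF/F0 difference); this framing is hypothetical.
    """
    ordered = sorted(baseline_values, reverse=True)
    k = max(1, math.floor(target_rate * len(ordered)))
    return ordered[k - 1]


def success_rate(values, threshold):
    """Fraction of epochs whose value reaches the threshold."""
    return sum(v >= threshold for v in values) / len(values)


if __name__ == "__main__":
    # Made-up baseline values: 0.01, 0.02, ..., 1.00
    baseline = [i / 100 for i in range(1, 101)]
    thr = threshold_for_success_rate(baseline, target_rate=0.25)
    print(thr, success_rate(baseline, thr))
```

      Because the threshold is derived from each animal's own baseline distribution rather than from the ROI coordinates, the same procedure applies to any ROI pair; tied values in the baseline could push the realized rate slightly above the target.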

      Figures:

      I would leave some of the methodological details, such as the protocol for water restriction (Fig. 3), out of the legend. This will help with readability.

      We have removed some of the methodological details, including those mentioned above, from the legend of Figure 3 in the updated manuscript.

      Fig 1 and Fig 2: In my opinion, it would be easier for the reader if the current Fig. 2, which provides a high-level description of CLNF and CLBF, were presented as Fig. 1. The current Fig. 1 goes into a lot of methodological implementation detail and introduces a lot of programming jargon that is hard to digest this early in the paper's narrative.

      Thank you for the suggestion. In the new manuscript, Figure 1 and Figure 2 have been swapped.

      Higher-resolution images/ plots are needed in many instances. Unsure if this is the pdf compression done by the manuscript portal that is causing this.

      All figures were prepared in vector graphics format using the open-source software Inkscape. For this manuscript, we exported the images at 300 DPI, which is generally sufficient for publication-quality documents. The submission portal may apply additional processing, which could have resulted in a reduction in image quality. We will carefully review the final submission files and ensure that all figures are clear and of high quality.

      The authors repeatedly show ROI specific analysis M1_L, F1_R etc. It will be helpful to provide a key, even if redundant in all figures to help the reader.

      We have now included keys to all such abbreviations in all the figures.

      There are also instances of editorialization and interpretation e.g., "Surprisingly, the "Rule-change" mice were able to discover the change in rule and started performing above 70% within a day of the rule change, on day 6" that would be more appropriate in the main body of the paper.

      Thank you for pointing this out. We have removed this sentence from the figure legend, since the observation is already discussed in the Results.

      Minor comments

      (1) The description of Figure 1 is hard to follow and could be improved by following how information is processed and executed in the system, from source to processing and back. Using separate colors (instead of shades of grey) for the neurofeedback and movement feedback would help as well. Common components could have a different color. Specifications such as the description of the config file should come later.

      Figure 1 in the previous version is Figure 2 in the updated version. We have taken suggestions from other reviewers and made the figure easier to understand by splitting it into two panels with color coding: green for CLNF-specific parts and pink for CLMF-specific parts, while shared parts are left uncolored.

      (2) Page 20 last paragraph:

      The authors are neglecting that the rule change was made one day prior. The results seen in the second half of the 6th day are not due to the first half of the 6th day alone, but to the combined training on the 5th day (rule change) and the first half of the 6th day. Rephrasing this observation is essential.

      We have revised the text for clarity to indicate that the performance increase observed on day 6 is not necessarily attributable to training on that day. In fact, we noted and mentioned that mice began to perform the task better during the second half of the session on day 5 itself.

      (3) The Methods section description of the CLMF setup (page 39, first paragraph) is quite detailed; a diagram of this setup would make it easier to follow and a better read.

      We have made changes to the CLMF setup (Figure 1B) and CLMF schematic (Figure 2B) to make it easier to understand parts of the setup and flow of control.

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study not only characterizes the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details I might have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both of the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!
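
      To illustrate the reviewer's point about calibration, the following is a minimal sketch (not the authors' pipeline) of one common alternative: every replicate, including each control replicate, is expressed relative to the mean control ΔCt, so that the spread among control replicates is retained. The function name and the ΔCt values are hypothetical and chosen only for illustration; 100% amplification efficiency is assumed.

```python
from statistics import mean


def fold_changes_vs_mean_control(d_ct_values, control_d_ct_values):
    """Fold change of each replicate relative to the MEAN control dCt.

    dCt = Ct(target gene) - Ct(reference gene) for one replicate.
    Assumes ~100% amplification efficiency (a factor of 2 per cycle).
    """
    calibrator = mean(control_d_ct_values)
    return [2.0 ** (-(d_ct - calibrator)) for d_ct in d_ct_values]


# Hypothetical control replicates (illustrative numbers, not data from the study).
control_d_ct = [7.0, 7.5, 6.5]

# Each control replicate now gets its own value scattered around 1,
# instead of being fixed at exactly 1.
control_fold = fold_changes_vs_mean_control(control_d_ct, control_d_ct)
```

      With this normalisation the control group has a non-zero variance, which is what a between-group statistical test needs.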

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we have tried to bring to this work. Thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides to the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We had mentioned the knockdown conditions (time of injection and amount of dsRNA injected) for tissue-specific knockdowns in the methods, but realise now that this did not explain the approach well enough. We have now edited it to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways. 

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neurotranscriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added the statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these into (already busy) 1A, would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria. 

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF. 

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
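
      The arithmetic behind this statement can be sketched as follows. This is a generic illustration of the 2^(-ΔΔCt) calculation, assuming 100% amplification efficiency; the function names and Ct values are hypothetical and are not taken from the manuscript.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative mRNA expression by the delta-delta Ct method.

    Assumes ~100% amplification efficiency, so one Ct cycle
    corresponds to a two-fold difference in template.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalise to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                  # normalise to control group
    return 2.0 ** (-dd_ct)


def percent_knockdown(rel_expr):
    """Convert relative expression (control = 1) into percent knockdown."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values for illustration only.
knockdown_expr = relative_expression(26.0, 18.0, 25.0, 18.0)

# The control compared against itself has ddCt = 0 and is therefore always 1.
control_expr = relative_expression(25.0, 18.0, 25.0, 18.0)
```

      In this sketch a target transcript that crosses threshold one cycle later than in the control gives a relative expression of 0.5, i.e. a 50% knockdown, while the control is 1 by construction.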

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in Author response Image 1 below. 

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.  

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following: 

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns. 

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. By comparison, the data show stronger evidence of abdominal knockdown, most markedly for the RYa transcript (>90%) but still significantly for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.  4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing this line of experiments, they lie beyond the scope of a revision. In its absence, we relied on the knockdown of the genes using dsRNA. We would like to posit that despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, without affecting sugar-feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared, visually the figures are great and clearly were carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      (a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.  

      (b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.  

      (c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected bloodfed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides). 

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host-seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.  

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies as low as about 25-40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios. 
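
      For readers comparing the percentages quoted above, a knockdown efficiency is typically derived from qPCR cycle-threshold (Ct) values. The sketch below assumes the standard 2^-ΔΔCt calculation against a reference gene; the response does not state which quantification method was used, and the Ct values are invented for illustration:

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold expression of the target in knockdown vs control (2^-ddCt)."""
    d_ct_kd = ct_target_kd - ct_ref_kd        # normalise knockdown sample
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalise control sample
    dd_ct = d_ct_kd - d_ct_ctrl
    return 2.0 ** (-dd_ct)


def knockdown_percent(rel_expr: float) -> float:
    """Percentage reduction in transcript level relative to control."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values: the target amplifies one cycle later after knockdown,
# while the reference gene is unchanged.
fold = relative_expression(25.0, 18.0, 24.0, 18.0)
print(f"relative expression: {fold:.2f}, knockdown: {knockdown_percent(fold):.0f}%")
```

      Under this calculation a one-cycle shift corresponds to a 50% knockdown, so efficiencies of 25-40% reflect Ct shifts of well under one cycle.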

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result either from their inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability. 

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to both changing nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 202ti). In the context of mating, male-derived sex peptide further increases protein feeding by acting on a dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood-meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). Confirming their roles will therefore require systematic characterisation and knockdown of each receptor. We are planning these experiments in the future.

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.
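
      The scoring scheme described above can be sketched as follows. This is an illustrative reconstruction only: the function names and the example records are hypothetical, and each female is represented simply by two presence/absence flags:

```python
from collections import Counter


def classify_choice(took_blood: bool, took_sugar: bool) -> str:
    """Map presence/absence of each meal type to one of the four choices."""
    if took_blood and took_sugar:
        return "both blood and sugar"
    if took_blood:
        return "blood only"
    if took_sugar:
        return "sugar only"
    return "neither"


def choice_proportions(females: list[tuple[bool, bool]]) -> dict[str, float]:
    """Fraction of females in each choice category (amount eaten is not scored)."""
    counts = Counter(classify_choice(blood, sugar) for blood, sugar in females)
    return {choice: n / len(females) for choice, n in counts.items()}


# Hypothetical records: (took_blood, took_sugar) for four females.
example = [(True, True), (True, False), (False, False), (True, False)]
print(choice_proportions(example))
```

      Because only presence or absence is recorded, a lower "sugar only" fraction in sated females says nothing about how much sugar any individual consumed.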

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (blood-hungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected. 

      (6) References numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. eLife Assessment

      This paper provides a useful new theory of the hallucinatory effects of 5-HT2A psychedelics. The authors present convincing evidence that a computational model trained with the Wake-Sleep algorithm can reproduce some features of hallucinations by varying the strength of top-down connections in the model, though it is not clear that this model applies to 5-HT2A hallucinogens in particular. The work will be of interest to researchers studying hallucinations or offline activity and plasticity more broadly.

    2. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that challenge certain mainstream ideas in psychedelic neuroscience.

      While some of my concerns have been addressed in revision, I am still not convinced that this model applies to 5-HT2A hallucinogens, as opposed to a pharmacologically distinct hallucinogen. I think it is important to justify which class of hallucinogens this model applies to and distinguish it from other hallucinogens. While some researchers tend to group several hallucinogens together (e.g., 5-HT2A agonists, NMDA antagonists, kappa-opioid agonists), I'm not convinced this is warranted, when they have distinct subjective and cognitive effects (including quite different visual distortions, and again I point out that the kappa-opioid agonist salvinorin A, which is referred to as an "oneirogen," has been described as particularly dream-like, perhaps more so than 5-HT2A hallucinogens), as well as some differences in therapeutic outcomes (ketamine seems not to have as persistent therapeutic effects, and kappa-opioid agonists have yet to be shown to be therapeutic). Their use patterns highlight this (e.g., 5-HT2A drugs are used less in non-festival/rave social settings compared to NMDA drugs like ketamine, which can be used frequently enough to the point of abuse; kappa-opioid agonists have quite mixed effects in terms of pleasurable outcomes, thereby rarely being used/abused and almost never to my knowledge being used recreationally).

      In sum, more is needed to justify the claim that this work applies to 5-HT2A drugs in particular.

    3. Reviewer #2 (Public review):

      This work is a nice contribution to the literature in articulating a specific, testable theory of how psychedelics act to generate hallucinations and plasticity.

      I believe my concerns from the first round of review have been addressed in this version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      First, we thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article has been considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we have made to the text.

      Common Concerns (R1 & R2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), and that a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022). 

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in our previous text–we have added them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay. 

      Relevant modifications: Page 4, 1st paragraph; Page 11, 1st paragraph.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
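
      For readers unfamiliar with the algorithm, the two phases referred to above can be sketched with a textbook-style Wake-Sleep update on a tiny two-layer network of binary units. This is not the architecture used in the paper: the class name, layer sizes, and learning rate are illustrative assumptions, and reading the generative (top-down) weights as apical synapses and the recognition (bottom-up) weights as basal synapses is an interpretation of the description above:

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sample(p: float, rng: random.Random) -> float:
    """Draw a binary unit that is on with probability p."""
    return 1.0 if rng.random() < p else 0.0


class TinyHelmholtzMachine:
    """One visible layer x and one hidden layer h of binary stochastic units."""

    def __init__(self, n_x: int, n_h: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.R = [[0.0] * n_x for _ in range(n_h)]  # recognition, x -> h (bottom-up)
        self.G = [[0.0] * n_h for _ in range(n_x)]  # generative, h -> x (top-down)
        self.prior = [0.0] * n_h                    # generative bias on h

    def recognize(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.R]

    def generate(self, h):
        return [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in self.G]

    def wake_step(self, x, lr: float = 0.1):
        # Wake: infer h from a real input, then train the generative weights
        # to reconstruct that input from the inferred h.
        h = [sample(p, self.rng) for p in self.recognize(x)]
        x_pred = self.generate(h)
        for i, row in enumerate(self.G):
            for j in range(len(row)):
                row[j] += lr * (x[i] - x_pred[i]) * h[j]
        for j in range(len(self.prior)):
            self.prior[j] += lr * (h[j] - sigmoid(self.prior[j]))

    def sleep_step(self, lr: float = 0.1):
        # Sleep: "dream" an (h, x) pair from the generative model, then train
        # the recognition weights to recover h from the dreamed x.
        h = [sample(sigmoid(b), self.rng) for b in self.prior]
        x = [sample(p, self.rng) for p in self.generate(h)]
        h_pred = self.recognize(x)
        for j, row in enumerate(self.R):
            for i in range(len(row)):
                row[i] += lr * (h[j] - h_pred[j]) * x[i]


# Alternate the two phases on a single fixed input pattern.
model = TinyHelmholtzMachine(n_x=4, n_h=2)
for _ in range(200):
    model.wake_step([1.0, 1.0, 0.0, 0.0])
    model.sleep_step()
```

      In this sketch each phase updates only one set of weights, which is the sense in which plasticity is phase-specific; driving both updates at once from aberrant activity would carry no guarantee of improving either the generative or the recognition model.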

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We have provided a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Relevant modifications: Page 9, final paragraph; Page 12, final paragraph.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We have clarified that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
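
      One simple way to picture this is a linear interpolation between each neuron's bottom-up (basal) and top-down (apical) drive. The sketch below assumes that linear form purely for illustration; the function name and the numbers are hypothetical, and the model's actual mixing rule may differ:

```python
from statistics import mean


def mixed_drive(basal, apical, alpha):
    """Blend basal and apical input per neuron.

    alpha = 0 gives purely basal (stimulus-driven) activity,
    alpha = 1 gives purely apical (top-down) activity.
    alpha may be a single scalar or one value per neuron.
    """
    if isinstance(alpha, (int, float)):
        alphas = [float(alpha)] * len(basal)
    else:
        alphas = list(alpha)
    return [(1.0 - a) * b + a * t for b, t, a in zip(basal, apical, alphas)]


basal = [1.0, 0.0, 0.5]
apical = [0.0, 1.0, 0.5]
per_neuron_alpha = [0.1, 0.3, 0.2]  # hypothetical heterogeneous values

heterogeneous = mixed_drive(basal, apical, per_neuron_alpha)
homogeneous = mixed_drive(basal, apical, mean(per_neuron_alpha))
print(heterogeneous)
print(homogeneous)
```

      Replacing the per-neuron values with their mean, as in the second call, is the simplification the scalar parameter represents; the two outputs differ neuron by neuron even though the average balance is the same.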

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Relevant modifications: Page 4, first paragraph; Page 13, first paragraph.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We have elaborated on this point, and moved the discussion earlier in the text.

      Relevant modifications: Page 1, 1st paragraph; Page 4, 2nd paragraph.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We ultimately decided to remove these discussions from the main text, as they had little bearing on the content of our work. Within the Ethics Declarations section we softened our claims from “millennia” to “centuries,” as indigenous psychedelic use over this latter period of time is well-substantiated.

      Relevant modifications: removed from introduction; modified Ethics Declarations

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. There are two possible additional factors that could contribute to this phenomenon: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We have provided an extended discussion of these nuances in our revision.

      Relevant modifications: Page 1, paragraph 2.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      There are two experimental results that appear to contradict our hypothesis and that deserve special consideration. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent, more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), which we believe would be best explained within our framework by a topographically connected recurrent network architecture trained on video data, a potentially fruitful direction for future research.

      Relevant modifications: Page 9, paragraph 1; Page 10, final paragraph; Page 11, final paragraph.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our Wake-Sleep-trained models and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b).

      To demonstrate that our proposed hallucination mechanism is capable of producing more complex hallucinations in larger, more powerful models, we employed our same hallucination generation mechanism in a pretrained Very Deep Variational Autoencoder (VDVAE) (Child et al., 2021), which is a hierarchical variational autoencoder with a nearly identical structure compared to our Wake-Sleep-trained networks, with both a bottom-up inference pathway and a top-down generative pathway that maps cleanly onto our multicompartmental neuron model. VDVAEs are trained on the same objective function as our Wake-Sleep-trained networks, but using the backpropagation algorithm. The VDVAE models were able to generate much more complex hallucinations (emergence of complex geometric patterns, smooth deformations of objects and faces), whose complexity arguably exceeds those produced by the DeepDream algorithm. Therefore while the VDVAEs are less biologically realistic (they do not learn via local synaptic plasticity), they function as a valuable high-level model of hallucination generation that complements our Wake-Sleep-trained approach. As further validation, we were also able to replicate our key results and testable predictions with these models.

      Relevant modifications: Results section “Modeling hallucinations in large-scale pretrained networks”; Figure 6, S7, S8; Page 12, paragraph 3; Methods section “Generating hallucinations in hierarchical variational autoencoders.”

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We have added a discussion of this in our ‘Model Limitations’ section.

      Relevant modifications: Page 12, paragraph 4.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore due to a reduction in low-rank synchronous activity in subnetworks during resting state.
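      To make the third measure concrete, the following is a minimal sketch of a Lempel-Ziv (LZ76-style) phrase count applied to a binarized signal. Thresholding at the median is an assumption made here for illustration; the binarization and normalization choices in the cited study may differ, and the function names are hypothetical.

```python
from statistics import median


def binarize(signal):
    """Map a numeric sequence to a '0'/'1' string by thresholding at its median.

    The median threshold is an illustrative assumption.
    """
    m = median(signal)
    return "".join("1" if x > m else "0" for x in signal)


def lz76_complexity(bits):
    """Count phrases in an LZ76-style parsing of a '0'/'1' string.

    Each phrase is the shortest substring starting at the current position
    that has not already occurred in the sequence seen so far.
    """
    n = len(bits)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier symbols.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count
```

      A constant signal parses into very few phrases, whereas a more diverse sequence of the same length yields more, which is the sense in which this measure tracks sequence diversity.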

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      Relevant modifications: Page 10, paragraph 1.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      In our revised submission, ‘ripple’ phenomena are now visible in two places: Fig 2c-d, and Fig 6 (rows 2 and 3). Because the VDVAE models used to generate Figure 6 produce higher quality generated images, the ripples appearing in these plots are likely more prototypical, but it is not easy to evaluate the quality of these visualizations relative to subjective hallucination phenomena.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our Wake-Sleep-trained model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results. In fact, the pretrained VDVAE models that we worked with do include top-down influence during the Wake-stage inference process, and these models recapitulated our key results and testable predictions (Fig. S8).

      Relevant modifications: Fig. S8; Page 12, paragraph 4.

    1. eLife Assessment

      This valuable study highlights the key role of NK cells and PD-L1+ neutrophils in worsening sepsis responses in the context of MASH (metabolic dysfunction-associated steatohepatitis). It focused on the role of neutrophils in mediating this effect, which is based on a choline-deficient high-fat diet model of various knockouts or selective ablation of immune cell types. While the data presented are of great interest, there are concerns around the reliability of the strength of the evidence provided, which is currently considered incomplete. The study may be of interest to researchers in immunopathological disease mechanisms once confirmatory studies have been completed.

      [Editors' note: the authors no longer have access to the original flow cytometry data and plan to compile new datasets for further consideration.]

    2. Reviewer #1 (Public review):

      Summary:

      By using an established NAFLD model, choline-deficient high-fat diet, Barros et al show that LPS challenge causes excessive IFN-γ production by hepatic NK cells which further induces recruitment and polarization of a PD-L1 positive neutrophil subset leading to massive TNFα production and increased host mortality. Genetic inhibition of IFN-γ or pharmacological blockade of PD-L1 decreases recruitment of these neutrophils and TNFα release, consequently preventing liver damage and decreasing host death.

      Since NAFLD is often accompanied by chronic, low-grade inflammation, it can lead to an overactive but dysfunctional immune response and increase the body's overall susceptibility to infections; therefore, this is a very important research question.

      Strengths:

      The biggest strength of the manuscript is the vast number of mouse strains used.

      Weaknesses:

      After the review, there are still some open questions from my side:

      (1) I would like the authors to defend their choice of diet type since this has not been done in the review/response to authors. In case they cannot, we need additional proof (HFD or WD model).

      (2) Since the authors used the same control groups (chow and HFCD), as required by the animal ethics committee, they must have a power analysis showing that the number of controls (and of animals in the other groups) is sufficient to detect the effect. Please provide it.

    3. Reviewer #2 (Public review):

      Summary:

      This is an extremely interesting mouse study, trying to understand how sepsis is tolerated during obesity/NAFLD. The researchers combine a well-established model of NASH (Choline-deficiency with High Fat Diet) with a sepsis model (IP injection of 10 mg/kg LPS), leading to dramatic mortality in mice. Using this model, they characterize the complex contributions of immune cells. Specifically, they find that NK-cells and Neutrophils contribute the most to mortality in this model due to IFNG and PD-L1+ Neutrophils.

      Strengths:

      The biggest strength of the manuscript is how clear the primary phenotypes/endpoints of their model are. Within 6 hours of LPS injection, there is a stark elevation of liver inflammation and damage, which is exacerbated by a High Fat/Choline-Deficient diet (HFCD). And after 1 day, almost all of the mice die. Using these endpoints, the authors were able to identify which cells were critical for mortality in the model and the specific mediators involved.

      Comments on revisions:

      I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Pages 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors are showing the number of IFNg-positive CD4, CD8, and NK1.1+ cells. Could they show, out of total IFNg production, how much comes specifically from NK cells and how much from other cell populations, since NK1.1 is a marker of NK cells but also of NKT and gamma delta T cells? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) cited in Page 7, Lines 188-192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.
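      For readers unfamiliar with what such a power calculation involves, the sketch below estimates the number of animals per group needed to detect a difference between two survival proportions, using the standard normal-approximation formula for comparing two proportions. It is a simplified illustration only: a survival study would more typically be powered for a log-rank test, and the function name and any input proportions are hypothetical rather than taken from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided two-proportion comparison.

    p_control, p_treated: assumed survival proportions at the study endpoint.
    Uses n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2.
    """
    if p_control == p_treated:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = (p_control - p_treated) ** 2
    return ceil((z_alpha + z_beta) ** 2 * variance / effect)
```

      The required group size grows rapidly as the expected difference between groups shrinks, which is why the assumed effect size should be reported alongside the group sizes.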

      (4) In Figure 5 the authors are saying that it is neutrophils, but not monocytes, that mediate susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knockout affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti-Ly6G, b) anti-Ly6G treatment of HFCD mice challenged with LPS does not rescue survival to the levels of CHOW LPS, and c) anti-Ly6G treatment helps less than CXCR2i. Therefore, from both the knockout mice and the depletion experiments, the authors can conclude that most likely monocytes (but potentially also other cells), together with neutrophils, are substantial for the development of endotoxemic shock in the choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B. The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB)”. In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B)”.

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7 do the authors have isotype control for TNFa because gating seems a bit random so an isotype control graph would help a lot as supplementary information, in order to make the figure more persuasive

      To address the concern regarding gating in Figure 7, we have included the FMO control showing TNFα as a histogram in Supplementary Figure 8G. These controls reaffirm the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig8SG)”.

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5-6 week old mice, but that HFCD diet was started in 5-6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7-8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (pages 15-16, lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration”.

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells, respectively.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
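      The thresholds described above can be illustrated with a minimal sketch in plain Python. This is not the Seurat call itself; it only shows how an adjusted p-value cutoff of 0.05 and a log2 fold change cutoff of ±0.25 would partition genes. The `GeneStat` type, the `classify_gene` helper, and the gene names and values are hypothetical and are not data from the study.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2_fc: float
    p_adj: float


def classify_gene(stat, p_cutoff=0.05, lfc_cutoff=0.25):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    A gene must pass the adjusted p-value cutoff and exceed the
    log2 fold change cutoff in either direction to be called.
    """
    if stat.p_adj >= p_cutoff:
        return "ns"
    if stat.log2_fc > lfc_cutoff:
        return "up"
    if stat.log2_fc < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical values for illustration only.
example = [
    GeneStat("GeneA", 1.20, 0.001),   # significant, above +0.25
    GeneStat("GeneB", -0.80, 0.010),  # significant, below -0.25
    GeneStat("GeneC", 0.10, 0.0001),  # significant p, effect too small
    GeneStat("GeneD", 2.00, 0.200),   # large effect, p not significant
]
labels = {s.gene: classify_gene(s) for s in example}
```

      In this sketch a gene with a very small adjusted p-value but a fold change inside the ±0.25 band is left uncalled, which is the trade-off between sensitivity and specificity that the cutoffs are meant to control.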

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Due to both antibodies being conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments”.

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the text was included in the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when authors analyzed the publicly available RNA-seq data but they are not the most significantly elevated genes. What is the reasoning behind this cherry-picking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and TNFR1R2a-/- mice are not showing elevated liver damage but it may simply be because of the non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and results are caused by NAFLD, truly. The same goes for Figure 6: Looking at Figure 6D one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality, and variability between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. eLife Assessment

      This work describes the establishment of an image analysis pipeline for signal correction, segmentation and quantitative data analysis of multilayered organoid and tumoroid systems. The revised study is important for the field to address many practical challenges in deep-tissue visualization. The image analysis pipeline is well-designed and compelling.

    2. Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses in toto properties of gastruloid nuclear density, patterns of cell division, morphology, deformation and gene expression.

      Strengths:

      The methods developed are sound, well described and well validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research and would be of interest to the wide scientific community.

      Comments on revisions:

      I am happy with the job the authors have done with the revision. No further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques, enabling improved deep-tissue visualization compared with conventional methods. This advanced approach allows comprehensive 3D imaging of whole-mount immunostained gastruloids, capturing both tissue-scale architecture and single-cell-level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85 ± 3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
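      For readers less familiar with the metric, the following is a minimal sketch of how an F1 score at a given IoU threshold can be computed for instance segmentation. It is an illustration under simplifying assumptions, not the authors' evaluation code: objects are represented as sets of voxel coordinates, matching is greedy rather than optimal, and the helper names `iou` and `f1_at_iou` as well as the toy objects are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of voxels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(truth, pred, threshold=0.5):
    """F1 score for instance segmentation at one IoU threshold.

    A predicted object counts as a true positive when its best overlap
    with a not-yet-matched ground-truth object has IoU >= threshold.
    Matching here is greedy, a simplification of optimal matching.
    """
    matched = set()
    tp = 0
    for p in pred:
        best_idx, best_iou = None, 0.0
        for i, t in enumerate(truth):
            if i in matched:
                continue
            score = iou(p, t)
            if score > best_iou:
                best_idx, best_iou = i, score
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(pred) - tp   # predictions without a ground-truth partner
    fn = len(truth) - tp  # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (hypothetical): two annotated nuclei, two predictions.
truth = [
    {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},
    {(5, 5, 5), (5, 5, 6)},
]
pred = [
    {(0, 0, 0), (0, 0, 1), (0, 1, 0)},  # IoU 0.75 with the first nucleus
    {(9, 9, 9)},                        # overlaps nothing
]
score = f1_at_iou(truth, pred)
```

      In the toy example one prediction matches, one is spurious, and one annotated nucleus is missed, so the score at the 50% threshold is 0.5; raising the threshold above 0.75 would drop it to 0.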

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The image analysis method for nuclei segmentation was thoroughly benchmarked against existing methods, demonstrating advantages over conventional approaches, and its applicability across diverse datasets was convincingly established. The authors also evaluated the state-of-the-art Cellpose-SAM framework, showing that it performs well on their data and that the authors' preprocessing strategy can further enhance Cellpose-SAM's segmentation performance in deep tissues.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven napari platform, facilitating interactive exploration and analysis.

      Weaknesses:

      In my initial review, I noted that the developed image analysis pipeline lacked benchmarking against existing methods and provided only a limited demonstration of its applicability to other datasets. These points have been appropriately addressed in the revised manuscript, and I have no further weaknesses to note.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim was compellingly achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community. Given that suitable datasets for developing advanced 3D cell segmentation methods remain scarce in biological image analysis, the public release of these data is significant and is expected to stimulate further advances in the development of sophisticated computational approaches.

      Comments on revisions:

      The authors have addressed the previous revision thoroughly and appropriately. I have no further suggestions or additional recommendations at this time.

    4. Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as radial distribution patterns of mesoderm markers. The pipeline is distributed as a Python package, notebooks and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference is very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot) and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      Comments on revisions:

      The minor issues that I originally raised in my first review have been fully resolved in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.  

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses in toto properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.  

      Strengths:  

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      A recommendation should be added on when or under which conditions to use this pipeline. 

      We thank the reviewer for this valuable feedback. We added the following text in the revised version, lines 418 to 474: “In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples—such as organoids, embryos, explants, spheroids, or tumors—that are typically composed of multiple cell layers and have a thickness greater than 50 µm”.

      “The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed)”.

      “Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimen with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered”.

      “Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results”.
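
      The resampling criterion quoted above can be illustrated with a short calculation. This is a minimal sketch and not the Tapenade implementation: the function name `isotropic_zoom_factors`, the voxel sizes, and the nucleus diameter are hypothetical; only the target of roughly 15 pixels per object comes from the text.

```python
def isotropic_zoom_factors(voxel_size_zyx, nucleus_diameter_um, target_diameter_px=15.0):
    """Per-axis zoom factors so that a nucleus spans ~target_diameter_px on every axis.

    voxel_size_zyx: physical voxel size (in µm) along z, y, x.
    nucleus_diameter_um: typical nucleus diameter in µm (assumed roughly spherical).
    """
    # Isotropic voxel size at which the nucleus covers the target number of pixels.
    target_voxel_um = nucleus_diameter_um / target_diameter_px
    # A factor > 1 upsamples that axis, a factor < 1 downsamples it.
    return tuple(size / target_voxel_um for size in voxel_size_zyx)


# Hypothetical acquisition: 2 µm z-step, 0.5 µm lateral pixels, 9 µm nuclei.
factors = isotropic_zoom_factors((2.0, 0.5, 0.5), nucleus_diameter_um=9.0)
print(factors)
```

      Factors of this kind could then be handed to a generic resampling routine such as `scipy.ndimage.zoom(image, factors, order=1)`. A large factor along z indicates that most of the axial information would be interpolated, which is the situation where the text advises careful inspection of the segmentation.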

      “Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using either a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium)”.

      “Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed”.

      Reviewer #2 (Public review):  

      Summary:  

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.  

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.  

      All computational tools developed in this study are released as open-source, Python-based software.  

      Strengths:  

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.  

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics. We added the following sentences to the Discussion, lines 418 to 474, and completed the discussion on applicability with a table showing the purpose, requirements, applicability and limitations of each step of the processing and analysis pipeline.

      “Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope”.
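
      The idea of depth-dependent spectral patterns can be sketched as a small linear unmixing problem. This is an illustration under stated assumptions, not the authors' algorithm: it uses two fluorophores and two detection channels instead of four, interpolates hypothetical calibration matrices linearly in depth, and inverts the 2x2 system directly. The function names and all numbers are made up for the example.

```python
def interpolate_patterns(depth_um, calibration):
    """Linearly interpolate the spectral pattern matrix at a given depth.

    calibration: list of (depth_um, matrix) sorted by depth, where
    matrix[c][f] is the fraction of fluorophore f's signal seen in channel c.
    """
    if depth_um <= calibration[0][0]:
        return calibration[0][1]
    if depth_um >= calibration[-1][0]:
        return calibration[-1][1]
    for (d0, m0), (d1, m1) in zip(calibration, calibration[1:]):
        if d0 <= depth_um <= d1:
            t = (depth_um - d0) / (d1 - d0)
            return [[(1 - t) * a + t * b for a, b in zip(r0, r1)]
                    for r0, r1 in zip(m0, m1)]


def unmix_two_channels(measured, depth_um, calibration):
    """Recover two fluorophore abundances from two channel intensities."""
    (a, b), (c, d) = interpolate_patterns(depth_um, calibration)
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("spectral patterns are not separable at this depth")
    m0, m1 = measured
    # Cramer's rule for the 2x2 system: measured = patterns @ abundances.
    return ((d * m0 - b * m1) / det, (a * m1 - c * m0) / det)


# Hypothetical calibration at the surface and at 100 µm depth.
calibration = [
    (0.0, [[0.9, 0.2], [0.1, 0.8]]),
    (100.0, [[0.8, 0.3], [0.2, 0.7]]),
]
print(unmix_two_channels((13.5, 16.5), depth_um=50.0, calibration=calibration))
```

      With more channels than fluorophores, the direct inversion would typically be replaced by a least-squares solve, possibly with a non-negativity constraint.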

      “Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral-filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used—from the calibration measurements to the corrected images—is available open-source at the Zenodo link in the manuscript”.

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. To evaluate our 3D nuclei segmentation model, we tested it on diverse systems, including gastruloids stained with the nuclear marker Draq5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method). The results have been added to the manuscript, Fig. S9b.

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.  

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      CellPose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel and dimensions (347,467,477) was too large to be processed (see screenshot below). The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 1.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image size was too large to be processed.

      AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous stainings. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was demonstrated to segment colon organoids well in Ong et al., 2025 ([2]), but the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and to validate the results are sparse and not touching, so the predicted segmentation has a lot of false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network’s anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript. The results are added in the manuscript, Fig. S9b. Fig. 3 displays the results qualitatively, compared to our trained model Stardist-tapenade.

      Author response image 2.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig S2 of our manuscript. (Top right) Same image overlayed with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlayed with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlayed with the prediction from the AnyStar model, false positives are indicated with a red arrow.

      CellPose-SAM, which is a recent model built on the CellPose framework. The pre-trained model performs well on gastruloids imaged using our pipeline, and performs better than StarDist3D at segmenting elongated objects such as deformed nuclei. The performances are qualitatively compared in Fig. S9a and S10. We also demonstrate how using local contrast enhancement improves the results of CellPose-SAM (Fig. S10a), showing the versatility of the Tapenade pre-processing module. Tissue-scale, packing-related metrics from CellPose-SAM labels qualitatively match those from Stardist-tapenade, as shown in Fig. 10c and d.

      Appraisal:  

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.  

      Impact and utility:  

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.  

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary  

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.  

      Strengths  

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for his positive feedback and appreciation of our work.

      Weaknesses  

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:  

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest to include (selective) demo data set that can be used to run the notebooks (e.g. for spectral unmixing) and or provide easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).  

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample tests. The different parts of the pipeline can now be run immediately:

      https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      A morphometric analysis based on the axial views was added as Fig. S6a of the manuscript, complementary to the XY views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      In lines 64 and 65, it is mentioned that confocal and light-sheet microscopy remain limited to samples under 100μm in diameter. I would recommend revising this sentence. In the paper of Moos and colleagues (also cited in this manuscript; PMID: 38509326), gastruloid samples larger than 100μm are imaged in toto with an open-top dual-view and dual-illumination light-sheet microscope, and live cell behaviour is analysed. Another example, if considering also multi-angle systems, is the impressive work of McDole and colleagues (PMID: 30318151), in which one of the authors of this manuscript is a corresponding author. There, multi-angle light sheet microscopy is used for in toto imaging and reconstruction of post-implantation mouse development (samples much larger than 100μm). Some multi-sample imaging strategies have been developed for this type of imaging system, though not to the sample number extent allowed by the Viventis LS2 system or the Bruker TruLive3D imager, which have higher image quality limitations.

      We thank the reviewer for this remark. As reported in their paper, Moos et al. used dual-view light-sheet microscopy to image gastruloids, which are particularly dense and challenging tissues, with whole-mount samples of approximately 250 µm in diameter. Nevertheless, their image quality metric (DCT) shows a rapid twofold decrease within 50 µm depth (Extended Fig. 5h), whereas with two-photon microscopy, our image quality metric (FRC-QE) decreases by a factor of two over 150 µm in non-cleared samples (PBS) (see Fig. 2c). While these two measurements (FRC-QE versus DCT) are not directly comparable, the observed difference reflects the superior depth performance of two-photon microscopy, owing in part to the use of non-descanned detectors. In our case, imaging was performed with Hoechst, a blue fluorophore suboptimal for deep imaging, whereas in the Moos dataset (Draq5, far-red), the configuration was more favorable for deep imaging, which further supports our conclusion.

      In McDole et al., tissues reaching 250 µm were imaged from 4 views but, to our knowledge, the images do not reach a cellular-scale resolution compatible with cell segmentation in the deeper layers.

      We replaced the sentence ‘However, light-sheet and confocal imaging approaches remain limited to relatively small organoids typically under 100 micrometers in diameter’ with the following (line 64):

      “While advances in light-sheet microscopy have extended imaging depth in organoids, maintaining high image quality throughout thick samples remains challenging. In practice, quantitative analyses are still largely restricted to organoids under roughly 100 µm in diameter”.

      It is worth mentioning that two-photon microscopes are much more widely available than light sheet microscopes, and light sheet systems with 2-photon excitation are even less accessible, which makes the described workflow of Gros and colleagues have a wide community interest.  

      We thank the reviewer for this remark, and added this suggestion at line 74:

      “Finally, two-photon microscopes are typically more accessible than light-sheet systems and allow for straightforward sample mounting, as they rely on procedures comparable to standard confocal imaging”.

      Reviewer #2 (Recommendations for the authors):  

      Suggestions:  

      A comparison with established pre-trained models for 3D organoid image segmentation (e.g., Cellos[1], AnyStar[2], and DeepStar3D[3], all based on StarDist3D) would help highlight the advantages of the authors' custom StarDist3D model, which has been specifically optimized for two-photon microscopy images.  

      (1)  Cellos: https://doi.org/10.1038/s41467-023-44162-6

      (2)  AnyStar: https://doi.org/10.1109/WACV57701.2024.00742

      (3)  DeepStar3D: https://doi.org/10.1038/s41592-025-02685-4

      We agree with the reviewer that a benchmark against existing segmentation methods is very useful. This is addressed in the revised version, as detailed above (Figure 3).

      Recommendations:  

      Please clarify the following point. In line 195, the authors state, "This allowed us to detect all mitotic nuclei in whole-mount samples for any stage and size." Does this mean that the custom-trained StarDist3D model can detect 100% of mitotic nuclei? It was not clear from the manuscript, figures, or videos how this was validated. Given the reported performance scores of the StarDist3D model for detecting all nuclei, claiming 100% detection of mitotic nuclei seems surprisingly high.

      We thank the reviewer for this comment. As detailed in the Methods section, the detection score reaches 82%, and only the complete pipeline (detection plus minimal manual curation) allows us to detect all mitotic nuclei. To make this clearer, the following clarifications were added in the Results section:

      ”To detect division events, we stained gastruloids with phosphohistone H3 (ph3) and trained a separate custom Stardist3D model using 3D annotations of nuclei expressing ph3 (see Methods III H). This model together allowed us to detect nearly all mitotic nuclei in whole-mount samples for any stage and size (Fig.3f and Suppl.Movie 4), and we used minimal manual curation to correct remaining errors.”

      Minor corrections:  

      It appears that Figures 4-6 are missing from the submitted version, but they can be found in the manuscript available on bioRxiv.

      We thank the reviewer for this remark. This was corrected immediately by adding Figures 4 to 6.

      In line 185, is the intended phrase "by comparing the 2D predictions and the 2D sliced annotated segments..."? 

      To gain some clarity, we replaced the initial sentence:

      “The f1 score obtained by comparing the 3D prediction and the 3D ground-truth is well approximated by the f1 score obtained by comparing the 2D annotations and the 2D sliced annotated segments, with at most a 5% difference between the two scores.” by

      “The f1 score obtained in 3D (3D prediction compared with the 3D ground-truth) is well approximated by the f1 score obtained in 2D (2D predictions compared with the 2D sliced annotated segments). The difference between the 2 scores was at most 5%.”
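
      For readers unfamiliar with the metric, the F1 score at an IoU threshold can be sketched as below. This is a generic illustration, not the evaluation code used in the manuscript: objects are represented as sets of voxel (or pixel) coordinates, the function names are hypothetical, and matching is greedy. For thresholds above 0.5 each object can have at most one partner, so greedy matching is unambiguous there.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(predictions, ground_truth, threshold=0.5):
    """F1 score where a prediction counts as a true positive if it overlaps
    a not-yet-matched ground-truth object with IoU >= threshold."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truth) - true_positives
    denom = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denom if denom else 1.0


# Toy 2D example: one good match, one spurious prediction, one missed object.
ground_truth = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
predictions = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
print(f1_at_iou(predictions, ground_truth))
```

      The same function applies to 2D slices and 3D volumes, since it only depends on the coordinate sets; comparing the two scores on the same sample is the kind of check described in the revised sentence.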

      Reviewer #3 (Recommendations for the authors):

      (1) How is the "local neighborhood volume" defined, and how was it computed?

      The reviewer is referring to this paragraph (the term is underscored) :

      “To probe quantities related to the tissue structure at multiple scales, we smooth their signal with a Gaussian kernel of width σ, with σ defined as the spatial scale of interest. From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of nuclear volume to local neighborhood volume), and nuclear volume at multiple scales.”

      To improve clarity, the phrasing has been revised: the term local neighborhood volume has been replaced by local averaging volume, and a reference to the Methods section has been added.

      From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of space occupied by nuclear volume within the local averaging volume, as defined in the Methods III I), and nuclear volume at multiple scales.
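
      The multi-scale fields described here can be sketched with a Gaussian-weighted sum over segmented nuclei. This is a simplified illustration and not the definition given in Methods III I: each nucleus is treated as a point at its centroid carrying its volume, tissue boundaries are ignored, and the function names are hypothetical.

```python
import math


def gaussian_weight(point, center, sigma):
    """Normalized 3D Gaussian kernel (integrates to 1 over all space)."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(point, center))
    norm = (2.0 * math.pi * sigma ** 2) ** 1.5
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def local_fields(point, centroids, volumes, sigma):
    """Cell density, nuclear volume fraction and mean nuclear volume at `point`.

    Assumes point-like nuclei, which is only reasonable when sigma is
    larger than a typical nucleus radius.
    """
    weights = [gaussian_weight(point, c, sigma) for c in centroids]
    density = sum(weights)  # nuclei per unit volume
    volume_fraction = sum(w * v for w, v in zip(weights, volumes))  # dimensionless
    mean_volume = volume_fraction / density if density > 0 else float("nan")
    return density, volume_fraction, mean_volume


# Two hypothetical nuclei (coordinates in µm, volumes in µm^3), probed midway.
centroids = [(0.0, 0.0, -5.0), (0.0, 0.0, 5.0)]
volumes = [100.0, 300.0]
print(local_fields((0.0, 0.0, 0.0), centroids, volumes, sigma=10.0))
```

      Increasing `sigma` averages over more neighbours, which is how a single segmentation yields fields at several spatial scales.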

      (2) In the definition of inertia tensor (18), isn't the inner part normally defined in the reversed way (delta_i,j - ...)?

      We thank the reviewer for noticing this error, which we fixed in the manuscript.
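
      For reference, the standard sign convention the reviewer alludes to can be written out as a short sketch. Equation (18) of the manuscript is not reproduced here, so this is the textbook definition with unit mass per voxel, not necessarily the authors' exact expression; the function name is hypothetical.

```python
def inertia_tensor(points):
    """Inertia tensor of unit-mass points about their centroid.

    I[i][j] = sum_k (|r_k|^2 * delta_ij - r_k[i] * r_k[j]),
    where r_k is the position of point k relative to the centroid.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for p in points:
        r = [p[i] - centroid[i] for i in range(3)]
        r2 = sum(x * x for x in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                tensor[i][j] += r2 * delta - r[i] * r[j]
    return tensor


# Two voxels aligned along the first axis: no inertia about that axis.
print(inertia_tensor([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

      With this convention the smallest eigenvalue corresponds to the axis of elongation, whereas dropping the delta term (keeping only the sum of r_i r_j) gives a second-moment matrix whose largest eigenvalue marks that axis.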

      (3) For intensity normalization, the paper uses the Hoechst signal density as a proxy for a ubiquitous nuclei signal. I would assume that this is problematic for, e.g., dividing cells (which would overestimate it). Would using the average Hoechst signal per nucleus mask (as segmentation is available) be a better proxy?

      We agree that this idea is appealing if one assumes a clear relationship between nuclear volume and Hoechst intensity. However, since cell and nuclear volumes vary substantially with differentiation state (see Fig. 4), such a normalization approach would introduce additional biases at large spatial scales. We believe that the most robust improvement would instead consist in masking dividing cells during the normalization procedure, as these events could be detected and excluded from the computation.

      Nonetheless, we believe the method proposed by the reviewer could prove relevant for other types of data, so we will implement this recommendation in the code available in the Tapenade package.

      (4) Figures 4-6 were part of the Supplementary Material, but should be included in the main text?

      We thank the reviewer for this remark. This was corrected immediately by adding Figures 4-6.

      We also noticed a missing reference to Fig. S3 in the main text, so we added lines 302 to 307 to comment on the wavelength-dependency of the normalization method. We improved the description of Fig.6, which lacked clarity (line 316 to 321, line 327).

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H.T., Karatas, E., Poquillon, T., Grenci, G., Furlan, A., Dilasser, F., Mohamad Raffi, S.B., Blanc, D., Drimaracci, E., Mikec, D., Galisot, G., Johnson, B.A., Liu, A.Z., Thiel, C., Ullrich, O., OrgaRES Consortium, Racine, V. and Beghin, A., 2025. Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4:Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. Elife, 10, e69687.

    1. eLife Assessment

      This important work compares the size of two brain areas, the amygdala and the hippocampus, across 12 species belonging to the Macaca genus. The authors find, using a convincing methodological approach, that amygdala - but not hippocampal - volume varies with social tolerance grade, with high tolerance species showing larger amygdala than low tolerance species of macaques. Interestingly, their findings also suggest an inverted developmental effect, with intolerant species showing an increase in amygdala volume across the lifespan, compared to tolerant species exhibiting the opposite trend. Overall, this paper offers new insights into the neural basis of social and emotional processing.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new, important evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old.

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are nicely detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      (4) The following comments were brought up during the review. In their revision, the authors have sufficiently addressed all of these comments by providing detailed responses and updating their manuscript. First, the revision clarified how much one could draw conclusions about "nature vs. nurture" from this study. Second, the revision also clarified the contributions of very young and very old animals in their correlations. Third, in their revision, the authors expanded on how their results could be interpreted in the context of multiple behavioral traits by Thierry (2021) by providing more detailed descriptions. Finally, during the revision, the authors clarified that both intolerant and tolerant species experience complex socio-cognitive demands and highlighted that socio-cognitive challenges arise across the tolerance spectrum under different behavioral demands.

    3. Reviewer #2 (Public review):

      Summary:

      This comparative study of macaque species and type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power they have combined data from 4 centres - that have all used different scanning methods and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focussed on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: 1) that more intolerant species have relatively larger amygdalae, and 2) that with development there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. I suspect I would end up doing the same but it feels a bit like 'heads I win, tails you lose'. In the case of Grade 1 species, the individuals have a lot to learn especially if they are not top of the hierarchy, but at the same time there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite to how I read them, in which case the Table and preceding text needs to align.)

      Comments on revisions:

      I am happy with all of the revisions and the care shown by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus that remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The modifications brought up between the two versions of the article have answered my remarks regarding age/grade/brain area differences.

      As such, I think the results are holding strong, but maybe more work is needed with respect to interpretation.

      Classification of the social grade, as well as the issue of nature vs nurture have been addressed by the authors, I thank them for this.

      I still feel the integration of the amygdala as a common cognitive & emotional center could be possibly more pushed in the discussion, although I acknowledge that it would be complicated to do without knowing how the emotional and social lives of these animals impacted the growth of their amygdala...

      Strengths:

      Methods & breadth of species tested

      Weaknesses:

      Interpretations, which, although softened, could still be more integrated with the literature on emotion

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful and constructive feedback. We found the suggestions particularly helpful in refining the conceptual framework and clarifying key aspects of our interpretations.

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      Strengths:

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings. 

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      Weaknesses:

      (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure - typically post age 1 or 1.5 - so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.

      We agree with the reviewer that presenting our findings through a strict nature vs. nurture dichotomy was potentially misleading. We have revised the introduction and the discussion (e.g. lines 85-95 and 363-365) to clarify that we examined how neurodevelopmental trajectories differ across social grades, with the caveat that very young individuals are absent from our samples. We now explicitly mention that our results may reflect both early species-typical biases and experience-dependent maturation.

      We positioned our study on social tolerance in a comparative neuroscience framework and introduced a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates.

      Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organize these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024).

      “Cross-fostering experiments (De Waal and Johanowicz, 1993), along with our own results, suggest that social tolerance grades reflect both early, possibly innate predispositions and later environmental shaping”.

      (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.

      We thank the reviewer for highlighting this important point. In our dataset, younger and older subjects are underrepresented, but they are distributed across all subgroups. Therefore, we do not think that they could drive the interaction effect we are reporting. In our sample, amygdala volume tended to increase with age in intolerant species and decrease in tolerant species. We included a new analysis (Figure 4) that provides a clearer assessment of when social grades 1 vs 4 differed in terms of amygdala and hippocampus volume. While our model accounts for age continuously, we agree that age-related variation deserves cautious interpretation and requires longitudinal designs in future studies.

      We also added the following statements in the discussion (lines 386-391):

      “Due to a limited sample size of our study, this crossing trend, already accounted for by our continuous age model, should be further investigated. These results call for cautious interpretation of age-related variation and further emphasize the importance of longitudinal studies integrating both behavioral, cognitive and anatomical data in non-human primates, which would help to better understand the link between social environment and brain development (Song et al., 2021)”.

      (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements; however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.

      Thank you for this important suggestion. We have extensively revised the introduction to explain how we derived the three cognitive dimensions from the scientific literature: socio-cognitive demands, behavioral inhibition, and predictability of the social environment. We now provide a complete overview of the 18 behavioral traits described in Thierry’s framework and their cognitive classification in a dedicated table, along with hypothesized neural correlates. We have also mentioned the traits that were not classified in our framework, with a short justification for each. We believe this addition significantly improves the transparency and intelligibility of our conceptual approach.

      “The concept of social tolerance, central to this comparative approach, has sometimes been used in a vague or unidimensional way. As Bernard Thierry (2021) pointed out, the notion was initially constructed around variations in agonistic relationships – dominance, aggressiveness, appeasement or reconciliation behaviors – before being expanded to include affiliative behaviors, allomaternal care or male–male interactions (Thierry, 2021). These traits do not necessarily align along a single hierarchical axis but rather reflect a multidimensional complexity of social style, in which each trait may have co-evolved with others (Thierry, 2021, 2000; Thierry et al., 2004). Moreover, the lack of a standardized scientific definition has sometimes led to labeling species as “tolerant” or “intolerant” without explicit criteria (Gumert and Ho, 2008; Patzelt et al., 2014). These behavioral differences are characterized by different styles of dominance (Balasubramaniam et al., 2012), severity of agonistic interactions (Duboscq et al., 2014), nepotism (Berman and Thierry, 2010; Duboscq et al., 2013; Sueur et al., 2011) and submission signals (De Waal and Luttrell, 1985; Rincon et al., 2023), among the 18 covariant behavioral traits described in Thierry's classification of social tolerance (Thierry, 2021, 2017, 2000)”.

      “To ground the investigation of social tolerance in a comparative neuroanatomical framework, we introduce a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates. Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organized these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024; Testard 2022)”.

      (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner. 

      We fully agree. We did not mean that intolerant species live in a ‘simple’ social environment, but that the social environment of more tolerant species is markedly more demanding. Evidence supporting this statement includes their more efficient social networks (Sueur et al., 2011) and more complex communicative skills (e.g. tolerant macaques displayed higher levels of vocal diversity and flexibility than intolerant macaques in social situations with high uncertainty; Rebout et al., 2020).

      In the revised version (lines 106-122), we now highlight that socio-cognitive challenges arise across the tolerance spectrum, including in less tolerant species where strategic navigation of rigid hierarchies and risk-prone interactions is required. We hope that this addition offers a more balanced and nuanced framing of socio-cognitive demands across macaque societies.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes - particularly in future studies.

      We have now emphasized this point in the limitations section (lines 441-456). While our current dataset does not allow us to fully control for individual-level variables across all collection centers, we recognize that factors such as rank, social exposure, and individual life history may influence subcortical volumes.

      “Although we explained some interspecies variability, adding subjects to our database will increase statistical power and will help addressing potential confounding factors such as age or sex in future studies. One will benefit from additional information about each subject. While considered in our modelling, the social living and husbandry conditions of the individuals in our dataset remain poorly documented. The living environment has been considered, and the size of social groups for certain individuals, particularly for individuals from the CdP, have been recorded. However, these social characteristics have not been determined for all individuals in the dataset. As previously stated, the social environment has a significant impact on the volumetry of certain regions. Furthermore, there is a lack of data regarding the hierarchy of the subjects under study and the stress they experience in accordance with their hierarchical rank and predictability of social outcomes position (McCowan et al., 2022)”. 

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their thoughtful remarks and for acknowledging the value of our comparative approach despite its inherent constraints.

      Summary:

      This comparative study of macaque species and the type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focused on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable. 

      We thank the reviewer for this important observation. In the original version, Table 1 presented simplified direct predictions linking social tolerance grades to amygdala and hippocampus volumes. We recognize that this formulation may have created confusion. In the revised manuscript, we have thoroughly restructured the table and its accompanying rationale. Table 1 now better reflects our conceptual framework grounded in three cognitive dimensions (sociocognitive demands, behavioral inhibition, and social predictability), each linked to behavioral traits and associated neural hypotheses based on published literature. This updated framework, detailed in lines 144-169 of the introduction, provides a more nuanced basis for interpreting our results and avoids the inconsistencies previously noted. The Discussion was also revised accordingly (lines 329-255) to clarify where our findings diverge from the original predictions and to explore alternative explanations based on social complexity. Rather than directly predicting amygdala size from social tolerance grades, we propose that variation in volume emerges from differing combinations of cognitive pressures across species.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)

      To facilitate the interpretation of our Bayesian modelling, we have selected a more focused ROI in our automatic segmentation procedure of the hippocampus (from Hippocampal Formation to Hippocampus) and have added a new analysis (Figure 4) that helps to properly test whether the hippocampus significantly differs between species from social grade 1 vs 4. The present analysis found that this is the case in adult monkeys. This is therefore consistent with our hypothesis that amygdala volumes are principally explained by heightened sociocognitive demands in more tolerant species.

      We also acknowledge the reviewer’s concerns about the limited generalizability due to our sample. The challenges of comparative neuroimaging in non-human primates—especially when using post-mortem datasets—are substantial. Given the ethical constraints and the rarity of available specimens, increasing the number of individuals or species is not feasible in the short term. However, we have made all data and code publicly available and clearly stated the limitations of our sample in the manuscript. Despite these constraints, we believe our dataset offers an unprecedented comparative perspective, particularly due to the inclusion of rare and tolerant species such as M. tonkeana, M. nigra, and M. thibetana, which have never been included in structural MRI studies before. We hope this effort will serve as a foundation for future collaborative initiatives in primate comparative neuroscience.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their thoughtful and detailed review. Their comments helped us refine both the conceptual and interpretative aspects of the manuscript. We respond point by point below.

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species. 

      25 brains were extracted by the authors themselves, who are highly experienced with this procedure. Overall, we believe that dissection protocols did not alter the total brain volume. Despite our expertise, we had some difficulty avoiding damage to the cerebellum. Therefore, this region was not included in our analysis. We also noted that this brain region was damaged or absent in the Prime-DE dataset.

      Several protocols were used to prepare and store tissue. It could have impacted the total brain volume.

      We agree that differences in tissue preparation and storage could potentially affect total brain volume. Therefore, we explicitly included the main sample preparation variable — whether brains had been previously frozen — as a covariate in our model. This factor did not explain our results. Moreover, Figures 1D and 1I display the frozen status and its correlation with the amygdala and hippocampus ratios, respectively. Figure 2 shows the parameters of the model and the posterior distributions for the frozen status and total brain volume effects.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1) Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      We fully agree with this observation. In the revised version of the manuscript, we now include a detailed conceptual table listing all 18 behavioral traits from Thierry’s framework. For each trait, we provide its underlying social implications, its associated cognitive dimension (when applicable), and the hypothesized neural correlate. 

      While some traits could arguably have been classified in several cognitive dimensions (e.g. reconciliation rate), we preferred to assign each to a unique dimension for clarity. Additionally, the introduction (lines 95-169 + Table 1) now explains how each trait was evaluated based on existing literature and assigned to one of the three proposed cognitive categories: socio-cognitive demands, behavioral inhibition, or social unpredictability. This structure offers a clearer and more transparent basis for the neuroanatomical hypotheses tested in the study.

      “Navigating social life in primate societies requires substantial cognitive resources: individuals must not only track multiple relationships, but also regulate their own behavior, anticipate others’ reactions, and adapt flexibly to changing social contexts. Taking advantage of databases of magnetic resonance imaging (MRI) structural scans, we conducted the first comparative study integrating neuroanatomical data and social behavioral data from closely related primate species of the same genus to address the following questions: To what extent can differences in volumes of subcortical brain structures be correlated with varying degrees of social tolerance? Additionally, we explored whether these dispositions reflect primarily innate features, shaped by evolutionary processes, or acquired through socialization within more or less tolerant social environments”.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      “The second category, inhibitory control, includes traits that involve regulating impulsivity, aggression, or inappropriate responses during social interactions. Tolerant macaques have been shown to perform better in tasks requiring behavioral inhibition and also express lower aggression and emotional reactivity in both experimental and natural contexts (Joly et al., 2017; Loyant et al., 2023). These features point to stronger self-regulation capacities in species with egalitarian or less rigid hierarchies. More broadly, inhibition – especially in its strategic form (self-control) – has been proposed to play a key role in the cohesion of stable social groups. Comparative analyses across mammals suggest that this capacity has evolved primarily in anthropoid primates, where social bonds require individuals to suppress immediate impulses in favour of longer-term group stability (Dunbar and Shultz, 2025). This view echoes the conjecture of Passingham and Wise (2012), who proposed that the emergence of prefrontal area BA10 in anthropoids enabled the kind of behavioural flexibility needed to navigate complex social environments (Passingham et al., 2012)”.

      “The third category, social environment predictability, reflects how structured and foreseeable social interactions are within a given society. In tolerant species, social interactions are more fluid and less kin-biased, leading to greater contextual variation and role flexibility, which likely imply a sustained level of social awareness. In fact, as suggested by recent research, such social uncertainty and prolonged incentives are reflected by stress-related physiology: tolerant macaques such as M. tonkeana display higher basal cortisol levels, which may be indicative of a chronic mobilization of attentional and regulatory resources to navigate less predictable social environments (Sadoughi et al., 2021)”.

      “Each behavioral trait was individually evaluated based on existing empirical literature regarding the types of cognitive operations it likely involves. When a primary cognitive dimension could be identified, the trait was assigned accordingly. However, some behaviors – such as maternal protection, allomaternal care, or delayed male dispersal – do not map neatly onto a single cognitive process. These traits likely emerge from complex configurations of affective and social-motivational systems, and may be better understood through frameworks such as attachment theory (Suomi, 2008), which emphasizes the integration of social bonding, emotional regulation, and contextual plasticity. While these dimensions fall beyond the scope of the present framework, they offer promising directions for future research, particularly in relation to the hypothalamic and limbic substrates of social and reproductive behavior”.

      “Rather than forcing these traits into potentially misleading categories, we chose to leave them unclassified within our current cognitive framework. This decision reflects both a commitment to conceptual clarity and the recognition that some behaviors emerge from a convergence of cognitive demands that cannot be neatly isolated. This tripartite framework, leaving aside reproductive-related traits, provides a structured lens through which to link behavioral diversity to specific cognitive processes and generate neuroanatomical predictions”.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      As pointed out by Thierry and collaborators, the social tolerance concept is already grounded in a phylogenetic framework, as social tolerance matches the phylogenetic tree of these macaque species, suggesting a biological basis for these behavioral observations. Given the modest sample size and uneven species representation, we opted not to adopt tools such as Phylogenetic Generalized Least Squares (PGLS) in our analysis. Our primary aim in this study was to explore neuroanatomical variation as a function of social traits, not to perform a phylogenetic comparative analysis per se. That said, we now explicitly acknowledge this limitation in the Discussion and indicate that future work using larger datasets and phylogenetic methods will be essential to disentangle social effects from evolutionary relatedness. We hope that making our dataset openly available will facilitate such future analyses.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      We appreciate this insightful observation. Indeed, findings from studies in humans and nonhuman primates showing associations between brain structure and social network size typically rely on detailed life history and behavioral data at the individual level. Unfortunately, such fine-grained information was not consistently available across our entire sample. While some individuals from the Centre de Primatologie (CdP) were housed in known group compositions and social settings, we did not have access to longitudinal social data (such as rank, grooming rates, or network centrality) that would allow for robust individual-level analyses. We now acknowledge this limitation more clearly in the Discussion (lines 436-443), and we fully agree that future work combining neuroimaging with systematic behavioral monitoring will be necessary to explore how species-level effects interact with individual social experience.

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather engage the authors to try and discuss the amygdala as a multipurpose center, that includes social cognition and emotion.

      We thank the reviewer for this important and nuanced point. We have revised the manuscript to adopt a more cautious and integrative tone regarding the function of the amygdala. In the revised Discussion (lines 341-355), we now explicitly state that the amygdala is involved in a broad range of processes—emotional, social, and affective—and that these domains are deeply intertwined. Rather than proposing a strict dissociation, we now suggest that the amygdala supports integrated socio-emotional functions that are mobilized differently across social tolerance styles. We also cite recent relevant literature (e.g., Domínguez-Borràs & Vuilleumier, 2021) to support this view and have removed any claim suggesting we challenge the emotional function of the amygdala per se. Our aim is to contribute to a richer understanding of how affective and social processes co-construct structural variation in this region.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Private Comments:

      (1) Table 1 should be formatted for clarity i.e., bolded table headers, text realignment, and spacing. It was not clear at first glance how information was organized. It may also be helpful to place behavioral traits as the first column, seeing that these traits feed into the author's defined cognitive requirements.

      We have reformatted Table 1 to improve clarity and readability. Behavioral traits now appear in the first column, followed by cognitive dimensions and hypothesized neural correlates. Column headers have been bolded and alignment has been standardized.

      (2) Figures could include more detail to help with interpretations. For example, Figure 3 should define values included on the x-axis in the figure caption, and Figure 4 should explain the use of line, light color, and dark color. Figure 1 does not have a y-axis title.

      The figures have been revised and legends completed to ensure more clarity.

      (3) Please proofread for typos throughout.

      The manuscript has been carefully proofread, and all typographical and grammatical errors have been corrected. These changes are visible in the tracked version.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) Given all of the variability, would it not be a good idea to just compare (e.g. in the supplemental) the macaque data from just the Strasbourg centre for M. mulatta and M. tonkeana? I appreciate the ns will be lower, but other matters are more standardized.

      We fully understand the reviewer’s suggestion to restrict the comparison to data collected at a single site in order to minimize inter-site variability. However, as noted, such an analysis would come at the cost of statistical power, as the number of individuals per species within a single center is small. For example, while M. tonkeana is well represented at the Strasbourg centre, only one individual of M. mulatta is available from the same site. Thus, a restricted comparison would severely limit the interpretability of results, particularly for age-related trajectories. To address variability, we included acquisition site and brain preservation method as covariates or predictors where appropriate, and we have been cautious in our interpretations. We also now emphasize in the Methods and Discussion the value of future datasets with more standardized acquisition protocols across species and centers. We hope that by openly sharing our data and workflow, we can contribute to this broader goal.

      (2) I have various minor edits:

      (a) L 25 abstract - Specify what is meant by 'opposite trend'; the reader cannot infer what this is.

      Modified in lines 25-28: “Unexpectedly, tolerant species exhibited a decrease in relative amygdala volume across the lifespan, contrasting with the age-related increase observed in intolerant species, a developmental pattern previously undescribed in primates.”

      (b) L67 - The reference 'Manyprimates' needs fixing as it does in the references section.

      After double-checking, Manyprimates studies are international collaborative efforts that are supposed to be cited this way (https://manyprimates.github.io/#pubs).

      (c) L74 - Taking not Taken.

      This typo has been corrected.

      (d) L129 - It says 'total volume', but this is corrected total volume?

      We have clarified in the figure legends that the “total brain volume” used in our analyses excludes the cerebellum and the myelencephalon, as specified in our image preprocessing protocol. This ensures consistency across individuals and institutions.

      (e) L138 - Suddenly mentions 'frozen condition' without any prior explanation - this needs explaining in the legend - also L144.

      We have added an explanation of the ‘frozen condition’ variable in the relevant figure legend.

      (f) L166 - Results - it would be helpful to remind readers what Grade 1 signifies, ie intolerant species.

      We now include a brief reminder in the Results section that Grade 1 corresponds to socially intolerant species, to help readers unfamiliar with the classification (Lines 240-251).

      (g) Figure 4 - Provide the ns for each of the 4 grades to help appreciate the meaningfulness of the curves, etc.

      The number of subjects has been added to the figure, and a novel analysis in the revised manuscript helps to appreciate the meaningfulness of some of these curves.

      (h) L235 - 'we had assumed that species of high social tolerance grade would have presented a smaller amygdala in size compared to grade 1'. But surely this is the exact opposite of what is predicted in Table 1 - ie, the authors did not predict this as I read the paper (Unless Table 1 is misleading/ambiguous and needs clarification).

      As discussed in our responses to Reviewers #2 and #3, we have restructured both Table 1 and the Discussion to ensure consistency. We now explicitly state that the findings diverge from our initial inhibitory-control-based prediction and propose alternative interpretations based on socio-cognitive demands.

      (i) L270 - 'This observation' which?? Specify.

      We have replaced ‘this observation’ with a precise reference to the observed developmental decrease in amygdala volume in tolerant species.

      (j) L327 - 'groundbreaking' is just hype given that there are so many caveats - I personally do not like the word - novel is good enough.

      We have replaced the word ‘groundbreaking’ with ‘novel’ to adopt a more measured and appropriate tone in the discussion.

      (3) I might add that I am happy with the ethics regarding this study. 

      Thanks, we are also happy that we were able to study macaque brains from different species using opportunistic samplings along with already available data. We are collectively making progress on this!

      (4) Finally, I should commend the authors on all the additional information that they provide re gender/age/species. Given that there are 2x as many females as males, it would be good to know if this affects the findings. I am not a primatologist, so I don't know, for example, if the females in Grade 1 monkeys are just as intolerant as the males?

      We thank the reviewer for this thoughtful comment. We now explicitly mention the female-biased sex ratio in the Methods section and report in the Results (Figure 2, Figure 3) that sex was included as a covariate in our Bayesian models. While a small effect of sex was found for hippocampal volume, no effect was observed for the amygdala. Given the strong imbalance in our dataset (2:1 female-to-male ratio), we refrained from drawing any conclusion about sex-specific patterns, as these would require larger and more balanced samples. Although we did not test for sex-by-grade interactions, we agree that this question—especially regarding whether females and males express social style differences similarly across grades—represents an important direction for future comparative work.

      Reviewer #3 (Recommendations for the authors):

      I found the article well-written, and very easy to follow, so I have little ways to propose improvements to the article to the authors, besides addressing the various major points when it comes to interpretation of the data.

      One list I found myself wanting was in fact the list of the social tolerance grades, and the process by which they got selected into 3 main bags of socio-cognitive skills. Then it would become interesting to see how each of the 12 species compares within both the 18 grades (maybe once again out of the scope of this paper, there are likely reviews out there that already do that, but then the authors should explicitly mention so in the paper: X, 19XX have compared 15 out of 18 traits in YY number of macaque species); and within the 3 major subcognitive requirements delineated by the authors, maybe as an annex?

      We thank the reviewer for this thoughtful suggestion. In the revised manuscript, we now include a detailed table (Table 1) that lists the 18 behavioral traits derived from Thierry’s framework, along with their associated cognitive dimension and hypothesized neuroanatomical correlate. While we did not create a matrix mapping each of the 12 species across all 18 traits due to space and data availability constraints, we agree this is an important direction that should be tackled by primatologists. We now include a sentence (lines 87-90) in the manuscript to guide readers to previous comparative reviews (e.g., Thierry, 2000; Thierry et al., 2004, 2021) that document the expression of these traits across macaque species. We also clarify that our three cognitive categories are conceptual tools intended to structure neuroanatomical predictions, and not formal clusters derived from quantitative analyses.

      In the annex, it would also be good to have a general summarizing excel/R file for the raw data, with important information like age, sex, and the relevant calculated volumes for each individual. The folders available following the links do not make it an easy task for a reader to find the raw data in one place.

      We fully agree with the reviewer on the importance of data accessibility. We have now uploaded an additional supplementary file in .csv format on our OSF repository, which includes individual-level metadata for all 42 macaques: species, sex, age, social grade, total brain volume, amygdala volume, and hippocampus volume. The link to this file is now explicitly mentioned in the Data Availability section. We hope this will facilitate comparisons with other datasets and improve usability for the community. In addition, we provide in a supplementary table the raw data that were used for our Bayesian modelling (see below).
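
      As an illustration of how a file of this kind could be reused, the minimal Python sketch below reads individual-level rows and averages a regional volume ratio per social grade. The column names, the function name, and the three rows are hypothetical placeholders; they do not reproduce the layout of the OSF file or any value from the study.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Synthetic placeholder rows (NOT study data); column names are assumptions.
SAMPLE_CSV = """species,sex,age,social_grade,total_brain_volume,amygdala_volume,hippocampus_volume
species_a,F,5,1,80000,800,1200
species_a,M,12,1,82000,902,1230
species_b,F,7,4,90000,990,1350
"""


def mean_ratio_by_grade(csv_text, region_column):
    """Average (regional volume / total brain volume) within each social grade."""
    ratios = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratio = float(row[region_column]) / float(row["total_brain_volume"])
        ratios[int(row["social_grade"])].append(ratio)
    return {grade: mean(values) for grade, values in ratios.items()}


if __name__ == "__main__":
    print(mean_ratio_by_grade(SAMPLE_CSV, "amygdala_volume"))
```

      With a real file, the text passed to the function would come from reading the downloaded .csv, and the column names would need to be adapted to the actual headers.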

      The availability of the raw data would also clear up one issue, which I believe results from the modelling process: it looks odd on Figure 2, that volume ratios, defined as the given brain area volume divided by the total brain volume, give values above 1 (especially for the hippocampus). As such, the authors should either modify the legend or the figure. In general, it would be nicer to have the "real values" somewhere easily accessible, so that they can be compared more broadly with: 1) other macaques species to address questions relevant to the species; 2) other primates to address other questions that are surely going to arise from this very interesting work!

      We thank the reviewer for pointing this out. The ratio values in Figure 1 correspond to the proportion of the regional volume (amygdala or hippocampus) relative to the total brain volume, excluding the cerebellum and myelencephalon. As such, values above 0.01 (i.e., above 1% of the brain volume) are expected for these structures and do not indicate an error. We have updated the figure legend to clarify this point explicitly. In addition, we have now made a cleaned .csv file available via OSF, containing all raw volumetric data and metadata in a format that facilitates cross-species or cross-study comparisons. This replaces the previous folder-based structure, which may have been less accessible.
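
      The ratio described in this response can be sketched as follows. This is a minimal illustration, assuming that a segmentation provides separate whole-brain, cerebellum, and myelencephalon volumes; the function name, parameter names, and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
def regional_ratio(region_volume, whole_brain_volume,
                   cerebellum_volume, myelencephalon_volume):
    """Proportion of a regional volume relative to total brain volume.

    The denominator excludes the cerebellum and the myelencephalon,
    following the definition given in the response. All volumes must
    be expressed in the same unit.
    """
    total = whole_brain_volume - cerebellum_volume - myelencephalon_volume
    if total <= 0:
        raise ValueError("total brain volume must be positive after exclusions")
    return region_volume / total


if __name__ == "__main__":
    # Illustrative numbers only (not measurements from the dataset).
    ratio = regional_ratio(1000.0, 100000.0, 8000.0, 2000.0)
    # A proportion above 0.01 means the region exceeds 1% of the total.
    print(f"proportion = {ratio:.4f}, i.e. {100 * ratio:.2f}% of total brain volume")
```

      Expressing the result both as a proportion and as a percentage makes it explicit which scale a figure axis uses.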

      Typos:

      L233: delete 'in'

      L430: insert space in 'NMT template(Jung et al., 2021).'