  1. Jan 2026
    1. Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about this manuscript and offer a few minor comments below that may help to further strengthen the study.

      (1) Page 4

      PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Figure 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not in other fully-engaged PIC structures.

      (2) Page 8

      Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as the free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note that the potential function of the kinase module is likely attributable to the N-terminal region of Tfb3 through its binding to Rad3. Because the yeast strains used in Figure 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      (3) Page 11

      Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

    2. Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Weaknesses:

      Additional data that should be easily obtainable and analysis of existing data would enable an additional test of the models presented and extract additional mechanistic insights.

    3. Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module and of Ser5 phosphorylation on the CTD of Pol II is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled, and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      Weaknesses:

      (1) The work is limited in scope and does not provide any major insights into the mechanism of transcription. One indication of this limitation is that in the Discussion, published structural and functional results on transcription are used to support the interpretations of the results here more than current results inform previous models or findings.

      (2) The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3, is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript.

      (3) Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. This idea is supported by a single example from the literature (T. brucei). A more thorough evolutionary analysis could have tested this idea more rigorously.

    1. eLife Assessment

      This manuscript uses adaptive-bandit simulations to describe the dynamics of the Pseudomonas-derived cephalosporinase PDC-3 β-lactamase and its mutants to better understand antibiotic resistance. The finding that clinically observed mutations alter the flexibility of the Ω- and R2-loops, reshaping the cavity of the active site, is valuable to the field. The evidence is considered incomplete, however, with the need for analysis to demonstrate equilibrium weighting of adaptive trajectories and related measures of statistical significance.

    2. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC-3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-lagged independent component analysis (tICA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well suited for the problem under evaluation.

      Weaknesses:

      In the revised version, the authors addressed my concerns regarding their use of the MSM, and in my view, their conclusions are now much more robust and well-supported by the data. While it would be very interesting to see a quantitative correlation between the effects of the mutations observed in the MD data and relevant experimental findings, I understand that this may be beyond the scope of the manuscript.

    3. Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. Some greater consideration of the uncertainties and of how the method choice affects the ability to compare equilibrium properties would strengthen the quantitative conclusions. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Comments on revised version:

      I am concerned that the authors state in the response to reviews that it is not possible to get error bars on values due to the use of the AB-MD protocol that guides the simulations to unexplored basins. Yet the authors want to compare these values between the WT and mutants. This relates to RMSD, RMSF, % H-bond and volume calculations. I don't accept that you cannot calculate an uncertainty on a time averaged property calculated across the entire simulation. In these cases you can either run repeat simulations to get multiple values on which to do statistical analysis, or you can break the simulation into blocks and check both convergence and calculate uncertainties.
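      The block-based option mentioned above can be sketched as follows. This is a minimal illustration and not the authors' analysis: the helper names `block_average` and `correlated_series` are hypothetical, and the synthetic AR(1) series merely stands in for a time-correlated per-frame observable such as a pocket volume.

```python
import numpy as np

def block_average(series, n_blocks):
    """Return the overall mean and a standard error estimated from block means."""
    x = np.asarray(series, dtype=float)
    usable = (len(x) // n_blocks) * n_blocks      # drop leftover frames at the end
    block_means = x[:usable].reshape(n_blocks, -1).mean(axis=1)
    sem = block_means.std(ddof=1) / np.sqrt(n_blocks)
    return block_means.mean(), sem

def correlated_series(n_frames, phi, seed=0):
    """Synthetic AR(1) series (assumption: a stand-in for a correlated MD observable)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_frames)
    x[0] = rng.normal()
    for t in range(1, n_frames):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

series = correlated_series(20000, phi=0.95)

# Naive standard error: treats every frame as an independent sample.
naive_sem = series.std(ddof=1) / np.sqrt(len(series))

# Block standard error: blocks much longer than the correlation time
# behave approximately like independent samples.
mean, block_sem = block_average(series, n_blocks=10)
print(f"mean={mean:.3f}  naive SEM={naive_sem:.4f}  block SEM={block_sem:.4f}")
```

      Repeating the estimate with different block counts and checking that the block standard error reaches a plateau is a common convergence check. Note that this addresses only statistical uncertainty; it does not correct for the non-equilibrium weighting of adaptively seeded trajectories.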

      I note that the authors do provide error bars on the volumes, but the statistics given for these need closer scrutiny (I can't test this without the raw data). For example, the authors have p<0.0001 for the following pair of volumes, 1072 ± 158 and 1115 ± 242, and for SASA, p<0.0001 is given for two identical numbers, 155 ± 3.

      I also remain concerned about comparisons between simulations run with the AB-MD scheme. While each simulation is an equilibrium simulation run without biasing forces, new simulations are seeded to expand the conformational sampling of the system. This means that, by definition, the ensemble of simulations does not represent an equilibrium ensemble. For example, the frequency at which conformations are sampled would not be the same as in a single much longer equilibrium simulation. While you may be able to see trends in the differences between conditions run in this way, I still don't understand how you can compare quantitative information without some method of reweighting the ensemble. It is not clear that such a reweighting exists for this method, in which case I advise some more caution in the wording of the comparisons made from this data.

      At this stage I don't feel the revision has directly addressed the main comments I raised in the earlier review, although there is a stronger response to the comments of Reviewer #2.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC-3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-lagged independent component analysis (tICA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well suited for the problem under evaluation.

      Weaknesses:

      In the revised version, the authors addressed my concerns regarding their use of the MSM, and in my view, their conclusions are now much more robust and well-supported by the data. While it would be very interesting to see a quantitative correlation between the effects of the mutations observed in the MD data and relevant experimental findings, I understand that this may be beyond the scope of the manuscript.

      Thank you for the careful evaluation and constructive comments. Regarding the suggestion of a more quantitative correlation with experimental observables, we agree that this would be valuable, and we have noted it as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. Some greater consideration of the uncertainties and of how the method choice affects the ability to compare equilibrium properties would strengthen the quantitative conclusions. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Comments on revised version:

      I am concerned that the authors state in the response to reviews that it is not possible to get error bars on values due to the use of the AB-MD protocol that guides the simulations to unexplored basins. Yet the authors want to compare these values between the WT and mutants. This relates to RMSD, RMSF, % H-bond and volume calculations. I don't accept that you cannot calculate an uncertainty on a time averaged property calculated across the entire simulation. In these cases you can either run repeat simulations to get multiple values on which to do statistical analysis, or you can break the simulation into blocks and check both convergence and calculate uncertainties.

      We thank the reviewer for raising this point. We would like to clarify that we did not intend to state that error bars are impossible to obtain under AB-MD. In fact, we reported error bars for several quantities derived from the AB-MD trajectories (we also broke the trajectories into blocks and calculated uncertainties for RMSF in our first-round response as you suggested). However, these data are closely related to your concern about comparing quantitative information without an appropriate reweighting of the ensemble. Therefore, in the revised manuscript, we removed quantitative analyses that were calculated directly from the raw AB-MD trajectories. Instead, the quantitative comparisons are now obtained from MSM analysis. We report pocket volumes and key interaction metrics for MSM metastable states, with corresponding error bars for these MSM-based quantities (Figure 6 and its supplementary figure).

      I note that the authors do provide error bars on the volumes, but the statistics given for these need closer scrutiny (I can't test this without the raw data). For example, the authors have p<0.0001 for the following pair of volumes, 1072 ± 158 and 1115 ± 242, and for SASA, p<0.0001 is given for two identical numbers, 155 ± 3.

      Thank you for this comment. As noted above, we have removed the table from the manuscript, and the pocket-volume results together with their error bars are now shown in Figure 6. To address the concern raised here and to avoid making the same mistake in future analyses, we re-examined how the statistics were computed. We believe the very small p-values were caused by treating per-frame MD values as independent observations in two-sample t-tests. Because consecutive MD frames are strongly time-correlated, they do not satisfy the independence assumption, which can greatly overestimate the effective sample size and lead to artificially small p-values. For the SASA, a p < 0.0001 is reported even though both values are shown as 155 ± 3. This is due to rounding, which can hide subtle underlying differences.
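      The effect described above can be illustrated with the pocket volumes quoted by the reviewer. The sketch below is an assumption-laden illustration rather than a reanalysis: it treats "±" as a standard deviation, uses a normal approximation to Welch's t-test, takes 300,000 frames per system from the first-round response, and uses a hypothetical effective sample size of 100 (for example, one value per trajectory). The function name `welch_p_value` is likewise hypothetical.

```python
import math

def welch_p_value(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided p-value for a difference in means.

    Uses the Welch standard error with a normal approximation to the
    t distribution, which is adequate for a qualitative comparison.
    """
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    return math.erfc(abs(t) / math.sqrt(2.0))

# Values quoted in the review; reading "±" as a standard deviation is an assumption.
wt = (1072.0, 158.0)
mutant = (1115.0, 242.0)

n_frames = 300_000  # every frame counted as an independent observation
n_eff = 100         # hypothetical effective sample size (one value per trajectory)

p_naive = welch_p_value(*wt, n_frames, *mutant, n_frames)
p_eff = welch_p_value(*wt, n_eff, *mutant, n_eff)
print(f"p (all frames independent) = {p_naive:.3g}")
print(f"p (effective n = {n_eff})      = {p_eff:.3g}")
```

      Under these assumptions, the same means and spreads give a p-value far below 0.0001 when all frames are counted as independent, but not below 0.05 when the sample size is reduced to 100. The appropriate effective sample size depends on the autocorrelation time of the observable, which is not reported here.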

      I also remain concerned about comparisons between simulations run with the AB-MD scheme. While each simulation is an equilibrium simulation run without biasing forces, new simulations are seeded to expand the conformational sampling of the system. This means that, by definition, the ensemble of simulations does not represent an equilibrium ensemble. For example, the frequency at which conformations are sampled would not be the same as in a single much longer equilibrium simulation. While you may be able to see trends in the differences between conditions run in this way, I still don't understand how you can compare quantitative information without some method of reweighting the ensemble. It is not clear that such a reweighting exists for this method, in which case I advise some more caution in the wording of the comparisons made from this data.

      At this stage I don't feel the revision has directly addressed the main comments I raised in the earlier review, although there is a stronger response to the comments of Reviewer #2.

      We thank the reviewer for reiterating this important point, and we agree with the underlying concern. Although AB-MD generates unbiased trajectories, the ensemble of simulations does not represent an equilibrium ensemble. As a result, statistics computed by simply concatenating all AB-MD trajectories should not be used for quantitative comparisons. In the original version, we acknowledge that we reported several quantitative descriptors directly from concatenated AB-MD frames, including (i) distributions of χ1 torsions, (ii) mean pocket volumes and SASA, and (iii) percentages of some key interactions. We agree that this was not appropriate given the adaptive sampling protocol. In the revised manuscript, we have removed these quantitative analyses.

      We retained RMSD and RMSF analyses, but we have revised their wording and clarified their purpose. RMSD and RMSF are used only to summarize the structural variability and residue-level mobility observed across the collected trajectory segments and to motivate the selection of structural features for MSM construction. The manuscript now states: “Because AB-MD adaptively seeds new unbiased trajectories to expand conformational sampling, RMSD and RMSF are used here to summarize the structural variability and per-residue mobility observed across the collected trajectories.”

      Regarding the reviewer’s question about reweighting, the Markov state model (MSM) provides a principled framework to obtain the stationary distribution π from the transition probability matrix T<sub>τ</sub>. The resulting π<sub>i</sub> gives the equilibrium weight of each microstate i, and the corresponding discrete free energy can be written as F<sub>i</sub> = −k<sub>B</sub>T ln(π<sub>i</sub>). PCCA then coarse-grains the microstate space into a small number of metastable states. In the revised manuscript, quantitative comparisons are therefore derived from the MSM at the level of these metastable states, rather than from unweighted counts of concatenated AB-MD frames.
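      As a minimal sketch of the reweighting idea, the stationary distribution of a row-stochastic transition matrix can be obtained as its left eigenvector with eigenvalue 1 and converted to free energies. The 3-state matrix below is a toy example, not data from the manuscript, and the helper names are hypothetical; MSM packages provide equivalent, more robust estimators.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of a row-stochastic matrix for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)     # left eigenvectors of T
    idx = np.argmin(np.abs(eigvals - 1.0))    # eigenvalue closest to 1
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()                      # also fixes an arbitrary overall sign

def free_energies(pi, kT=1.0):
    """Discrete free energies F_i = -kT ln(pi_i), shifted so the minimum is zero."""
    F = -kT * np.log(pi)
    return F - F.min()

# Toy 3-state transition matrix at some lag time (rows sum to 1).
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

pi = stationary_distribution(T)
F = free_energies(pi)           # in units of kT
print("stationary weights:", pi)
print("free energies (kT):", F)
```

      Averages of an observable can then be weighted by the stationary weights of the states rather than by raw frame counts, which is the distinction drawn above between MSM-derived quantities and statistics from concatenated trajectories.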

      Accordingly, we have revised the sections “E219K and Y221A mutations facilitate proton transfer” and “Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams”, and we have added new figures in Figure 6 and its figure supplement. The adjustments to the quantitative analyses do not affect our original conclusions.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses adaptive sampling simulations to understand the impact of mutations on the specificity of the enzyme PDC-3 β-lactamase. The authors argue that mutations in the Ω-loop can expand the active site to accommodate larger substrates.

      Strengths:

      The authors simulate an array of variants and perform numerous analyses to support their conclusions. The use of constant pH simulations to connect structural differences with likely functional outcomes is a strength.

      Weaknesses:

      I would like to have seen more error bars on quantities reported (e.g., % populations reported in the text and Table 1).

      We appreciate this point. Here, the populations we analyze are intended to showcase conformational differences across variants rather than to estimate equilibrium occupancies. Although each system includes 100 trajectories, they were generated using an adaptive-bandit protocol. The protocol deliberately guides sampling towards underexplored basins; conformational heterogeneity between trajectories is therefore expected by design. For example, in E219K the MSM decomposition shows that in states 1, 6, and 7 the K67(NZ)–S64(OG) distance is almost entirely > 6 Å, whereas in states 2 and 3 it is almost entirely < 3.5 Å (Figure 5—figure supplement 12). These distances suggest that the hydrogen bond fraction is approximately zero in states 1, 6, and 7, and close to one in states 2 and 3. In addition, the mean first passage time of the Markov state models suggests that the formation and disruption of this hydrogen bond occur on the microsecond timescale, which is far longer than the length of each individual trajectory (300 ns). Consequently, across the 100 replicas, some trajectories exhibit very low fractions, while others display the opposite trend. Under such bimodal, protocol-induced heterogeneity, computing an error bar across trajectories mainly visualizes the protocol’s dispersion and risks being misread as thermodynamic uncertainty, which is not central to our aim of comparing conformational differences between wild-type PDC-3 and variants. We therefore do not include the error bars.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC-3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-lagged independent component analysis (tICA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem, and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well-suited for the problem under evaluation.

      Weaknesses:

      In the description of many of their results, the authors do not provide enough information for a deep understanding of the biochemistry/biophysics involved. Without these issues addressed, the strength of the evidence is of concern.

      We thank the reviewer for pointing out the need for deeper discussion of the biochemical and biophysical implications of our results. In our manuscript, we begin by examining basic structural metrics (e.g., RMSD and RMSF), which clearly indicate that the major conformational changes occur in the Ω-loop and the R2 loop. We have now added a paragraph to describe the importance of the Ω-loop and highlighted it in the revised manuscript on lines 142-166 of page 6. This observation guided our subsequent focus on these regions, as well as on the catalytic site. Our analysis revealed notable alterations in the hydrogen bonding network, especially in interactions involving the K67-S64, K67-N152, K67-G220, Y150-A292, and N287-N314 pairs. These observations led us to conclude that:

      (1) Mutations E219K and Y221A facilitate the proton transfer of catalytic residues. This is consistent with prior experimental data showing that these substitutions produce the most pronounced increase in sensitivity to cephalosporin antibiotics (lines 210-212 on page 8 of the revised manuscript).

      (2) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams. This is in line with MIC measurements reported by Barnes et al. (2018), which showed that mutants with larger active-site pockets exhibit markedly greater sensitivity to cephalosporins with bulky side chains than others (lines 249-259 on page 10).

      Furthermore, we applied Markov state models (MSMs) to explore the timescales of the transitions between these different conformational states. We believe that these methodological steps support our conclusions.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. However, the study doesn't clearly describe the way the data is generated. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Weaknesses:

      The methods used to gain the results are not explained clearly, meaning it was hard to determine exactly how some data was obtained. The convergence and uncertainties in the data were not adequately quantified. The text is also a little long, which obscures the main findings.

      We thank the reviewer for the suggestion. We respectfully ask the reviewer to specify which aspects of the data-generation methods are unclear so that we can include the necessary details in the next revision. Moreover, all statistics that are reported in the manuscript are obtained from extensive analyses of 300,000 simulation frames. The Markov state models have been validated by the ITS plots and Chapman-Kolmogorov (CK) test. The two-sample t-tests were also carried out for the volume and SASA.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1D focuses on the PDC-3 catalytic site. However, the authors mentioned before that the enzyme has two domains, an alpha domain and an alpha/beta domain. The reader would benefit from a more detailed description of the enzyme, its active site, AND the location of the mutants under investigation in the figure.

      We have updated Figure 1D and marked the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H), which have now been highlighted as spheres.

      (2) Since, in the journal format, the results come before the methods, it would be helpful to add a brief description of where the results came from. For example, in the first section of the results, the authors describe the flexibility of the omega loop and the R2 loop. However, the reader won't know what kind of simulation was used or for how long. A sentence would add the required context for a deeper understanding here.

      At the beginning of the Results and Discussion section we now state: “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.”

      (3) Still in the same section, the authors don't define what change in RMSF is considered significant. For example, I can't see a relevant change in the RMSF for the omega loop between the WT enzyme and the E219 mutants in Figure 2D. A more objective definition would be of benefit here.

      Our analysis reveals that while the wild-type PDC-3 and the G214A, G214R, E214G, and Y221A variants exhibit an average per-residue RMSF of around 4 Å in the Ω-loop, the V211A and V211G variants show markedly lower values (around 1.5 Å), and the E219K and Y221H variants exhibit intermediate values between 2 and 2.5 Å. In addition, the fluctuations around the binding site should be seen collectively along with the fluctuations in the R2-loop. Importantly, we urge the reviewer to focus on the MDLovofit analysis in Figure 2C, where the dynamic differences between the core and the fluctuating loops are clearly evident.

      (4) In line 138, the authors state that "Therefore, the flexibility of these proteins is mainly caused by the fluctuations in the Ω-loops and R2-loop". This is quite a bold statement to be drawn at this point. First of all, there is no mention of it in the manuscript, but is there any domain movement? Figure 2C clearly shows that there is some mobility in omega and R2 loops. But there is no evidence shown in the manuscript that shows that "the flexibility of these proteins is mainly caused by the fluctuations in the" loops. Please consider rephrasing this sentence or adding more data, if available.

      We have revised the wording to take the reviewer’s concern into account. The sentence now states: “Therefore, flexibility of PDC-3 is predominantly localized to the Ω- and R2-loops, whereas the remainder of the structure is comparatively rigid.” To explain further, β-lactamase enzymes are fairly rigid structures in which no large-scale domain motions occur. Instead, the enzyme communicates structurally via cross-correlation of loop dynamics (https://doi.org/10.7554/eLife.66567).

      (5) I guess, the most relevant question for the scope of the paper is not answered in this section. The authors show that the mobility of the omega- and R2-loops is altered by some mutations. Why is that? I wish I could see a figure showing where the mutations are and where the loops are. This question will come back in other sections.

      We have updated Figure 1D to mark the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H) as spheres. The Ω- and R2-loops are also highlighted. All mutations map to the Ω-loop, indicating that these substitutions directly perturb this region. Notably, K67 forms a hydrogen bond with the backbone of G220 within the Ω-loop and another with the phenolic hydroxyl of Y150. Y150, in turn, hydrogen-bonds with A292 in the R2-loop. Together, the residue interaction network (G220–K67–Y150–A292) suggests a pathway by which Ω-loop mutations propagate their effects to the R2-loop.

      (6) The authors then analyze the network of polar residues in the active site and the hydrogen bonds observed there. For the K67-N152 hydrogen bond, for example, there is a reduction in the occupancy from ~70% in the wild-type enzyme to ~30% and 40% in the mutants E219K and Y221, respectively. This finding is interesting. The question that remains is "why is that"? From the structural point of view, how does the replacement of E219 with a Lysine alter the hydrogen bond formation between K67 and N152? Is it due to direct competition? Solvent rearrangement? The reader is left without a clue in this section. Also, Figure 3B won't help the reader, since the mutated residues are not shown there. Please consider adding some information about why the authors believe that the mutations are disrupting the active site hydrogen bond network and showing it in Figure 3B.

      We appreciate the comment and have updated Figures 1D and 3B to highlight the mutation sites. The change from ~70% in the wild type to ~30–40% in the E219K and Y221T variants reported in Table 1 refers to the S64–K67 hydrogen bond. In the wild type, K67 forms an additional hydrogen bond with G220 on the Ω-loop, which helps anchor the K67 side chain in a geometry that favors the S64–K67 interaction. In the variants, the mutations reshape the Ω-loop and frequently disrupt the K67–G220 contact. The loss of this local anchor increases the conformational dispersion of K67, which is consistent with the observed reduction of the S64–K67 occupancy. Furthermore, our observation that the mutations are disrupting the active-site hydrogen-bond network is a data-driven conclusion rather than a subjective inference. Across ten systems, our AB-MD simulations provided 30 µs of sampling per system. Saving one frame every nanosecond yielded 30,000 conformations per system and 300,000 in total. All hydrogen-bond and salt-bridge statistics were computed over this full ensemble. Thus, the conclusion that the mutations disrupt the active-site hydrogen-bond network follows directly from these ensemble statistics. 
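      The ensemble statistics described in this response can be illustrated with a minimal sketch. It assumes a purely distance-based hydrogen-bond criterion with a 3.5 Å cutoff, which is a common convention but is not stated in the response; the function name `hbond_occupancy` and the toy coordinates are hypothetical. The frame counts reproduce the sampling arithmetic quoted above.

```python
import math

def hbond_occupancy(donor_xyz, acceptor_xyz, cutoff=3.5):
    """Fraction of frames in which the donor-acceptor distance is <= cutoff (in Å).

    donor_xyz and acceptor_xyz are equal-length sequences of (x, y, z) tuples,
    one entry per saved frame. The distance-only criterion is an assumption;
    many tools also apply an angle criterion.
    """
    if len(donor_xyz) != len(acceptor_xyz) or not donor_xyz:
        raise ValueError("need two non-empty coordinate series of equal length")
    formed = sum(
        1 for d, a in zip(donor_xyz, acceptor_xyz) if math.dist(d, a) <= cutoff
    )
    return formed / len(donor_xyz)

# Sampling bookkeeping quoted in the response:
# 100 trajectories x 300 ns, one frame per ns, ten systems.
frames_per_system = 100 * 300
frames_total = frames_per_system * 10

# Toy data (hypothetical): the bond is formed in 2 of 4 frames.
donor = [(0.0, 0.0, 0.0)] * 4
acceptor = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.9, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_occupancy = hbond_occupancy(donor, acceptor)
```

      In this reading, an occupancy of about 0.7 for the wild type versus 0.3 to 0.4 for a variant is simply the fraction of saved frames that satisfy the criterion.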

      (7) The pKa calculations and the pocket volume calculations show that the mutations expand the volume of the catalytic site and alter the microenvironment. Is there any change in the solvation associated with these changes? If the volume expands and the environment becomes more acidic, are there more water molecules in the mutants as compared to the wt enzyme? If so, can changes in solvation be associated with the changes in the hydrogen bond network? Would a simulation in the presence of a substrate be meaningful here? ( I guess it would!).

      Regarding solvation, we observe a modest increase in transient water occupancy associated with the increase in volume of the pocket. The conserved deacylation water molecule is the most important and is always present throughout the simulation. Additional waters enter and leave the pocket but do not form persistent interactions that measurably perturb the hydrogen-bond network of the Ω- and R2-loops. We agree that simulations with a bound substrate would be informative. However, our study focuses on how Ω-loop mutations modulate the active site of apo PDC-3 and its variants. Within this scope, we find: (i) Amino acid substitutions change the flexibility of Ω-loops and R2-loops; (ii) E219K and Y221A mutations facilitate the proton transfer; (iii) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams.

      (8) I have some concerns regarding the Markov State Modeling as shown here. After a time-independent component analysis, the authors show the projections on the components, which differ between the wild-type enzyme and the mutants, and draw some conclusions from these changes. For example, the authors state that "From the metastable state results, we observe that E219K adopts a highly stable conformation in which all the tridentate hydrogen-bonding interactions (K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O)) mentioned above are broken". This conclusion is very difficult to draw from Figure 5 alone. Unless the macrostates observed in the MSM can be shown (their structures) and could confirm the broken interactions, I really don't believe that the reader can come to the same conclusion as drawn by the authors here. I would recommend that the authors map the macrostates back to the coordinates and show them (what structure corresponds to what macrostate). After showing that, it makes sense to discuss what macrostate is being favored by what mutation. Drawing conclusions from tICA projections only is not recommended. I very strongly suggest that the authors revisit this entire section, adding more context so that the reader can draw conclusions from the data that is shown.

      We appreciate the reviewer’s concern. In the Markov state modeling section, our objective is to quantify the timescales (via mean first passage times) associated with the formation and disruption of the critical hydrogen bonds (K67(NZ)-S64(OG), K67(NZ)-N152(OD1), K67(NZ)-G220(O), Y150(N)-A292(O), N287(ND2)-N314(OD1)) mentioned above. Representative structures illustrating these interactions are shown in Figures 3B and 4A. We agree that the main Figure 5 alone does not convey structural information. Accordingly, we provide Figure 5—figure supplements 12–16. Together, Figure 5B and Figure 5—figure supplements 12–16 map structures to metastable states, whereas Figures 3B and 4A supply atomistic detail of the interactions. Author response image 1 presents selected subplots from Figure 5—figure supplements 12–14. Together with the free-energy landscape in Figure 5A, these data indicate that E219K adopts a highly stable conformation in which all three K67-centered hydrogen bonds (K67(NZ)–S64(OG), K67(NZ)–N152(OD1), and K67(NZ)–G220(O)) are broken.

      Author response image 1.

      TICA plot illustrates the distribution of E219K with the colour indicating the K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O) distance.

      (9) As a very minor issue, there are a few typos in the manuscript text. The authors might want to take some time to revisit their entire text. Examples in lines 70, 197, etc.

      Thank you for your comment. We have corrected these typos.

      Reviewer #3 (Recommendations for the authors):

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket.

      However, the study doesn't clearly describe the way the data is generated and potentially lacks statistical rigour, which makes it uncertain if the key results are significant. As such, it is difficult to judge if the conclusions made are supported by data.

      All necessary data-acquisition methods are described in the Methods section. The Markov state models have been validated by the implied timescales (ITS) plot and the Chapman-Kolmogorov (CK) test (Figure 5—figure supplements 2–11). Two-sample t-tests were also carried out for the pocket volume and SASA (Table 2).
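      For readers unfamiliar with the two-sample comparison mentioned above, the following is a minimal sketch of Welch's unequal-variance t statistic. Whether the authors used Welch's or Student's variant is not stated, so the choice here is an assumption, and the function name `welch_t` and the toy samples are hypothetical. The p-value lookup is omitted because it requires the t distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Uses sample variances (n - 1 denominator) and does not assume equal
    variances in the two groups.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    a = variance(sample_a) / n_a  # squared standard error of mean A
    b = variance(sample_b) / n_b  # squared standard error of mean B
    if a + b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))
    return t, df

# Toy numbers (hypothetical, not taken from Table 2).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

      The resulting statistic and degrees of freedom would then be compared against a t distribution to obtain the p-value.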

      The results section jumps straight to reporting RMSD and RMSF values; however, it is not clear what simulations are used to generate this information. Indeed, the main text does not mention the simulations themselves at all. The methods section mentions that 10 independent MD simulations were set up for each system, but no information is given as to how long these were run or the equilibration protocol used. Then it says that AB-MD simulations were run, but it is not clear what starting coordinates were used for this or how the 10 replicates were fed into these simulations. Most importantly, are the RMSD and RMSF calculations and later distance distribution information derived from the equilibrium MD runs or from the AB-MD simulations?

      Thank you for pointing this out. We have added “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.” to the Results and Discussion section. We did not run 10 independent MD simulations per system. We regret the typo in the Methods section that confused the reviewer. The sentence should have read: ‘All-atom MD simulations of wild-type PDC-3 and its variants were performed.’ Each system was equilibrated for 5 ns at a pressure of 1 atm using the Berendsen barostat. AB-MD simulations were initiated from these equilibrated structures. All analyses, apart from CpHMD, are based on the AB-MD trajectories.

      If these are taken from the equilibrium simulations, then it is critical that the reproducibility and statistical significance of the simulations is established. This can be done by calculating the RMSD and RMSF values independently for each replicate and determining the error bars. From this, the significance of differences between WT and mutant simulations can be determined. Without this, I have no data to judge if the main conclusions are supported or not. If these are derived from the AB-MD simulations, then I want to know how the independent simulations were combined and reweighted to generate overall RMSD, RMSF, and distance distributions. Unless I misunderstand the approach, the individual simulations no longer sample all regions of conformational space the same relative amount you would see in a standard MD simulation - specific conformational regions are intentionally run more to enhance sampling, then the overall conformational distributions cannot be obtained from these simulations without some form of reweighting scheme. But no such scheme is described. In addition, convergence of the data is required to ensure that the RMSD, RMSF, and distances have reached stable values. It is possible that I am misunderstanding the approach here. But in that case, I hope the authors can clarify the method and provide a means of ensuring that the data presented is converged. Many of the differences are clear by eye, but it is important to know they are not random differences between simulations and rather reflect differences between them.

      Thank you for raising this important point. In our AB-MD workflow, the adaptive bandit is used only for starting-structure selection (adaptive seeding). After each epoch, it chooses new starting snapshots from previously sampled conformations and launches the next runs. Each trajectory itself is standard, unbiased MD with no biasing potentials and no modification of the Hamiltonian. In other words, AB decides where we start, but does not alter the physics or sampling dynamics within an individual trajectory. In addition, our goal in this work is to compare variants under the same adaptive-bandit (AB) protocol, rather than to estimate equilibrium (Boltzmann) populations. Hence, we did not apply equilibrium reweighting to RMSD, RMSF, or distance distributions. However, the MSM section provides reweighted reference results based on the MSM stationary distribution.
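      The reweighting idea mentioned at the end of this response can be sketched as follows. This is an illustrative example only, not the authors' pipeline: it assumes a row-stochastic MSM transition matrix is already available, obtains its stationary distribution by power iteration, and uses it to weight per-state averages of an observable. The function names and the two-state toy matrix are hypothetical.

```python
def stationary_distribution(T, tol=1e-12, max_iter=100_000):
    """Stationary distribution pi of a row-stochastic matrix T (pi = pi T).

    Plain power iteration; assumes the chain is ergodic so that a unique
    stationary distribution exists and the iteration converges.
    """
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        norm = sum(new)
        new = [x / norm for x in new]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")

def reweighted_mean(state_means, pi):
    """Equilibrium average of an observable from its per-state means."""
    return sum(m * p for m, p in zip(state_means, pi))

# Toy two-state model (hypothetical numbers).
T = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_distribution(T)
# Suppose a distance averages 2.0 Å in state 0 and 5.0 Å in state 1.
eq_distance = reweighted_mean([2.0, 5.0], pi)
```

      A raw average over adaptively seeded trajectories weights states by how often they were visited, whereas the reweighted average weights them by their estimated equilibrium populations.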

      In the response to reviews, the authors state that the "RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar." But normally we would run multiple replicates and get an error bar from the different values in each. To dismiss the request for uncertainties and error bars seems to miss the point. I strongly agree with the prior reviewer that comparisons between RMSF or other values should be accompanied by uncertainties and estimates of statistical significance.

      Regarding the reviewers’ suggestion to present the data as a bar graph with error bars, we would like to note that RMSF is calculated as the time average of the fluctuations of each residue’s Cα atom over the entire simulation. As such, RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar. We believe that our current presentation clearly and accurately reflects the local flexibility differences among the variants. Nearly all published studies report RMSF in this way, as indicated by the following examples:

      Figure 3a in DOI: https://doi.org/10.1021/jacsau.2c00077

      Figure 2 in DOI: https://doi.org/10.1021/acs.jcim.4c00089

      Supplementary Fig. 1, 2, 5, 9, 12, 20, 22, 24, and 26 in DOI: https://doi.org/10.1038/s41467-022-29331-3

      However, in response to the reviewers’ strong request, we present RMSF plots with error bars in our response letter. 

      Author response image 2.

      The root-mean-square fluctuation (RMSF) profiles of wild-type PDC-3 and its variants. Blue lines show the mean RMSF across 100 independent MD trajectories for each system; red translucent bands denote the standard deviation across trajectories. The Ω-loop (residues G183 to S226) is highlighted in yellow, and the R2-loop (residues L280 to Q310) is highlighted in blue.
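      The per-trajectory statistics described in this caption can be sketched as below. The example assumes trajectories that are already aligned to a common reference and reduced to one atom (for example Cα) per residue, and it uses the sample standard deviation across trajectories; neither detail is specified in the caption, and the function names and toy coordinates are hypothetical.

```python
import math
from statistics import mean, stdev

def rmsf_per_residue(traj):
    """RMSF of each residue within one trajectory.

    traj[frame][residue] is an (x, y, z) tuple in Å. The fluctuation is taken
    about the residue's mean position over that trajectory.
    """
    n_frames = len(traj)
    n_res = len(traj[0])
    out = []
    for r in range(n_res):
        avg = [sum(frame[r][k] for frame in traj) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[r][k] - avg[k]) ** 2 for k in range(3)) for frame in traj
        ) / n_frames
        out.append(math.sqrt(msd))
    return out

def rmsf_mean_std(trajectories):
    """Mean and sample standard deviation of RMSF across trajectories."""
    per_traj = [rmsf_per_residue(t) for t in trajectories]
    n_res = len(per_traj[0])
    means = [mean([p[r] for p in per_traj]) for r in range(n_res)]
    stds = [stdev([p[r] for p in per_traj]) for r in range(n_res)]
    return means, stds

# Toy input (hypothetical): two trajectories, two frames, one residue.
traj_1 = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]  # RMSF = 1 Å
traj_2 = [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]  # RMSF = 2 Å
means, stds = rmsf_mean_std([traj_1, traj_2])
```

      Plotting `means` as a line with a band of plus or minus `stds` gives a figure of the kind described in the caption.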

      It was good to see that convergence of the constant-pH simulations was shown. While it can be challenging to get absolute pH values from the implicit solvent-based simulations, the differences between the systems are large and the trends appear significant. I was not clear how the starting coordinates were chosen for these simulations. Is the end point of the classical simulations, or is a representative snapshot chosen somehow?

      To ensure comparison, all systems used the X-ray crystal structure (PDB ID: 4HEF) with T79A substitution as the initial structure. The E219K and Y221A mutants were generated in silico using the ICM mutagenesis module. We have added the clarification in Methods section: “The starting structures were identical to those used for AB-MD.”

      Significant figures: Throughout the text and tables, the authors present data with more figures than are significant. 1071.81 ± 157.55 should be reported as 1100 ± 160 or 1070 ± 160. See the eLife guidelines for advice on this.

      Thank you for your suggestion. We have amended these now. 
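      The rounding convention requested by the reviewer can be sketched as follows. The rule used here, rounding the uncertainty to two significant figures and the value to the same decimal place, is one common convention and is an assumption rather than a quotation of the eLife guidelines; the function name is hypothetical.

```python
import math

def round_with_uncertainty(value, uncertainty, sig=2):
    """Round uncertainty to `sig` significant figures and value to match."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))  # 157.55 -> 2
    decimals = sig - 1 - exponent                   # negative means tens, hundreds, ...
    return round(value, decimals), round(uncertainty, decimals)

# The reviewer's example: 1071.81 ± 157.55
rounded_value, rounded_error = round_with_uncertainty(1071.81, 157.55)
```

      With these inputs the function returns 1070 ± 160, the second of the two forms the reviewer suggests.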

      The manuscript is very long for the results presented, and I feel that a clearer story would come across if the authors shortened the text so that the main conclusions and results were not lost.

      We appreciate the suggestion. We examined the twenty most recent research articles published in eLife and found that they are either longer than or comparable in length to our manuscript.

    1. eLife Assessment

      This study presented valuable findings regarding the basic molecular pathways leading to the cystogenesis of Autosomal Dominant Polycystic Kidney Disease, suggesting BICC1 functions as both a minor causative gene for PKD and a modifier of PKD severity. Solid data were supplied to demonstrate the functional and structural interactions between BICC-1, PC1 and PC2, respectively. The characterization of such interactions remains to be developed further, which renders the specific relevance of these findings for the etiology of relevant diseases unclear.

    2. Reviewer #1 (Public review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported.

      Comments on revision:

      My comments have been mostly addressed.

    3. Reviewer #2 (Public review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      Comments on revision:

      My comments have been addressed.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      Comments on revision:

      My comments have been addressed and sorted.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the complex of Bicc1/Pkd1/Pkd2 retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation (see Author response image 1). We do not yet have data to support this, however, and thus have not included this model in the manuscript. Nevertheless, we have updated the discussion of the manuscript to elaborate further on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      We have updated the discussion to include a discussion on the potential consequences on posttranscriptional regulation by Bicc1.

      Author response image 1.

      Model of BICC1, PC1 and PC2 self-regulation. In this model Bicc1 acts as a positive regulator of PKD gene expression. In the presence of ‘sufficient’ amounts of PC1/PC2 complex, it is tethered to the complex and remains biologically inactive (Fig. 1A). However, once the levels of the PC1/PC2 complex are reduced, Bicc1 is now present in the cytoplasm to promote expression of the PKD proteins, thereby raising their levels (Fig. 4B), which then in turn will ‘shutdown’ Bicc1 activity by again tethering it to the plasma membrane.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. Vishal Patel published that PKD1 is regulated by a mir-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require utilization of the mice described in above reference, which is beyond the scope of this manuscript. We, however, have revised the discussion to elaborate on this potential mechanism. 

      We have updated the discussion to include a statement on the potential direct regulation of Pkd1 mRNA by Bicc1.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the complete Bicc1 knockout that we previously reported (PMID: 20215348), focusing on compound heterozygotes. Yet, similar to the Pkd1/Pkd2 compound heterozygotes (PMID: 12140187), no cyst development was observed when we sacrificed the mice as late as P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice exhibit glomerular cysts as adults (PMID: 7723240). This is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when crossed, for example, to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are beyond the timeframe for this revision.

      No changes were made in the revised manuscript. 

      Reviewer #2 (Public review):

      (1) These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed. 

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and the PKD1/PKD2 and whether this interaction is functionally important. How this translates into the clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      (2) The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been. 

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. As presented below, most of the criticisms raised by the reviewer have been easily addressed in the revised version of the manuscript. Yet, none of the critiques seems to directly impact the overall interpretation of the data. 

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript requires further editing. For example, figure panels and legends are mismatched in Figure 1

      We have corrected the labeling of Figure 1. 

      (2) Y-axis units and values are inconsistent in Figures 4b-4g, Supplementary Figures S2e and S2f are not referenced in the text, genotypes are missing in Supplementary Figure S3f, and numerous typographical errors are present.

      In respect to the y-axis in Figure 4b-g, the scale is different for each of them, but that is intentional as one would lose the differences if they were all scaled identically. But we have now mentioned this in the figure legend to make the reader aware of it. In respect to the Supplemental Figure S2e,f, we included the panels in the description of the mutant BICC1 lines, but unfortunately forgot to reference them. This has now been done.

      We have updated the labeling of the Y-axis for the cystic indices adding “[%]” as the unit and updated the figure legend of Figure 4. We have included the genotypes in Supplementary Figure S3f. The Supplementary Figure S2e,f is now mentioned in the supplemental material (page 9, 2<sup>nd</sup> paragraph). 

      Reviewer #2 (Recommendations for the authors):

      (1) "Previous data from mouse, Xenopus, and zebrafish suggest a crucial role for the RNA-binding protein Bicc1 in the pathogenesis of PKD, although BICC1 mutations in human PKD have not been previously reported." The cited sources (and others that were not cited) link Bicc1 mutations to renal cysts, similar to a report by Kraus (PMID: 21922595) that the authors cite later. However, a more direct link to PKD was reported by Lian and colleagues using whole Pkd1 mice (PMID: 20219263) and by Gamberi and colleagues using Pkd1 kidneys and human microarrays (PMID: 28406902). Although relevant, neither is cited here, and only the former is cited later in the manuscript.

      Thanks for pointing this out. We have added these three citations.

      We have added these three citations (PMID: 21922595, PMID: 20219263 and PMID: 28406902) in the indicated sentence.

      (2) In Figure 1B, the lanes do not seem to correspond among panels, particularly evident in the panel with myc-mBicc1. Hence, it is difficult to agree with the presented conclusions.

      We have corrected the labeling of the lanes in Figure 1b.

      (3) In the Figure 1 legend: "(g) Western blot analysis following co-IP experiments, using an anti-mouse Bicc1 or anti-goat PC2 antibody as bait, identified protein interactions between endogenous PC2 and BICC1 in UCL93 cells. Non-immune goat and mouse IgG were included as a negative control." There is no mention of panel H, although this reviewer can imagine what the authors meant. The capitalization differs in the figure and legend. More troublingly, in panel G, a non-defined star indicates a strong band present in both immune and non-immune control.

      We have corrected the figure legend of Figure 1 and clarified the non-specific band in the figure legend.

      (4) In Figure 4, the authors do not show the matched control for the Bicc1 Pkd1 interaction in panel d, nor do they show a scale bar in either a) or d). Thus, the phenotypic severity cannot be properly assessed.

      Thanks for pointing out the missing scale bars, which have now been added. The two kidneys shown in Figure 4d are from littermates and illustrate the kidney size, in agreement with the cumulative data shown in Figure 4e. Unfortunately, this litter did not have a wildtype control. As the data analysis in Figure 4e is based on littermates, mixing and matching kidneys of different litters does not seem appropriate. Thus, we have omitted showing a wildtype control in this panel. However, the size of the wildtype kidney can be seen in Figure 4a.

      We have added the scale bar to both panels and have updated the figure legend to emphasize that the kidneys shown are from littermates and that no wildtype littermate was present in this litter.

      (5) "Surprisingly, an 8-fold stronger interaction was observed between full-length PC1 and myc-mBicc1-ΔKH compared to myc-mBicc1 or myc-mBicc1-ΔSAM." Assuming all the controls for protein folding and expression levels have been carried out and not shown/mentioned, this sentence seems to contradict the previous statement that Bicc1deltaSAM reduced the interaction with PC1 by 55%. Because the full length and SAM deletion have different interaction strengths, the latter sentence makes no sense.

      The reduction in PC1 binding of myc-mBicc1-ΔSAM compared to wildtype myc-mBicc1 was not significant. We have clarified this in the text.

      We have corrected the sentence and modified the Figure accordingly. 

      (6) Imprecise statements make a reader wonder how to interpret the data: "More than three independent experiments were analyzed." Stating the sample size or including it in the figure would save space and improve confidence in the data presented.

      We have stated the exact number of animals per condition above each of the bars.

      (7) "Next, we performed a similar mouse study for Pkd1 by reducing the gene dose of Pkd1 postnatally in the collecting ducts using a Pkhd1-Cre as previously described40" What did the authors mean?

      The reference was included to cite the mouse strain, but we realize that it could be misinterpreted as indicating that the exact experiment had been performed previously. We have clarified this in the text.

      We have reworded the sentence to avoid misinterpretation. 

      (8) The authors examined the additive effects of knocking down Bicc1, Pkd1, and Pkd2 with morpholinos in Xenopus and, genetically, in mice. While the Bicc1[+/-] Pkd1 or 2[+/-] double heterozygote mice did not show phenotypes, the authors report that the Bicc1[-/-] Pkd1 or 2 [+/-] did instead show enlarged kidneys. What is the phenotype of a Bicc1[+/-] Pkd1 or 2 [-/-]? What we learn from the author's findings among the PKD population suggests that the latter situation would be potentially translationally relevant.

      The mouse experiments were designed to address cooperativity between Bicc1 and either Pkd1 or Pkd2, and to test whether removing one copy of Pkd1 or Pkd2 would further worsen the Bicc1 cystic kidney phenotype. The parental crosses were therefore chosen to maximize the number of animals obtained for these genotypes. Unfortunately, these crosses did not yield the genotypes requested by the reviewer. To address the contribution of Bicc1 to the PKD population, we will need to perform a different cross, in which we eliminate Pkd1 or Pkd2 in a floxed Bicc1 background postnatally in adult mice. While we are preparing to perform such an experiment, its timeline is beyond the scope of this manuscript. In addition, please note that we already addressed the translation to the PKD population in the discussion of the original submission (page 13/14, last/first paragraph).

      No changes have been made to the revised version of the manuscript.

      (9) How do the authors interpret the milder effects of the Bicc1[-/-] Pkd1[+/-] compared to Bicc1[-/-] Pkd2[+/-] relative to the respective protein-protein interactions?

      The milder effects are due to the nature of the crosses. While the Pkd2 mutant is a germline mutation, the Pkd1 mutant is a conditional allele that eliminates Pkd1 only in the collecting ducts of the kidney. We therefore spare other nephron segments, such as the proximal tubules, which also contribute significantly to the cyst load. Thus, these mouse data support the interaction of Bicc1 with both Pkd1 and Pkd2, but do not allow us to directly compare the outcomes. While this was mentioned in the previous version of the manuscript, we have expanded on it in the revised version.

      We have expanded the results section in the revised version of the manuscript highlighting that the two different approaches cannot be directly compared.

      (10) How do the authors interpret that the strong Bicc1[Bpk] Pkd1 or Pkd2 double heterozygote mice did not have defects and "kidneys from Bicc1+/-:Pkd2+/- did not exhibit cysts (data not shown)", when the VEO PKD patients and - although not a genetic reduction - also the morpholino-treated Xenopus did?

      VEO PKD patients are characterized by a loss of function of PKD1 or PKD2, and we propose in this manuscript that BICC1 further aggravates the phenotype. However, neither the mouse nor the Xenopus experiments address whether BICC1 is a genetic modifier. We are simply addressing whether the two genes show a genetic interaction. In the mouse studies, we eliminate one copy of Pkd1 or Pkd2 in the background of a hypomorphic allele of Bicc1. Similarly, in the Xenopus experiments, we employ suboptimal doses of the morpholino oligomers, i.e., concentrations that did not yield a phenotypic change, and then ask whether removing both together shows cooperativity. It is important to state that this is based on a biological readout and is not defined by the amount of protein. While we described this already in the original manuscript (page 7, first paragraph), we have amended our description of the Xenopus experiment to make this even clearer.

      Finally, we agree with the reviewer that to address whether Bicc1 is a modifier of the PKD phenotype in mouse, we would need to reduce Bicc1 function in Pkd1 or Pkd2 mutants. We recognized this already in the discussion of the initial version of the manuscript (page 14, first paragraph).

      We have expanded the results section when discussing the suboptimal amounts of the morpholino oligos (page 6, 1st paragraph).

      (11) Unclear: "While variants in BICC1 are very rare, we could identify two patients with BICC1 variants harboring an additional PKD2 or PKD1 variant in trans, respectively." Shortly after, the authors state in apparent contradiction that "the patients had no other variants in any of other PKD genes or genes which phenocopy PKD including PKD1, PKD2, PKHD1, HNF1s, GANAB, IFT140, DZIP1L, CYS1, DNAJB11, ALG5, ALG8, ALG9, LRP5, NEK8, OFD1, or PMM2."

      The reviewer is correct. This should have been phrased differently. We have now added “Besides the variants reported below” to clarify this more adequately.

      The sentence was changed to start with “Besides the variants reported below, […].”

      (12) "The demonstrated interaction of BICC1, PC1, and PC2 now provides a molecular mechanism that can explain some of the phenotypic variability in these families." How do the authors reconcile this statement with their reported ultra-rare occurrence of the BICC1 mutations?

      As mentioned in the manuscript and also in response to the other two reviewers, Bicc1 has been shown to regulate Pkd2 gene expression in mice and frogs via an interaction with the miR-17 family of microRNAs. Moreover, the miR-17 family has been demonstrated to be critical in PKD (PMID: 30760828, PMID: 35965273, PMID: 31515477). In fact, both other reviewers have pointed out that we should stress this more, since Bicc1 is part of this regulatory pathway. Future experiments are needed to address whether Bicc1 contributes to the variability in ADPKD onset/severity. Yet, this is beyond the scope of this study.

      Based on the comments of the two other reviewers we have further addressed the Bicc1/miR-17 interaction.

      (13) The manuscript should use correct genetic conventions of italicization and capitalization. This is an issue affecting the entire manuscript. Some exemplary instances are listed below.

      (a) "We also demonstrate that Pkd1 and Pkd2 modifies the cystic phenotype in Bicc1 mice in a dose-dependent manner and that Bicc1 functionally interacts with Pkd1, Pkd2 and Pkhd1 in the pronephros of Xenopus embryos." Genes? Proteins?

      The data presented in this section show that a hypomorphic allele of Bicc1 in mouse and a knockdown in Xenopus yield this result. As both affect the proteins, the spelling should reflect the proteins.

      No changes have been made in the revised manuscript.

      (b) The sentence seems to use both the human and mouse genetic capitalization, although it refers to experiments in the mouse system “to define the Bicc1 interacting domains for PC2 (Fig. 2d,e). Full-length PC2 (PC2-HA) interacted with full-length myc-mBICC1.”

      We agree with the reviewer that stating the species of the molecules used is critical. We have adopted a spelling convention in which BICC1 is the human homologue, mBicc1 the mouse homologue, and xBicc1 the Xenopus one.

      We have highlighted the species spelling in the methods section and labeled the species accordingly throughout the manuscript and figures. 

      (14) “Together these data supported our biochemical interaction data and demonstrated that BICC1 cooperated with PKD1 and PKD2.” Are the authors implying that these results in mice will translate to the human protein?

      We agree that we have not formally shown that the same applies to the human proteins. Thus, we have changed the spelling accordingly.

      We have revised the capitalization of the proteins. 

      (15) The text is often unclear, terse, or inconsistent.

      (a) “These results suggested that the interaction between PC1 and Bicc1 involves the SAM but not the KH/KHL domains (or the first 132 amino acids of Bicc1). It also suggests that the N-terminus could have an inhibitory effect on PC1-BICC1 association.” How do the authors define the N-terminus? The first 132 aa? KH/KHL domains?

      This was illustrated in the original Figure 2A. The ΔKH constructs lack the first 351 amino acids.

      To make this more evident, we have specified this in the text as well.

      (b) Similarly, the authors state below, "Unlike PC1, PC2 interacted with mycmBICC1ΔSAM, but not myc-mBICC1-ΔKH suggesting that PC2 binding is dependent on the N-terminal domains but not the SAM domain." It is unclear if the authors refer to the KH/KHL domains or others. Whatever the reference to the N-terminal region, it should also be consistent with the section above.

      This is now specified in the text.

      (c) Unclear: "We have previously demonstrated that Pkd2 levels are reduced in a complete Bicc1 null mice,22 performing qRT-PCR of P4 kidneys (i.e. before the onset of a strong cystic phenotype), revealed that Bicc1, Pkd1 and Pkd2 were statistically significantly down-regulated (Fig. 4h-j)".

      We have changed the text to clarify this. 

      (d) “Utilizing recombinant GST domains of PC1 and PC2, we demonstrated that BICC1 binds to both proteins in GST-pulldown assays (Fig. 1a, b)." GST-tagged domains? Fusions?

      We have changed the text to clarify this. 

      (e) "To study the interaction between BICC1, PKD1 and PKD2 we combined biochemical approaches, knockout studies in mice and Xenopus, genetic engineered human kidney cells" > genetically engineered.

      We have changed the text to clarify this.

      (f) Capitalization (e.g., see Figure S3, ref. the Bpk allele) and annotation (e.g., Gly821Glu and G821E) are inconsistent.

      We have homogenized the labeling of the capitalization and annotations throughout the manuscript. 

      (g) What do the authors mean by "homozygous evolutionarily well-conserved missense variant"?

      We have changed this in the revised version of the manuscript.

      Reviewer #3 (Public review/Recommendations to the authors):

      (1) A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      (2) This study should ideally include experiments in HUREC material obtained from patients/families with BICC1 mutations and studying its effects on the PKD1/2 complex in primary cell lines.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro variant passed away.

      No changes to the revised manuscript have been made to address this point.

      (3) Please remove repeated words in the following sentence in paragraph 2 of the introduction: "BICC1 encodes an evolutionarily conserved protein that is characterized by 3 K-homology (KH) and 2 KH-like (KHL) RNA-binding domains at the N-terminus and a SAM domain at the C-terminus, which are separated by a by a disordered intervening sequence (IVS).23-28".

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying some data-driven analyses, like energy landscape analysis and Hidden Markov Model, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide sufficient biological rationales to perform a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      We appreciate your insightful comments regarding the need for a biological rationale in our study. As you mentioned, there are similar studies: Meer et al. used Hidden Markov Models to identify various activation modes of brain networks that included subcortical regions[1], and Song et al. linked brain states to narrative understanding and attentional dynamics[2, 3]. These studies motivate our use of naturalistic stimulus datasets. Moreover, there is evidence that the thalamus plays a crucial role in processing information in naturalistic contexts, which points to the importance of thalamocortical communication[4, 5]. We therefore aimed to bridge thalamic activity and cortical state transitions using the energy landscape description.

      To address these gaps in conventional resting-state studies, we explored an alternative method—maximum entropy modeling based on the energy landscape. This allowed us to validate how the thalamus responds to cortical state transitions. To enhance clarity, we will update our introduction to emphasize the motivations behind our research and the significance of examining these neural mechanisms in a naturalistic setting.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      We appreciate your concern regarding the reproducibility of our findings. The dataset from the "Sherlock" study is of high quality and has shown good generalizability in various research contexts. We acknowledge the importance of validating our results with different datasets to enhance the robustness of our conclusions. While we are open to exploring additional datasets, we intend to pursue this validation once we identify a suitable alternative. Currently, we are considering a comparison with the dataset from "Forrest Gump" as part of our initial plan.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study heavily relies on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3x3x3 mm^3), it appears to be critically essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      We acknowledge the importance of accurately localizing the different thalamic architectures, specifically the matrix and core regions. To address this, we downsampled the atlas of matrix and core cell populations from the previous study from a resolution of 2x2x2 mm^3 to 3x3x3 mm^3, which matches our fMRI data acquisition. We will report the atlas as Supplementary Figures in our revision.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors perform similar comparisons not only between the matrix and core architectures but also between different nuclei. For example, anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      We appreciate your suggestion regarding a more detailed analysis of thalamic circuits. We have touched upon this in the discussion section as a forward-looking consideration. However, we believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. That said, we are interested in exploring these nuclei-pathway connections to cortical areas in future studies with a proper 7T fMRI naturalistic dataset.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      Thank you for your valuable feedback regarding the choice of time window lengths. We aimed to maintain consistency in window lengths across our analyses. In light of your comments and suggestions from other reviewers, we plan to test our results using different time window lengths and report findings that generalize across these variations. Should the results differ significantly, we will discuss the implications of this variability in our revised manuscript.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This manner seems to implicitly assume that no significant state changes happen in one TR (=1.5sec), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.

      Thank you for raising this important point regarding temporal resolution. Many fMRI studies, such as those examining event boundaries during movie watching, operate under similar assumptions concerning state changes within one TR. For example, Barnett et al. processed the dynamic functional connectivity (dFC) with a window of 20 TRs (24.4s). We therefore regard this not as a limitation specific to our study but as a general issue tied to fMRI scanning parameters. To strengthen our analysis of state transitions and ensure they are not merely coincidental, we plan to conduct random-walk simulations, as suggested, to validate our findings in accordance with methodologies used in previous research.
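
      As a rough illustration of what such a random-walk simulation could look like, the sketch below runs a Metropolis-type single-flip walk on a toy pairwise energy function. The three-network size, the parameter values `h` and `J`, and the function names are assumptions made for this example only, and the acceptance rule is one common choice rather than necessarily the procedure we will adopt.

```python
import numpy as np

def energy(state, h, J):
    """Pairwise energy: E(s) = -sum_i h_i s_i - 1/2 sum_{i != j} J_ij s_i s_j."""
    return -h @ state - 0.5 * state @ J @ state

def random_walk(h, J, n_steps, rng):
    """Metropolis-type walk: propose one flip per step, accept with min(1, exp(-dE))."""
    n = len(h)
    state = rng.choice([-1, 1], size=n)
    trajectory = np.empty((n_steps, n), dtype=int)
    for t in range(n_steps):
        i = rng.integers(n)              # pick one network uniformly at random
        proposal = state.copy()
        proposal[i] *= -1                # flip its binarized activity
        delta = energy(proposal, h, J) - energy(state, h, J)
        if rng.random() < np.exp(-delta):
            state = proposal
        trajectory[t] = state
    return trajectory

# Toy parameters (hypothetical, not fitted to any data).
n_networks = 3
h = np.full(n_networks, 0.5)
J = 0.5 * (np.ones((n_networks, n_networks)) - np.eye(n_networks))

rng = np.random.default_rng(0)
trajectory = random_walk(h, J, n_steps=20000, rng=rng)

# Fraction of steps spent in the lowest-energy pattern (all networks +1).
frac_all_up = np.mean(np.all(trajectory == 1, axis=1))
print(f"fraction of steps in the all-(+1) state: {frac_all_up:.3f}")
```

      Because each step changes at most one network, the simulated trajectory can be used to count transitions between neighbouring states independently of the sampling interval of the empirical data.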

      Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      We thank the reviewer for this comment and encouragement.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      Thank you for your valuable suggestions, and we apologize for any misunderstandings regarding the interpretation of the energy landscape in our study. To address this issue, we will include a dedicated paragraph in both the methods and results sections to clarify our use of the term "energy" derived from the maximum entropy model. This addition aims to eliminate any ambiguity and provide a clearer understanding of what our analysis reveals.

      (1) I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      Thank you for highlighting the potential issue with our binarization method. We appreciate your insights regarding the comparison of network-wise ROI signals with the cross-network BOLD signal, as this may inadvertently remove the global signal. To address this, we will conduct a comparative analysis of results obtained from both our current approach and the original pipeline. If we decide to retain our current method, we will carefully reconsider the rationale and rephrase our descriptions to ensure clarity regarding the preservation of the global signal and the diversity of binarized cortical states.

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      Thank you for your thoughtful examination of our data processing pipeline. We agree that a comparison between the conventional binarization method and our current approach is warranted, and we appreciate your suggestion. Upon reviewing Figure S1, we discovered that there was indeed an error related to the plotting style set to "log10." As you correctly pointed out, the data should reflect that the probabilities for states where all networks are either activated or deactivated are zero. We are very interested in exploring the state distributions obtained from both the original and current approaches, as your comments highlight important considerations. We sincerely appreciate your insightful feedback and will make sure to address these points thoroughly in our first revision.
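
      The constraint raised in point (b) can be illustrated on synthetic data. The sketch below compares thresholding each time point against the cross-network mean with thresholding each network against its own temporal mean. The seven-network size, the simulated signals with a shared global component, and the function names are assumptions for illustration and do not reproduce our actual pipeline.

```python
import numpy as np

def binarize_cross_network(x):
    """Threshold each time point at the mean across networks (x: time x networks)."""
    threshold = x.mean(axis=1, keepdims=True)
    return np.where(x > threshold, 1, -1)

def binarize_within_network(x):
    """Threshold each network at its own mean across time."""
    threshold = x.mean(axis=0, keepdims=True)
    return np.where(x > threshold, 1, -1)

def count_uniform_states(states):
    """Return (number of all-(+1) time points, number of all-(-1) time points)."""
    all_up = int(np.sum(np.all(states == 1, axis=1)))
    all_down = int(np.sum(np.all(states == -1, axis=1)))
    return all_up, all_down

# Synthetic signals: a shared global component plus network-specific noise.
rng = np.random.default_rng(0)
n_time, n_networks = 2000, 7
global_signal = 2.0 * rng.standard_normal((n_time, 1))
signals = global_signal + rng.standard_normal((n_time, n_networks))

cross = binarize_cross_network(signals)
within = binarize_within_network(signals)

print("cross-network threshold (all +1, all -1):", count_uniform_states(cross))
print("within-network threshold (all +1, all -1):", count_uniform_states(within))
```

      With the cross-network threshold, at least one network lies above and one below the mean at every time point, so the uniform states cannot occur, whereas the within-network threshold allows them whenever the shared component is strongly positive or negative.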

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.

      Thank you for your important observation regarding the potential inflation of non-neuronal noise in our current binarization procedure. We recognize that this process could lead to qualitatively different signal magnitudes being treated similarly after binarization, as you illustrated with your example. While we acknowledge your point, we believe that conventional binarization pipelines may also encounter this issue, albeit by comparing signals to a network's temporal mean activity. To address this concern and maintain consistency with previous studies, we will discuss this limitation in our revised manuscript. Additionally, if deemed necessary, we will explore implementing a percentile-based threshold above the baseline to further refine our binarization approach. Your suggestion provides a valuable perspective, and we appreciate your insights.

      (2) As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).

      Thank you for your valuable feedback regarding Figure 2A. We apologize for any confusion it may have created. While we recognize that similar figures are commonly used in literature involving energy landscapes (maximum entropy model), we agree that Figure 2A may mislead readers into thinking that cortical state dynamics are directly governed by the energy landscape derived from the maximum entropy model, which has not been validated. In light of your comments, we will remove Figure 2A and instead emphasize the analytical strategy presented in Figure 2B. Additionally, we will provide a simplified line graph as an illustrative example to clarify the concepts without the potential for misinterpretation.
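
      To make the statistical meaning of "energy" concrete, the following sketch enumerates all activity patterns of a small pairwise maximum entropy model and checks that the energy of a pattern equals its negative log-probability up to a constant. The model size and the randomly drawn parameters `h` and `J` are hypothetical and are not fitted to data.

```python
import itertools
import numpy as np

def all_states(n):
    """Enumerate all 2**n activity patterns with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))

def energy(states, h, J):
    """E(s) = -sum_i h_i s_i - 1/2 sum_{i != j} J_ij s_i s_j, one value per row of states."""
    field_term = states @ h
    pair_term = 0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    return -field_term - pair_term

def boltzmann(E):
    """P(s) = exp(-E(s)) / Z, where Z sums exp(-E) over all patterns."""
    weights = np.exp(-E)
    return weights / weights.sum()

# Hypothetical parameters: symmetric couplings with a zero diagonal.
rng = np.random.default_rng(1)
n = 5
h = rng.normal(0.0, 0.3, size=n)
upper = np.triu(rng.normal(0.0, 0.3, size=(n, n)), k=1)
J = upper + upper.T

states = all_states(n)
E = energy(states, h, J)
P = boltzmann(E)

# E(s) + log P(s) equals -log Z for every pattern s.
offset = E + np.log(P)
print("spread of E + log P across patterns:", np.ptp(offset))
```

      In this sense a low-energy pattern is simply a frequently observed pattern, and an energy barrier is a region of patterns that are rarely observed.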

      Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching by applying an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data appears to be novel as well. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories of thalamic control functions, which is a notable strength.

      Thanks for your comments on the novelty of our study.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Thank you for your insightful and constructive comments regarding the conceptual clarity of our energy landscape framework. We appreciate your perspective on the challenges of mapping the statistical measure of "energy" derived from the Boltzmann distribution onto biological and cognitive operations. To address these concerns, we will revise our manuscript to clarify our expressions surrounding "energy" and emphasize its probabilistic nature. Additionally, we will incorporate a series of analyses that explicitly relate the features of the energy landscape to cognitive processes and key parameters, such as brain integration and functional connectivity. We believe these changes will help bridge the gap between our mathematical framework and its relevance to understanding brain systems and cognitive functions.

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" in multidimensional measures lacks grounding. Valleys and hills of what is not intuitive to understand. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but similar to the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      Thank you for your valuable feedback. In our revisions, we will aim to link the concept of rapid transition routes in the energy landscape to cognitive processes, such as narrative understanding and related features. By exploring these connections, we hope to provide a clearer context for how our framework can enhance understanding of cognitive functions and their neural correlates.

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Thank you for highlighting this important point regarding the conceptual clarity in our Introduction. We appreciate your feedback about the motivation and objectives of the study. To clarify the stated goal of investigating how transitions between distinct cortical brain states modulate shared neural processing under naturalistic conditions, we will revise the manuscript to explicitly define the specific claims we aim to address. We will ensure that these explanations are closely tied to the methods employed in our study, providing a clearer framework for our readers.

      Several methodological choices can use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transition? Is it faster than 21 TRs?

      Thank you for your insightful questions regarding our methodological choices. Our focus on specific state transitions necessitated the use of a 21-TR window. While it’s true that other transitions may occur within this window, averaging across the same transitions at different times allows us to identify distinctive thalamic BOLD patterns that precede cortical state transitions. This methodology enables us to capture relevant dynamics while ensuring that we focus on the transitions of interest. We appreciate your feedback, and this clarification will be included in our revised manuscript. We will also add a figure that describes the dwell time of cortical states.

      The choice of movie-watching data is a strength. But, many of the analyses performed here, energy landscape estimation, clustering of states, could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Thank you for your question, which closely aligns with a concern raised by Reviewer #1. Our core hypothesis posits that naturalistic stimuli yield a broader set of brain states compared to those observed during resting-state conditions. To support this assertion, we will clearly articulate the findings from previous studies that relate to this hypothesis. Additionally, if appropriate, we will provide a comparative analysis between our data and resting-state data to highlight the differences and emphasize the uniqueness of the brain states elicited by naturalistic stimuli.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While statistically robust, the manuscript provides little beyond why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Thank you for your questions. In our revisions, we will perform additional analyses aimed at linking state transitions to cognitive processes more explicitly. Regarding clustering, we will provide a thorough discussion in the revised manuscript.

      Finally, the treatment of the thalamus, while very exciting, could use a bit more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating big functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

      This suggestion aligns with the feedback from Reviewer #1. We believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. Therefore, investigating core and matrix cell projections across different thalamic nuclei using 7T fMRI presents a promising avenue for further study.

      (1) Van Der Meer J N, Breakspear M, Chang L J, et al. Movie viewing elicits rich and reliable brain state dynamics [J]. Nature Communications, 2020, 11(1): 5004.

      (2) Song H, Park B Y, Park H, et al. Cognitive and Neural State Dynamics of Narrative Comprehension [J]. Journal of Neuroscience, 2021, 41(43): 8972-8990.

      (3) Song H, Shim W M, Rosenberg M D. Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics [J]. Elife, 2023, 12.

      (4) Shine J M, Lewis L D, Garrett D D, et al. The impact of the human thalamus on brain-wide information processing [J]. Nature Reviews Neuroscience, 2023, 24(7): 416-430.

      (5) Yang M Y, Keller D, Dobolyi A, et al. The lateral thalamus: a bridge between multisensory processing and naturalistic behaviors [J]. Trends in Neurosciences, 2025, 48(1): 33-46.

    2. Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching, applying an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data appears to be novel as well. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories of thalamic control functions, which is a notable strength.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" in multidimensional measures lacks grounding. Valleys and hills of what is not intuitive to understand. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but similar to the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Several methodological choices can use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transition? Is it faster than 21 TRs?

      The choice of movie-watching data is a strength. But, many of the analyses performed here, energy landscape estimation, clustering of states, could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While statistically robust, the manuscript provides little beyond why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Finally, the treatment of the thalamus, while very exciting, could use a bit more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating big functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      (1) Major Comment 1:

      I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.
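      The constraint described in points (a) and (b) can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' pipeline: the six networks, the shared "global" component, and the function names are all illustrative assumptions. It compares thresholding each time point against the cross-network mean with thresholding each network against its own temporal mean.

```python
import numpy as np


def binarize_cross_network(x):
    """Threshold each time point against the mean across networks (rows = time)."""
    ref = x.mean(axis=1, keepdims=True)
    return np.where(x > ref, 1, -1)


def binarize_within_network(x):
    """Threshold each network against its own mean over time (columns = networks)."""
    ref = x.mean(axis=0, keepdims=True)
    return np.where(x > ref, 1, -1)


def visited_states(b):
    """Return the set of distinct binary patterns that occur at least once."""
    return {tuple(int(v) for v in row) for row in b}


# Synthetic data (assumption): a shared fluctuation plus network-specific noise.
rng = np.random.default_rng(0)
n_timepoints, n_networks = 5000, 6
shared = rng.standard_normal((n_timepoints, 1))
x = shared + 0.5 * rng.standard_normal((n_timepoints, n_networks))

cross_states = visited_states(binarize_cross_network(x))
within_states = visited_states(binarize_within_network(x))

all_up = (1,) * n_networks
all_down = (-1,) * n_networks

print("cross-network: all +1 visited?", all_up in cross_states)
print("cross-network: all -1 visited?", all_down in cross_states)
print("within-network: all +1 visited?", all_up in within_states)
print("within-network: all -1 visited?", all_down in within_states)
```

      Because the values at a time point cannot all exceed their own mean, the all +1 pattern is excluded by construction under the cross-network threshold, whatever the underlying signal looks like; the within-network threshold carries no such restriction.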

      (2) Major Comment 2:

      As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).
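      For readers unfamiliar with the formalism, the relation between "energy" and probability can be stated compactly. The expressions below are the standard pairwise maximum entropy form commonly used in energy landscape analysis; it is an assumption that the manuscript's notation matches, and the symbols $h_i$ (network-specific bias) and $J_{ij}$ (pairwise coupling) are introduced here only for illustration.

```latex
P(\sigma) = \frac{e^{-E(\sigma)}}{Z}, \qquad
E(\sigma) = -\sum_{i} h_i \sigma_i - \frac{1}{2} \sum_{i \neq j} J_{ij}\, \sigma_i \sigma_j, \qquad
Z = \sum_{\sigma'} e^{-E(\sigma')}

% Rearranging the first expression:
E(\sigma) = -\ln P(\sigma) - \ln Z
```

      Under this form, the energy of a binarized state $\sigma \in \{-1, +1\}^N$ is its negative log probability up to an additive constant, so a "low-energy" state is simply a state that the fitted model assigns high probability.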

    4. Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying some data-driven analyses, like energy landscape analysis and Hidden Markov Model, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide sufficient biological rationales to perform a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study heavily relies on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3 × 3 × 3 mm³), it appears to be essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors perform similar comparisons not only between the matrix and core architectures but also between different nuclei. For example, anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This approach seems to implicitly assume that no significant state changes happen within one TR (= 1.5 s), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.

    5. eLife Assessment

      This study investigated the dynamics of human cortical network activity with functional magnetic resonance imaging during movie watching and studied the modulation of these dynamics by subcortical areas using an energy landscape mapping method. The authors identified a set of brain states defined at the level of canonical functional networks, quantified how the brain transitions between these states, and related transition probabilities to inter-subject correlations in evoked brain activity. A major emphasis of the work concerns the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions. The analytical strategy developed in this study is applicable to other task- and resting-state fMRI data and would be useful for many researchers in the field; however, the evidence supporting the overall conclusions remains incomplete due to limitations associated with fMRI data preprocessing, analysis, and cross-validation.

    1. eLife Assessment

      This study investigates whether heavy metal stress can induce maize-like phenotypic and molecular responses in teosinte and whether these responses overlap with genomic regions implicated in domestication. By combining copper and cadmium treatments with quantitative phenotyping, gene-expression analyses, and expanded assessments of nucleotide diversity across a key chromosome 5 interval, the authors provide an integrated view of how abiotic stress responses intersect with domestication-related traits. The significance of the findings is valuable, as the work offers meaningful insights for the subfield of maize evolution and stress biology by extending heavy-metal response analyses to teosinte and linking them to domestication-associated loci, although the evolutionary implications remain indirect. The strength of evidence is solid, with appropriately designed and quantitatively supported experiments that broadly support the claims, but do not yet establish a causal or historical role for heavy metal stress in domestication.

    2. Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. investigate whether heavy metal (HM) stress can induce phenotypic and molecular responses in teosinte parviglumis that resemble traits associated with domestication, and whether genes within a domestication-linked region show patterns consistent with reduced genetic diversity and signatures of selection. The authors exposed both maize and teosinte parviglumis to a fixed dose of copper and cadmium, representing an essential and a non-essential element, respectively. They assessed shoot and root phenotypic traits at a defined developmental stage in plants exposed to HM stress versus control. They then integrated these phenotypic results with expanded analyses of genetic diversity across a broader chromosome 5 interval, which was previously associated with domestication-related traits. Overall, the revisions improve the clarity and the robustness of the analyses, as well as make the conclusions better aligned with the evidence.

      The revised manuscript is strengthened by several additions.

      (1) The authors broaden the genetic analysis beyond a small set of loci and evaluate nucleotide variability across several genes within the linked chromosome 5 interval, which improves the interpretation of diversity patterns and reduces concerns about a too narrow locus selection or regional linkage effects driving the conclusions.

      (2) The expression analyses are now presented with clearer methodological separation and stronger quantitative support. Now, tissue/developmental RT-PCR profiles are distinguished from real-time qPCR assays used to test HM-induced expression changes, with appropriate replication and statistical reporting.

      (3) The authors include a transcriptome-scale element by analyzing multiple published and publicly available HM-stress transcriptome datasets and reporting shared differentially expressed genes across studies, which supports the interpretation that the observed expression changes align with broader HM-responsive transcriptional programs.

      However, it remains challenging to distinguish which aspects of the HM responses observed here represent novel insight versus patterns already reported in maize HM-stress studies. In addition, the link between HM exposure and domestication history remains indirect: reduced diversity patterns and stress-responsive expression do not, on their own, demonstrate human-driven selection or a specific paleoenvironmental scenario, and alternative explanations related to general stress responses or regional evolutionary processes cannot be fully excluded.

    3. Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication. This includes heavy metal transporters which are unregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on the association of heavy metals with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize is also valuable. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      Some conclusions are still speculative, and future experiments could provide more clues about potential molecular mechanisms for the ideas proposed here.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. aim to better understand how environmental conditions could have influenced specific gene functions that may have been selected for during the domestication of teosinte parviglumis into domesticated maize. The authors are particularly interested in identifying the initial phenotypic changes that led to the original divergence of these two subspecies. They selected heavy metal (HM) stress as the condition to investigate. While the justification for this choice remains speculative, paleoenvironmental data would add value; the authors hypothesize that volcanic activity near the region of origin could have played a role.

      Our choice to investigate the effects of heavy metal stress is not speculative. As now mentioned in the Abstract, the elucidation of the genome from the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., Science 2009). Our aim was to test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte parviglumis to maize.

      (1) Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the depth and value of this comment.

      Maize phenotypic responses to sublethal concentrations of heavy metals, copper (Cu) and cadmium (Cd) in particular, are well characterized and published, and are in agreement with our results. In the first section of the Results (pgs 7 and 8), we added pertinent references to clearly show which observations are already known. By contrast, teosinte parviglumis responses are in all cases novel. To our knowledge, this is the first study to analyze in detail the phenotypic response of teosinte to sublethal concentrations of heavy metals, specifically Cu and Cd. We have now emphasized the novelty of these observations (pg 8).

      To address the fact that we only focused on three known HM-related genes without discussing others in the statistically significant region identified via LOD score on chr.5, we have added a full section that reads as follows (pgs. 11 to 13 of the new version):

      “Large-scale genomic and transcriptomic comparisons indicate that many HM response genes were positively selected across the maize genome.

      To expand the results well beyond the analysis of the three genes previously described, we performed a detailed analysis of genetic diversity across the 11.47 Mb genomic region comprised between Z_mSKUs5_ and ZmHMA1. This additional analysis reveals general tendencies in the quantity and nature of loci that were affected by positive selection during the teosinte parviglumis to maize transition in a region identified via LOD score on chr.5. We compared nucleotide variability by using 100 bp bins covering loci composed of two 30 Kb segments up and downstream of coding sequences, respectively, and the coding sequence itself, for 173 genes present within the genomic region comprised between ZmSKUs5 and ZmHMA (Figure S1 and Supplementary File 6). Two types of statistical tests (ANOVA and Wilcoxon) were applied to nucleotide variability comparisons using the entirety of each locus. The Benjamini-Hochber procedure allowed an estimation of the false discovery rate (FDR<0.05) to avoid type I errors (false positives). Although some individual loci appear as differently classified depending on the statistical test applied (22 out of 173 loci), the general differences in nucleotide variability are consistently maintained within the subregions described below. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. The first six loci are consecutively ordered in a 402 Kb subregion that includes ZmSKUs5. A second group of 13 consecutive loci expands over a 1.44 Mb subregion that contains NRAMP ALUMINUM TRANSPORTER1, also involved in HM response through uptake of divalent ions. A third group of 17 consecutive loci expands over 1.28 Mb; eleven contain genes encoding for uncharacterized proteins. 
The fourth group is composed of 57 consecutive loci expanding over 3.22 Mb and contains genes encoding for DEFECTIVE KERNEL55, AUXIN RESPONSE FACTOR16, and peroxidases involved in responses to oxidative stress. The fifth group contains 12 consecutive loci expanding over 713 Kb and contains ZmHMA1. An additional segment of approximately 1.17 Mb and containing 25 consecutive loci that were positively selected expands away from the ZmSKUs5-ZmHMA1 segment; it also contains several genes encoding for peroxidases. Although multiple loci include genes that could be involved in abiotic stress and oxidative responses, these results suggest that multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5 during the teosinte parviglumis to maize transition.

      To further analyze the possibility that HM response could have played a role in maize emergence and subsequent domestication, we analyzed large scale transcriptomic data corresponding to independent experiments aiming at understanding the response of maize roots to HM stress. Six available transcriptomes were selected for in-depth analysis because they presented a fold change strictly higher than 1, and their results were supported by false discovery rates (FDR<0.05). These six transcriptomes (Table S5) included HM response datasets corresponding to growth conditions that not only incorporated Cu, but also lead (Pb) and chromium (Cr) that were not included in the substrate of our experiments. Transcriptional profiles were obtained from roots of plants at different stages: maize seedlings (Shen et al., 2012; Gao et al., 2015; Zhang et al., 2024a), three-week-old plantlets (Yang et al., 2023), and plants at V2 stage (Zhang et al., 2024b; Fengxia et al., 2025). A total of 120 genes shared by all six transcriptomes were found to be differentially expressed under HM stress conditions (66 upregulated and 54 downregulated; Figure S3), including ZmSKUs5, ZmHMA1 and ZmHMA7; 52 of them (43.3%) are located in maize loci showing less than 70% of the nucleotide variability found in teosinte parviglumis, suggesting that they were affected by positive selection (Yamasaki et al., 2005; Supplementary File 7). Of 18 mapping in chr.5, twelve are within the 82 cM that fractionates into multiple QTLs under selection during the parviglumis to maize transition. Interestingly, five additional loci containing HM response genes completely lack SNPs within their total length in both parviglumis and maize, and 19 additional loci lack SNPs in at least one 30 Kb segment or their coding region (Supplementary File 7), suggesting the frequent presence of ultraconserved genomic regions in many loci containing HM response genes. 
When this same analysis was conducted on a set of loci comprising 63 genes previously identified as differentially expressed in response to abiotic stress not directly related to HM responses (hypoxia; nutritional deficiency; soil alkalinity; drought; soil salinity), 18 loci (28.6%) showed less than 70% of the nucleotide variability found in teosinte parviglumis. Only one of them maps in chr.5 and none contained segments or coding regions lacking SNPs in parviglumis or maize. These results suggest that in contrast to other types of abiotic stress response genes, loci comprising a large set of genes that unambiguously respond to HM stress caused by chemical elements of diverse nature were affected by positive selection during the parviglumis to maize transition, irrespective of their position in the genome.”

      The detailed analysis of genetic diversity across 11.47 Mb of chr.5 in the genomic region comprised between ZmSKUs5 and ZmHMA1 is presented as Supplementary File 6.
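
For readers unfamiliar with the Benjamini-Hochberg step named in the quoted section, the following is a minimal sketch of the standard procedure at an FDR threshold of 0.05. It is not the authors' script: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the per-locus p-values themselves are those reported in Supplementary File 6.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that test is declared
    significant while controlling the false discovery rate at `alpha`."""
    m = len(p_values)
    # Rank the tests by ascending p-value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every test ranked at or below max_k is rejected.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four loci (illustration only).
example = [0.04, 0.001, 0.03, 0.5]
print(benjamini_hochberg(example))
```

With 173 loci and two tests per locus, the same function would be applied once per test type. Loci classified differently by the two tests (22 of 173 in the text above) would show up as disagreeing booleans.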

      The analysis of genetic diversity in loci encompassing heavy metal response genes shared by six transcriptomes and abiotic stress controls is described in Supplementary File 7.

      In the Discussion (pgs. 21 and 22), we added a paragraph that reads as follows:

      “Although loss of genetic diversity is usually the result of human selection during domestication, it can also represent a consequence of natural selective pressures favoring fitness of specific teosinte parviglumis allelic variants better adapted to environmental changes and subsequently affected by human selection during the domestication process. This possibility is reflected by widely spread selective sweeps affecting a large portion of chr.5 that contains hundreds of genes showing signatures of positive selection. The analysis of 11.47 Mb covering the ZmSKUs5-ZmHMA1 segment confirms the presence of large but discrete genomic subregions that were positively selected during the teosinte parviglumis to maize transition. Although several contain genes involved in HM response and oxidative stress, the diversity of gene functions does not necessarily favor abiotic stress over other factors that could be at the origin of selective forces affecting these regions. By contrast, a large scale transcriptomic survey indicates that genes consistently responding to HMs (Cu, Cd, Pb and Cr) show signatures of positive selection at unusually high frequencies (43.3%) as compared to loci containing genes responding to other types of abiotic stress (28.6%). Our identification of HM response genes affected by positive selection is far from being exhaustive. Nevertheless, it agrees with the expected effects of a widespread selective sweep caused by environmental changes that influenced the parviglumis to maize transition at the genetic level. Of intriguing interest are 24 loci that partially or completely lack SNPs in both teosinte parviglumis and maize, suggesting possible genetic bottlenecks that occurred before the teosinte to maize transition. Examples of other edaphological factors driving genetic divergence either in the teosintes or maize include local adaptation to phosphorus concentration in mexicana and parviglumis (Aguirre-Liguori et al. 
2019), and fast maize adaptation to changing iron availability through the action of genes involved in its mobilization, uptake, and transport (Benke and Stich 2011). Our results reveal a teosinte parviglumis environmental plasticity that could be related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition. Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication.

      (2) The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We think that the detailed analysis of genetic diversity across 11.46 Mb covering the ZmSKUs5 to ZmHMA1 genomic segment, together with its statistical validation, provides a precise understanding of the selective sweep dimensions in chr.5.

      We do agree that lower nucleotide diversity values in maize are not sufficient to infer human selection. Because many HM response loci show unusually low nucleotide variability in teosinte parviglumis (see the results of the transcriptomic analysis presented above), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native populations of teosinte parviglumis.

      To further explore the link between environmental factors, natural or human-driven selection, and the paleoenvironmental context of the parviglumis to maize transition, we reviewed paleoenvironmental and geological records and added results in two sections that read as follows (pgs. 17 to 20):

      “Paleoenvironmental studies reveal periods of climatic instability in the presumed region of maize emergence during the early Holocene.

      It is well accepted that temperature fluctuations, volcanism and anthropogenic impact shaped the distribution and abundance of plant species in the Transmexican Volcanic Belt (TMVB) during the last 14,000 years (Torrescano-Valle et al. 2019). The TMVB has produced close to 8000 volcanic structures (Ferrari et al., 2011), transforming the relief multiple times, and causing hydrographic and soil changes that actively modified the distribution and composition of plant communities in Central Mexico. Detailed paleoenvironmental data for the Pleistocene and Holocene are available for several lacustrine zones located within the 50 to 100 km range of the region currently considered the cradle of maize domestication (Matsuoka et al. 2002; Figure 5a). In Lake Zirahuén (102°44′ W; 19°26′ N and approximately 2075 meters above sea level; index [i] in Figure 5a), pollen, microcharcoal and magnetic susceptibility analyses of two sedimentary sequences reveal three periods of major ecological change during the early and middle Holocene.

      Between 9500 and 9000 calibrated years before present (cal yr BP), pine forests seem to have been associated with summer insolation increases. A second peak of forest change occurred at around 8200 cal yr BP, coinciding with cold oscillations documented in the North Atlantic. Finally, events that occurred between 7500 and 7100 cal yr BP show an abrupt change in the plant community related to humid Holocene climates and a presumed volcanic event (Lozano-García et al., 2013). The environmental history of the central Balsas watershed has also been documented by pollen, charcoal, and sedimentary analysis conducted in three lakes and a swamp of the Iguala valley (Piperno et al. 2007). Paleoecological records of lake Ixtacyola (18°20N, 99°35W and approximately 720 meters above sea level; index [ii] in Figure 5a) and lake Ixtapa (18°21N, 99°26W) indicate that an important increase in temperature and precipitation occurred between 13000 and 10000 cal yr BP. The pollen record of Ixtacyola showed that members of the genus Zea were already part of the vegetation coverage by 12900 to 13000 cal yr BP, suggesting that some teosintes (likely including parviglumis) were commonly found at elevations where they do not presently occur. Lake Almoloya (also named Chignahuapan; 19°05N, 99°20W and approximately 2575 meters above sea level; index [iii] in Figure 5a) in the upper Lerma basin is only 20 km from the crater of the Nevado de Toluca that is responsible for creating the late Pleistocene Upper Toluca Pumice layer over which the Lerma basin is deposited. Pollen records indicate the presence of Zea species by 11080 to 10780 cal yr BP. As for other locations, an important period of climatic instability prevailed between 11500 and 8500 cal yr BP (Ludlow-Wiechers et al., 2005). Humidity fluctuations occurred until 8000 cal yr BP, with a stable temperate climate between 8500 and 5000 cal yr BP. 
Although pollen and diatom studies are often difficult to interpret at a regional scale, the overall results presented above suggest that Zea plants were consistently present during periods of environmental and climatic instability that correlate with the history of volcanic activity during the early Holocene, as described in the next section.

      Temporal and geographical convergence between volcanic eruptions and maize emergence during the Holocene.

      Current evidence indicates that the emergence and domestication of maize initiated in Mesoamerica some time around 9,000 yr BP (Matsuoka et al. 2002). The teosinte parviglumis populations that are phylogenetically most closely allied with maize are currently distributed in a region located between the Michoacan-Guanajuato Volcanic Field (MGVF) at their northwest, and the Nevado de Toluca and Popocatépetl volcanoes at their east and northeast (Figure 5a; Matsuoka et al. 2002). Precise records of field data indicate that ten accessions were collected in the Balsas river drainage near Teloloapan and Sierra de Huautla (Guerrero), at approximately 100 km south of the Nevado de Toluca crater. Three other accessions were collected near Tejupilco de Hidalgo and Zacazonapan (Estado de México), at approximately 50 to 60 km from the Nevado de Toluca crater (8762, JSG y LOS-161, and JSG-391). Four other accessions were located in Michoacan, at a location within the MGVF (accession 8763), or at mid-distance between the MGVF and the Nevado de Toluca crater (accessions JSG y LOS-130, 8761, and 8766).

      The most important source of HMs in ancient soils of Mesoamerica is TMVB-dependent volcanic activity through short- and long-term effects related to lava deposits, ores, hydrothermal flow, and ash (Torrescano-Valle et al. 2019). The Nevado de Toluca volcano produced one of the most powerful eruptions from central Mesoamerica in the Holocene, giving rise to the Upper Toluca Pumice deposit at 12621 to 12025 cal yr BP (Arce et al., 2003; Figure 5b). The pumice fallout blanketed the Lerma and Mexico basins with 40 cm of coarse ash (Bloomfield and Valastro 1977; Arce et al. 2003). A second eruption dated by 36Cl exposure occurred at 9700 cal yr BP (Arce et al. 2003; Figure 5b), and the most recent eruption occurred at 3580 to 3831 cal yr BP (Macías et al. 1997). During the early and middle Holocene, the Popocatépetl volcano produced at least four eruptions dated 13037-12060, 10775-9564, 8328-7591, and 6262-5318 cal yr BP (Siebe et al. 1997); three other important eruptions occurred during the late Holocene, between 2713 and 733 cal yr BP (Siebe and Macías, 2006). In addition, the MGVF is a monogenetic volcanic field for which 23 independent eruptions have been documented during the Holocene, 21 of them located towards the southern part of the field, in close proximity to the region harboring some of the teosinte parviglumis populations most closely related to maize. Three of these eruptions occurred in the early Holocene (El Huanillo 1130 to 9688 cal yr BP; La Taza 10649 to 10300 cal yr BP; Cerro Grande 10173 to 9502 cal yr BP; Figure 5b), and three others during the initial period of the middle Holocene, between 8400 and 7696 cal yr BP (La Mina, Los Caballos, and Cerro Amarillo; Figure 5b). On average, a new volcano forms every ~435 years in the MGVF (Macías and Arce, 2019). No less than 16 other eruptions occurred between 7159 cal yr BP and the present time (Figure 5b). 
Soils of volcanic origin (andosols) are currently distributed in regions north-west from the Nevado de Toluca and Popocatépetl craters, in close proximity with teosinte parviglumis populations most closely related to maize (Figure S5). Although modern distribution of teosinte populations may differ from their distribution around 9000 yr BP, and unknown populations more closely related to maize may yet be discovered, these data indicate that the date and region where maize emerged are convergent with the dates and locations of several volcanic eruptions that occurred during the Holocene in that same region.”

      (3) Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. We have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      On the other hand, the transcriptional analysis allowed the identification of 52 additional HM response genes showing signatures of positive selection that occurred during the parviglumis to maize transition; 12 of them map to chr.5 within the region having linked QTLs within the short arm of chr.5. So far, genes involved in HM response and oxidative stress represent the most prevalent class of genes identified within the genomic region showing pleiotropic effects on domestication and multiple linked QTLs in chr.5.

      Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication identified in previous studies. This includes heavy metal transporters, which are unregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on its association with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize also gives some novelty in this study. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We have now emphasized our hypothesis in the abstract and the last paragraph of the Introduction (pg. 6):

      “To test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte to maize, we exposed both subspecies to sublethal concentrations of copper and cadmium etc…”

      A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation. Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023).

      Based on comments by both reviewers, we now present a large transcriptional analysis that incorporates HM responses to lead (Pb) and chromium (Cr), in addition to Cu. Results show that many genes responding to Pb and Cr were also positively selected across the maize genome, suggesting that HM stress led to a ubiquitous rather than a specific evolutionary response to heavy metals (please see our response to Reviewer#1 and sections in pgs. 11 to 13).

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). These changes have been now emphasized in the Abstract and the description of the results.

      Regarding the possibility for HM stress to represent a confounding factor in the selection of maize and not a driving factor, we expanded the genomic analysis of genetic diversity well beyond the analysis of the three genes under initial study, to cover a segment of 11.47 Mb comprised between ZmSKUs5 and ZmHMA1. We compared nucleotide variability by using 100 bp bins covering loci composed of two 30 Kb segments up and downstream of coding sequences, respectively, and the coding sequence itself, for 173 genes present within the genomic region comprised between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). The full analysis is presented in a new section (pgs. 11 and 12). We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. Four out of five subregions contain more than one HM or oxidative stress response gene within loci showing signatures of positive selection. Although multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5, large scale transcriptomic data corresponding to independent experiments aiming at understanding the response of maize roots to HM stress allowed the identification of 49 additional HM response genes within loci showing positive selection across the genome, a proportion (43.3%) far greater than the proportion of loci containing response genes to other types of abiotic stress not related to HMs (28.6%). These results are described in detail in pgs. 12 and 13 (Figure S3 and Supplementary File 7). These results provide strong evidence in favor of HM stress and not another factor driving positive selection.
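
The two proportions quoted above follow directly from the counts given in the response to Reviewer #1 (52 of 120 HM response loci; 18 of 63 loci for other abiotic stresses). The sketch below reproduces that arithmetic and, purely as an illustration, shows how a one-sided Fisher's exact test on the same 2x2 table could be set up. The response does not report such a test, so `fisher_one_sided` is an assumption added here, not part of the authors' analysis.

```python
from math import comb


def percent(hits, total):
    """Percentage of loci below the variability threshold, to one decimal."""
    return round(100 * hits / total, 1)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the count
    in the top-left cell given fixed row and column totals.
    """
    row1 = a + b          # e.g. all HM response loci
    col1 = a + c          # all loci classified as positively selected
    n = a + b + c + d     # all loci in both sets
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Counts taken from the text: 52/120 HM loci vs 18/63 control loci.
print(percent(52, 120), percent(18, 63))
p = fisher_one_sided(52, 120 - 52, 18, 63 - 18)
print(p)
```

Whether a test of this kind is appropriate depends on how the control set of 63 genes was chosen, which is described in Supplementary File 7 rather than here.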

      We now provide precise and pertinent paleoenvironmental data on the potential influence of heavy metals in the field. In the sections on pgs. 17 to 20 we review paleoenvironmental studies revealing periods of climatic instability in the presumed region of maize emergence during the early Holocene, and data indicating that the date and region where maize emerged are convergent with the dates and locations of several volcanic eruptions that occurred during the early and middle Holocene in that same region. Please see responses to Reviewer#1 for details.

      We agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the dataset generated provides an interesting foundation for hypothesis testing on HM stress and domestication, the current data do not sufficiently support the conclusions of the manuscript.

      (1) The description of maize and teosinte architecture under HM stress is well presented.

      However, traits like shoot height, leaf size reduction, and biomass loss also occur under other environmental stresses such as drought and salinity. Additional evidence beyond shoot and root architecture would help validate the link between tb1 expression and specific ZmHMA genes under HM stress, or whether it reflects a more generalized stress response.

      We have already addressed this point in detail in the public response to Reviewer#1.

      (2) The nucleotide variability analysis is interesting, but I would have liked to see additional information to clarify the choice of the data selection and the strength of the conclusions with human selection.

      We have already addressed this point in detail in the public response to Reviewer#1.

      a) The choice of Tripsacum dactyloides as the outgroup to determine nucleotide variability seems to be distant, and I wonder whether other combinations with a closer outgroup or multiple outgroups were tried to provide a more accurate context.

      Nucleotide variability in Tripsacum dactyloides is used to graphically illustrate an external reference and not as an outgroup in the extended analysis of genetic diversity at the locus and genomic level. We did not use Tripsacum dactyloides as an outgroup in our statistical analysis. We could indeed have used a closer teosinte subspecies as an outgroup, but at this stage no data indicate whether environmentally-related selective pressures could have affected genetic diversity in other teosintes. This possibility is currently being investigated.

      b) Evolutionary differences not related to human influence could affect the results. The phrase "order of magnitude difference in π values" needs statistical validation (e.g., confidence intervals, p-values).

      We agree and have eliminated the sentence, as it is no longer relevant in light of the detailed genomic analysis of genetic diversity presented in Supplementary File 6.

      c) The comparison with ZmGLB1, a neutral control locus, suggests that domestication-related changes in nucleotide variability are specific to the three candidate genes. However, the concept of neutrality is complex, and while ZmGLB1 may be considered neutral in this case, the argument does not address the possibility of other factors, such as linked selection, that could influence variability in these genes. Referencing Hufford et al. is insufficient and would require a deeper argument.

      We also agree with this comment. We think that the influence and consequences of linked selection are now well documented for the 11.46 Mb analyzed in chr.5 (pgs. 11 and 12 in the main text, and Supplementary File 6).

      (3) The statement: "Our evidence indicates that HM stress revealed a teosinte parviglumis environmental plasticity that is directly related to the function of specific HM response genes that were affected by domestication through human selection" is not supported by the presented data. The rationale for the specific Cd/Cu dosage used is unclear. A dose-response gradient would better demonstrate the nature and strength of the plastic response.

      Previous reports support the rationale for the specific HM dosage in this study; Cu/Cd dosage response gradients have been conducted in maize (AbdElgawad et al. 2020; Atta et al., 2023), but since no studies have been conducted in teosinte, we reasoned that it was important to apply the same treatment to both subspecies. We have now emphasized this rationale by adding the following in pg XX: “Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023)”.

      We agree that the statement raised by the reviewer needed revision in light of our results. We revised the statement to accurately reflect our current evidence as follows: “Our results reveal a teosinte parviglumis environmental plasticity that is likely related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition.”

      (4) In maize, TEs are known to influence gene expression under abiotic stress, including for tb1 (PMID: 25569788). Since the author appears to draw a causative conclusion linking ZmHMA1, TB1, and HM stress, I would have liked to see a whole-transcriptome analysis, and not a curation of two genes, to determine whether other factors, such as TEs, could lead to similar outcomes.

      We agree that this is definitely a possibility that we have not investigated at this stage. However, we added a paragraph to reflect this pertinent suggestion:

      “Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      (5) I would suggest that the authors carefully review the tables, figures, and the corresponding legends. For example :

      a) Table 2 is called before Table 1, I would therefore suggest changing the numbering to reflect the paragraph order.

      Thank you for your help. We have changed the order of the Tables in the new version.

      b) In Table 2, it is not clear whether the P value applies to the mean difference between WT and the mutant zmhma1, either in the presence or the absence of heavy metals. In addition, the authors need to use the P-value to estimate the differences between WT in the absence vs presence of HM, and WT in the absence of HM versus the mutant in the absence of HM (idem for presence).

      We did address this issue in detail and added P-values and specific pairwise comparisons to that Table (now Table 1). Data are presented as mean ± standard deviation and were tested by a paired Student’s T-Test. When the effects were significant according to T-Test, the treatments were compared with the Welch two sample T-Test at P < 0.05.

      c) Table 1 and Table 2: Indicate what type of statistical test was used and the number of plants used for each experiment (n). Also, I recommend the use of scientific notation for the P-values.

      The statistical tests have now been indicated, scientific notation has been added to the P-values; the number of plants and biological replicates are indicated in the Methods section.

      d) Lines 202 and 204: I assume Table 1 should be called instead of Table 2.

      This error has been corrected.

      e) General: In the text, when significance is highlighted along with measurements, the p-value needs to be added.

      We have added the P-value along the measurement for all significant differences.

      f) In the text, it is also mentioned that "the expression of ZMHMA1 was significantly increased in the presence of HMs (Figure 3c)". We are looking here at an RT-PCR, which is qualitative and without a robust quantitative comparison and statistics, I cannot conclude this assessment based on the presented evidence. No statistical measure is indicated here.

      Panel 3c is not RT-PCR but a real-time qPCR, showing relative fold-change normalized to actin, with three technical replicates for each of three biological replicates. We have added error bars (SD) and P-values represented by asterisks (calculated with Student's t statistic) to support significant differences (P<0.05 and P<0.01). ZmHMA1 expression was significantly increased in the presence of HMs only in teosinte; there was no significant difference in maize.

      g) Figure 3 should at least have the gene name in the figure to quickly understand the figure panel. The key conserved domains should also be identified.

      We agree and apologize for the omission. The gene names have been added adjacent to the structures.

      h) Sentence at lines 459-460 lacks words and punctuation.

      This unfortunate error has also been corrected.

      i) Figure S1, the reference Lemmon and Doebley, 2024 should be Lemmon and Doebley, 2014 to harmonize with the text.

      The correct year is 2014. We have corrected this error.
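      The response to item (b) above names the Welch two-sample t-test. As a minimal sketch of what that test computes, the snippet below derives the Welch t statistic and the Welch-Satterthwaite degrees of freedom from two independent samples. The function name `welch_t` and the sample values are hypothetical and are not data from the study; the P-value (not computed here) would then be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Unequal variances are allowed, so the degrees of freedom are approximated.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements for illustration only (not from the manuscript).
control = [10.0, 12.0, 14.0]
treated = [20.0, 21.0, 22.0, 23.0, 24.0]
t_stat, dof = welch_t(control, treated)
```

      Unlike the paired test also mentioned in the response, this form does not require equal group sizes or matched observations.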

      Reviewer #2 (Recommendations for the authors):

      (1) The narrative should be clearer, starting with a clearer hypothesis that is later sustained or not in the results, and then discussed in the idea and speculation section.

      Thank you for the comment. We have clarified the hypothesis; it is included in the abstract and the last paragraph of the Introduction. We hope it is now clear that the evidence presented supports our hypothesis.

      (2) Focus more on traits that are relevant, for example, nodal and seminal roots.

      We modified the text to emphasize three relevant traits. In the case of teosinte under HM stress, these are the absence of tillering and the increase in the number of female inflorescences. In the case of the zmhma1 mutant under HM stress, they are differences in the number of nodal roots and differences in height.

      (3) RNA-seq in Cu/Cd stress could make the work much more useful and complete.

      As previously mentioned, we have incorporated a large scale transcriptional analysis on the basis of six transcriptomes statistically validated (Table S5). Please see sections pgs. 11 to 13 for details.

    1. eLife Assessment

      This is an important work implementing data mining methods on IMC data to discover spatial protein patterns related to the triple-negative breast cancer patients' chemotherapy response. The evidence supporting the claims of the authors is solid, although more detailed methodology clarification and validation are needed. While the accuracy of the methods is not very high, the work shows potential for translational application.

    2. Reviewer #1 (Public review):

      Summary:

      The study presents a computational pipeline for Imaging Mass Cytometry (IMC) analysis in triple-negative breast cancer (TNBC). Analyzing over 4 million cells from 63 patients, it uncovers a distinct spatial organization of cell types between chemotherapy responders and non-responders. Using graph neural networks, the framework predicts treatment response from pre-treatment samples and identifies key predictive protein markers and cell types associated with therapeutic outcomes.

      Strengths:

      (1) The study presents a novel framework leveraging Imaging Mass Cytometry (IMC) to investigate spatial patterns and differences among patient groups, which has been rarely explored.

      (2) It uncovers several compelling biological insights, providing a deeper understanding of the complex interactions within the tumor microenvironment.

      (3) The analysis pipeline is comprehensive, incorporating batch correction, cell type clustering, and a graph neural network based on cell-cell interactions to predict chemotherapy response, demonstrating methodological innovation and thoughtful design.

      Weaknesses:

      (1) Some figure references are inconsistent. For example, Figure 4C is cited on Page 11, but it does not appear in the manuscript.

      (2) Several explanations and methodological details related to the figures remain unclear. For instance, it is not explained how the overall abundance of cell types in Figures 3D and 3E was calculated, how relative abundance was derived, or how these calculations were adjusted when split by proliferation status. In Table 2, it seems that model performance is reported using different node features (protein abundance or cell type), but the text in the second paragraph suggests that both were used simultaneously. This inconsistency is confusing. Additionally, the process for constructing the cell-cell contact graph, including how edges are defined, should be described more clearly.

      (3) The GNN performance appears modest. An AUROC of 0.71 can indicate meaningful predictive power for chemotherapy response, but it remains moderate. Including a baseline comparison would help contextualize the model's effectiveness. Furthermore, the reported value of 0.58 in Table 2 is relatively low, and its meaning or implication is not clearly explained.

      (4) Some methodological choices are not well justified. For example, the rationale for selecting the Self-Organizing Map (SOM) for clustering over other clustering methods is not discussed.

      (5) The manuscript would benefit from a more explicit discussion of how studies using IMC-based spatial analysis relate to or differ from those employing spatial transcriptomics, particularly in terms of their interpretability.
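      To make the baseline point in Weakness (3) concrete, the sketch below computes AUROC in its rank-based form: the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half. Under this definition a non-informative predictor scores 0.5, which is the reference against which values such as 0.71 or 0.58 would be read. The labels and scores are hypothetical toy values, not results from the study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
constant_baseline = [0.5] * len(labels)  # predictor that ignores its input

model_auc = auroc(labels, model_scores)
baseline_auc = auroc(labels, constant_baseline)
```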

    3. Reviewer #2 (Public review):

      Summary:

      The current research presents an end-to-end computational workflow for large-scale Imaging Mass Cytometry (IMC) data and applies it to 813 regions of interest (ROIs) comprising over 4 million cells from 63 TNBC patients. The study integrates image preprocessing (IMC-Denoise and CLAHE), cell segmentation (Mesmer), phenotyping (Pixie), spatial neighborhood analysis (SquidPy), collagen feature extraction, and graph neural network (GNN) modeling to identify spatial-molecular determinants of chemotherapy response. The major observations include T-cell exclusion in non-responders, persistent fibroblast-macrophage co-localization post-therapy, and the identification of B7H4, CD11b, CD366, and FOXP3 as predictive markers via GNN explainability analysis. The work has been implemented on a rich dataset and integrated with spatial and molecular information. The manuscript is well written and addresses an important clinical question.

      Strengths:

      (1) The study analyzes 813 ROIs and over 4 million cells, which is an exceptionally large IMC dataset, and allows the authors to investigate spatial determinants of chemotherapy response in TNBC with considerably more statistical power than prior studies. It clearly shows an integrated spatial-proteomic analysis on a large IMC dataset.

      (2) The work reveals robust, conceptually meaningful tissue patterns with CD8+ T-cell exclusion from tumor regions in non-responders and increased fibroblast-macrophage spatial proximity that align with existing biological understanding of immunosuppressive microenvironments in TNBC. These findings highlight spatial organization, rather than simple cell abundance, as a key differentiator of treatment response.

      (3) Novel use of GNNs for chemoresponse prediction in IMC data helps in demonstrating that spatial and molecular features captured simultaneously can provide predictive information about treatment response. The use of GNNExplainer adds interpretability of the selected features, identifying immune-regulatory markers such as B7H4, CD366, FOXP3, and CD11b as contributors to chemoresponse heterogeneity.

      (4) The work complements emerging spatial transcriptomic analyses from the same SMART cohort and provides a scalable computational framework likely to be useful to other IMC and spatial-omics researchers.

      Weaknesses:

      (1) Some analytical components lack quantitative validation, limiting confidence in specific claims. For example, the CLAHE-based batch correction applied before segmentation is evaluated primarily through qualitative visualization rather than quantitative metrics. Similarly, the cell-type annotations produced via Pixie and manual thresholds lack independent validation, making it harder to assess the accuracy of downstream spatial and predictive analyses.

      (2) Predictive modeling performance is moderate and may be influenced by dataset structure; the GNN achieves AUROC ~0.71, which is meaningful but still limited, and the absence of external validation or multiple cross-validation strategies raises questions about generalizability. The predictive insights are promising but not yet sufficiently strong to support clinical decision-making.

      (3) Pre- and post-treatment comparisons are constrained to non-responders and pathologist-selected ROIs.
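      One possible reading of the "dataset structure" concern in Weakness (2) is that many ROIs come from the same patient, so ROI-level splits could place one patient's tissue in both training and test sets. The review does not state how the data were actually split, so the following is only a sketch of a patient-level cross-validation assignment under that assumption. The function name `patient_level_folds` and the toy patient IDs are hypothetical.

```python
import random
from collections import defaultdict


def patient_level_folds(roi_patient_ids, n_folds=5, seed=0):
    """Assign ROI indices to folds so that all ROIs of a patient share one fold."""
    patients = sorted(set(roi_patient_ids))
    random.Random(seed).shuffle(patients)  # reproducible patient order
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for roi_index, patient in enumerate(roi_patient_ids):
        folds[fold_of_patient[patient]].append(roi_index)
    return [folds[k] for k in range(n_folds)]


# Hypothetical toy cohort: 6 patients with 2 ROIs each.
roi_patient_ids = ["P1", "P1", "P2", "P2", "P3", "P3",
                   "P4", "P4", "P5", "P5", "P6", "P6"]
folds = patient_level_folds(roi_patient_ids, n_folds=3)
```

      Each fold then serves once as the held-out set, so no patient contributes ROIs to both sides of a split.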

    4. Reviewer #3 (Public review):

      Summary:

      Luque et al. proposed stratifying chemotherapy response in triple-negative breast cancer based on spatial protein patterns from IMC data. This proposed method combines GNN with GNNexplainer to identify several important protein markers and cell types related to chemotherapy. As one of the most significant challenges in cancer research, this work holds great potential for translational medicine.

      Strengths:

      (1) Targeting intervention decision-making in TNBC, one of the prominent challenges in the field.

      (2) Cutting-edge spatial proteomics data with enough cohort and clinical outcome.

      (3) Appropriate usage of cutting-edge machine learning models and comprehensive analysis.

      Weaknesses:

      (1) More scientific rigor is needed for machine learning benchmarking.

      (2) More depth is needed in comparing related works that use similar approaches.

    1. eLife Assessment

      This important study focuses on the molecular mechanisms underlying the generation of neuronal diversity. Taking advantage of a well-defined neuroblast lineage in Drosophila, the authors provide convincing evidence that two transcription factors of the conserved forkhead box (FOX) family offer a mechanistic link between transient spatial cues that specify neuroblast identity and terminal selector genes that define post-mitotic neuron identity. The findings will be of interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons and sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Strengths:

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g. in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for most experiments.

      Original weaknesses and potential extensions:

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth discussing whether it is a Fd4 feature or a NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Comments on latest version:

      We appreciate the thorough revision and the detailed point-by-point responses. Overall, the updated manuscript has addressed the main issues we raised previously, especially around the potential potency differences of Fd4 along the birth order axis and possible redundancy with Vnd in early-born neurons. The additional data are convincing and presented clearly, with figures and supplements that are informative and appropriately labeled.

      We noticed one remaining point that could be considered, the necessary-and-sufficient phrasing for Fd4 regulating NB7-1 fates. Given the possible redundancy among Fd4/5 and Vnd and the fact that early-born outputs (U1-3, Figure 3F) are not dependent on Fd4/5, we suggest revising this claim and either (a) limit the claim to necessary and sufficient for late-born NB7-1 progeny identity, or (b) frame Fd4 as sufficient for NB7-1 program induction while being required but redundant (e.g., with Vnd) for early-born features, rather than universally necessary/sufficient across the entire lineage output.

      Regarding the lack of phenotype of single Fd4/5 mutants and Fd5 gain of function, we still encourage the authors to include the fd4 and fd5 single-mutant negative results as a brief supplemental item (e.g., a representative panel plus a simple quantification on Eve and Dbx would be sufficient). This would strengthen transparency, remove "data not shown" statements that are not necessary when these data can be presented as supplementary data with no space limitation, and make it easier for readers to evaluate redundancy claims.

      Overall, we view the work as substantially complete and appreciate its contribution and conceptual framing. We have updated our public review to reflect the current version and the authors' efforts to address the major points raised in the prior round.

    3. Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the new-born post-mitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      The study is systematic, concise and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of fd4, and the partially redundant gene fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and dbx throughout diverse VNC lineages.

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      Comments on revisions:

      The authors adequately addressed all of the issues that I had with the original submission.

      Their responses to the other reviewers are also well-reasoned.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons and sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Thanks for the accurate summary and positive comments!

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Thanks for the positive comments!

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      We quantified the percentage of Hb+ and Runt+ cells among Eve+ cells with sca-gal4, and the results are shown in Figure 4-figure supplement 1. We found that the proportion of early-born cells is slightly reduced but the proportion of later-born cells remains similar. Interestingly, we also found a subset of Eve+ cells with a mixed fate (Hb+Runt+), but the reason remains unclear.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      Because every hemisegment in an fd4 single mutant is normal, we just added it as the following text: “In fd4 mutants, we observe no change in the number of Eve+ neurons or Dbx+ neurons (n=40 hemisegments).”

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      We agree, thanks for raising this point. We add the following text to the Discussion. “Interestingly, the fd4 fd5 mutant maintains expression of fd4:gal4, suggesting that the fd4/fd5 locus may have established a chromatin state that allows “permanent” expression in the absence of Vnd, En, and Fd4/Fd5 proteins.”

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      We agree it is interesting that the NB7-3 and NB5-6 drivers remain on following Fd4 misexpression. To explore this, we used sca-gal4 to overexpress Fd4 and observed that Lbe expression persisted while Eg was largely repressed (Author response image 1). The results show that Lbe and Eg respond differently to Fd4. A non-mutually exclusive possibility is that the continued expression of lbe-Gal4 UAS-GFP or eg-Gal4 UAS-GFP may be due to the lengthy perdurance of both Gal4 and GFP.

      Author response image 1.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth the authors discussing whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      In the NB7-3 lineages misexpressing Fd4, only 5 lineages generated Dbx+ cells (0.1±0.4, n=64 hemisegments), suggesting that the low penetrance of Dbx+ induction is an intrinsic feature of Fd4 rather than lineage context. We have added this information in the results section.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      When we used the en-gal4 driver to express UAS-vnd in the fd4/fd5 mutant background, we found an average of 7.4±2.2 Eve+ cells per hemisegment (n=36), significantly higher than in the fd4/fd5 mutant alone (3.9±0.8 cells, n=52, p=2.6×10⁻¹¹) (Figure 3J). In addition, 0.2±0.5 Eve+ cells were ectopic Hb+ (excluding U1/U2), indicating that Vnd-En integration is sufficient to generate both early-born and late-born Eve+ cells in the fd4/fd5 mutants. We have added the results to the text.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Thanks for the suggestion. Because the results are exactly the same as the wild type, we don’t think it is necessary to provide additional images or analysis as supplemental information.

      Reviewer #2 (Public review):

      Via a detailed expression analysis, they find that Fd4 is selectively expressed in embryonic NB7-1 and newly born neurons within this lineage. They also undertake a comprehensive genetic analysis to provide evidence that fd4 is necessary and sufficient for the identity of NB7-1 progeny.

      Thanks for the accurate summary!

      The analysis is both careful and rigorous, and the findings are of interest to developmental neurobiologists interested in molecular mechanisms underlying the generation of neuronal diversity. Great care was taken to make the figures clear and accessible. This work takes great advantage of years of painstaking descriptive work that has mapped embryonic neuroblast lineages in Drosophila.

      Thanks for the positive comments!

      The argument that Fd4 is necessary for NB7-1 lineage identity is based on a Fd4/Fd5 double mutant. Loss of fd4 alone did not alter the number of NB7-1-derived Eve+ or Dbx+ neurons. The authors clearly demonstrate redundancy between fd4 and fd5, and the fact that the LOF analysis is based on a double mutant should be better woven through the text.The authors generated an Fd5 mutant. I assume that Fd5 single mutants do not display NB7-1 lineage defects, but this is not stated. The focus on Fd4 over Fd5 is based on its highly specific expression profile and the dramatic misexpression phenotypes. But the LOF analysis demonstrates redundancy, and the conclusions in the abstract and through the results should reflect the existence of Fd5 in the conclusions of this manuscript.

      We agree, and have added new text to clarify the single mutant phenotypes (there are none) and the double mutant phenotype (loss of NB7-1 molecular and morphological features). The following text is added to the manuscript: “Not surprisingly, we found that fd4 single mutants or fd5 single mutants had no phenotype (Eve+ neurons were all normal). Thus, to assess their roles, we generated a fd4 and fd5 double mutant. Because many Eve+ and Dbx+ cells are generated outside of NB7-1 lineage, it was also essential to identify the Eve+ or Dbx+ cells within NB7-1 lineage in wild type and fd4 mutant embryos. To achieve this, we replaced the open reading frame of fd4 with gal4 (called fd4-gal4) (see Methods); this stock simultaneously knocked out both fd4 and fd5 (called fd4/fd5 mutant hereafter) while specifically labeling the NB7-1 lineage. For the remainder of this paper we use the fd4/fd5 double mutant to assay for loss of function phenotypes.”

      It is notable that Fd4 overexpression can rewire motor circuits. This analysis adds another dimension to the changes in transcription factor expression and, importantly, demonstrates functional consequences. Could the authors test whether U4 and U5 motor axon targeting changes in the fd4/fd5 double mutant? To strengthen claims regarding the importance of fd4/fd5 for lineage identity, it would help to address terminal features of U motorneuron identity in the LOF condition.

      Thanks for raising this important point. We examined the axon targeting on body wall muscles in both wild type and in fd4/fd5 mutant background and added the results in Figure 3-figure supplement 2. We found that the axon targeting in the late-born neuron region (LL1) is significantly reduced, suggesting that the loss of late-born neurons in fd4/fd5 mutant leads to the absence of innervation of corresponding muscle targets.

      Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the newborn postmitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      Thanks for the positive comments!

      The study is systematic, concise, and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Thanks for the accurate summary!

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of Fd4, and the partially redundant gene Fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and Dbx throughout diverse VNC lineages.

      Thanks for the accurate summary!

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well-characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic Fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Thanks for the positive comments!

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages, but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to Fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      We appreciate the positive comments. We agree double misexpression of Fd4 and Fd5 might give a stronger phenotype (as the reviewer says) but the lack of this experiment does not change the conclusions that Fd4 can promote NB7-1 molecular and morphological aspects at the expense of NB5-6 molecular markers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The title of Figure 4 may be intended to include the term "Widespread", not "Wild spread". (Though the expansion of the Eve and Dbx with Fd4 is quite remarkable…).

      Done!

      Reviewer #3 (Recommendations for the authors):

      (1) Line 138. Is part of the sentence missing? Did the authors mean to say "that fd5 is coexpressed with fd4 in NB7-1 and its .....".

      Done!

      (2) ln 237: In trying to explain the "U-like" phenotype of the transformed motoneurons in lineage 7-3, the authors speculate that "perhaps their late birth did not give them time to extend to the most distant dorsal muscles ". It is very difficult to convince a motoneuron to stop growing in the absence of a target! An alternate possibility is that since there is only one or two U neurons made instead of the normal five, the growing motoneuron has enough information to direct them to the dorsal domain, but they lack the specification that allows them to recognize a specific muscle target.

      We agree there are additional possibilities, and now update the text to say: “We observed that these transformed neurons did not innervate the dorsal muscles, perhaps their late birth did not give them time to extend to the most distant dorsal muscles, or they were incompletely specified.”

      (3) In the References, I think that the Anderson et al. reference should also include "BioRxiv" before the DOI.

      Done!

      (4) Figure 6A for wild-type 7-3 lineage. The corazonin expression appears to be expressed in EW2 as well as EW3. This should be explained.

      We agree it looks that way, due to the 3D rotation used; we now replace it with a more representative image. Note that our quantification always shows a single Cor+ neuron per hemisegment.

      (5) Figure 7: Issues of terminology. The designation of "longitudinal" for muscles is traditionally in reference to the body axis, such as the Dorsal Longitudinal Muscles (DLM) of the adult thorax. The "longitudinal" muscles in the figure are really "transverse" muscles. I also suggest using "axon" or "neurites" rather than "filament". For the middle and bottom parts of E and F, are these lateral and ventral views? They should be designated as such.

      Thanks, we agree and have made the changes, using Axon instead of Filament, and labeling the views (lateral and ventro-lateral).

    1. eLife Assessment

      This study presents experiments suggesting intriguing mesoscale reorganization of functional connectivity across distributed cortical and subcortical circuits during learning. The approach is technically impressive and the results are potentially of valuable significance. The authors have also made a clear effort to address concerns in revision. However, the strength of evidence remains incomplete. Acquisition of data from additional animals in the primary experiment could bolster these findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to address an important and timely question: how does the mesoscale architecture of cortical and subcortical circuits reorganize during sensorimotor learning? By using high-density, chronically implanted ultra-flexible electrode arrays, the authors track spiking activity across ten brain regions as mice learn a visual Go/No-Go task. The results indicate that learning leads to more sequential and temporally compressed patterns of activity during correct rejection trials, alongside changes in functional connectivity ranks that reflect shifts in the relative influence of visual, frontal, and motor areas throughout learning. The emergence of a more task-focused subnetwork is accompanied by broader and faster propagation of stimulus information across recorded regions.

      Strengths:

      A clear strength of this work is its recording approach. The combination of stable, high-throughput multi-region recordings over extended periods represents a significant advance for capturing learning-related network dynamics at the mesoscale. The conceptual framework is well motivated, building on prior evidence that decision-relevant signals are widely distributed across the brain. The analysis approach, combining functional connectivity rankings with information encoding metrics, is well motivated but needs refinement. These results provide some valuable evidence of how learning can refine both the temporal precision and the structure of interregional communication, offering new insights into circuit reconfiguration during learning.

      Weaknesses:

      Several important aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results. The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state. Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled. The optogenetic experiments, while intended to test the functional relevance of rank-increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. measure from 10 cortical and subcortical brain regions as mice learn a go/no-go visual discrimination task. They found that during learning, there is a reshaping of inter-areal connections, in which a visual-frontal subnetwork emerges as mice gain expertise. Visual stimulus decoding also became more widespread post-learning. They also perform silencing experiments and find that OFC and V2M are important for the learning process. The conclusion is that learning evoked a brain-wide dynamic interplay between different brain areas that together may promote learning.

      Strengths:

      The manuscript is written well and the logic is rather clear. I found the study interesting and of interest to the field. The recording method is innovative and requires exceptional skills to perform. The outcomes of the study are significant, highlighting that learning evokes a widespread and dynamic modulation between different brain areas, in which specific task-related subnetworks emerge.

      Weaknesses:

      I had some major concerns that make the claims of the study less convincing: Low number of mice, insufficient movement analysis, figure visualization and analytic methods.

      Nevertheless, I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention of in the results. Moreover, in the early case all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Fig. S4 but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      (3) Most of the figures are over-detailed and it is hard to understand the take home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio of maps is enough, and the rest could be bumped to the Supp, if at all. In general, the figures in several cases do not convey the main take home messages.

      (4) Analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between output and input analysis? Also, the time periods sometimes seem redundant. Also, there are other network analyses that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist.

      Reviewer comments to the authors' revision:

      Thank you for the extensive revision. Most of my concerns were answered and the manuscript is much improved. Still, there are some major issues that remain unconvincing:

      (1) The number of learning mice is only 3, which is substantially low compared to other studies in the field. Thus, statistics are across trials and sessions pooled from all mice. This is a big limitation in supporting the authors' claims.

      (2) There is no measurement of movement during the task. Since there are already several studies showing that movement has a strong effect on brain-wide dynamics, and since it is well known that mice change their body movement during learning (at least some mice) the authors cannot disentangle between learning-related and movement-related dynamics. This issue is properly discussed in the paper and also partially addressed with a control group where movement was measured without neural recordings.

      (3) The authors do not know exactly where they recorded from, with emphasis on subcortical areas. The authors partially address this in a separate cohort where they regenerate the reproducibility rate of penetration locations, but still this is not a complete address to this concern.

      Given the issues above, I strongly recommend including additional mice with body movement measurement in the future. Great job and congratulations on this study!

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript "Dynamics of mesoscale brain network during decision-making learning revealed by chronic, large-scale single-unit recording", Wang et al investigated mesoscale network reorganization during visual stimulus discrimination learning in mice using chronic, large-scale single-unit recordings across 10 cortical/subcortical regions. During learning, mice improved task performance mainly by suppressing licking on no-go trials. The authors found that learning induced restructuring of functional connectivity, with visual (V1, V2M) and frontal (OFC, M2) regions forming a task-relevant subnetwork during the acquisition of correct No-Go (CR) trials. Learning also compressed sequential neural activation and broadened stimulus encoding across regions. In addition, a region's network connectivity rank correlated with its timing of peak visual stimulus encoding. Optogenetic inhibition of orbitofrontal cortex (OFC) and high order visual cortex (V2M) impaired learning, validating their role in learning. The work highlights how mesoscale networks underwent dynamic structuring during learning.

      Strengths:

      The use of ultra-flexible microelectrode arrays (uFINE-M) for chronic, large-scale recordings across 10 cortical/subcortical regions in behaving mice represents a significant methodological advancement. The ability to track individual units over weeks across multiple brain areas will provide a rare opportunity to study mesoscale network plasticity.

      While limited in scope, optogenetic inhibition of OFC and V2M directly ties connectivity rank changes to behavioral performance, adding causal depth to correlational observations.

      Weaknesses:

      The weakness is also related to the strength provided by the method. While the method in principle enables chronic tracking of individual units, the authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording on individual days across learning, weakening the attractiveness of the methodology and this study.

      Another weakness is that major results are based on analyses of functional connectivity. Functional connection strength across areas is ranked 1-10 based on relative strength, and the regional input/output is compared across learning. This approach reveals differential changes in some cortical and subcortical areas. In my view, learning-related changes should be validated using complementary methods.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The technical approach is strong and the conceptual framing is compelling, but several aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results.

      We agree that our functional connectivity ranking analyses cannot establish causal influences. As discussed in the manuscript, besides learning-related activity changes, the functional connectivity may also be influenced by neuromodulatory systems and internal state fluctuations. In addition, the spatial scope of our recordings is still limited compared to the full network implicated in visual discrimination learning, which may bias the ranking estimates. In future, we aim to achieve broader region coverage and integrate multiple complementary analyses to address the causal contribution of each region.

      The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state.

      We believe this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).
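
      The onset criterion described above (significance in at least 3 consecutive 25-ms windows) can be sketched as follows. This is only an illustrative reading of that description, not the authors' code: the function name `onset_latency_ms`, its parameters, and the choice to report the start of the first window in the run are all assumptions, and the per-window p-values are assumed to have been computed beforehand (for example, with a t-test against baseline).

```python
from typing import Optional, Sequence


def onset_latency_ms(
    p_values: Sequence[float],
    above_baseline: Sequence[bool],
    window_ms: float = 25.0,
    alpha: float = 0.05,
    min_consecutive: int = 3,
) -> Optional[float]:
    """Hypothetical helper: find the first run of significant windows.

    p_values[i]       -- p-value comparing window i with baseline (assumed given)
    above_baseline[i] -- True if the mean rate in window i exceeds baseline
    Returns the start time (ms after stimulus onset) of the first window in
    the first run of `min_consecutive` qualifying windows, or None.
    """
    run = 0  # length of the current streak of qualifying windows
    for i, (p, above) in enumerate(zip(p_values, above_baseline)):
        if p < alpha and above:
            run += 1
            if run == min_consecutive:
                first_window = i - min_consecutive + 1
                return first_window * window_ms
        else:
            run = 0  # streak broken; start counting again
    return None


# Illustrative (made-up) p-values: an isolated significant window at index 1
# is ignored; the first qualifying run starts at index 3.
example_p = [0.50, 0.01, 0.20, 0.01, 0.02, 0.03, 0.50]
example_above = [True] * len(example_p)
print(onset_latency_ms(example_p, example_above))
```

      Under this reading, a single early significant window does not count as the onset, which is one way a consecutive-window criterion can yield later latency estimates than first-spike or half-max definitions.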

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.

      Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled.

      We agree that a larger sample size would strengthen the robustness of the findings. However, as noted above, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      The optogenetic experiments, while intended to test the functional relevance of rank increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy.

      Details on spike sorting are limited.

      We have provided more details on spike sorting in the Methods section, including the exact parameters used in the automated sorting algorithm and the subsequent manual curation criteria.

      Reviewer #2 (Public review):

      Weaknesses:

      I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis, they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention in the results. Moreover, in the early case, all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      As we noted in our response to Reviewer #1, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve high-quality unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. These improvements will enable us to collect data from a larger sample size and extract more precise insights into mesoscale dynamics during learning.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Figure S4, but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion in the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (3) Most of the figures are over-detailed, and it is hard to understand the take-home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially Figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio of maps is enough, and the rest could be bumped to the Supplementary section, if at all. In general, the figures in several cases do not convey the main take home messages. See more details below.

      We thank the reviewer for this valuable critique. The statistical significance corresponding to the brain plots (Figure 4 and Figure 5) was presented in Figure S3 and S5 (now Figure S5 and S7 in the revised manuscript), but we agree that the figure can be simplified to focus on the key results.

      In the revised manuscript, we have condensed these figures to focus on the most important comparisons to make the visual presentation more concise and the take-home message clearer.

      (4) The analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between the output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analyses that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist.

      We appreciate the reviewer’s comment. In brief, the input- and output-rank analyses yielded largely similar patterns across regions in CR trials, although some differences were observed in certain areas (e.g., striatum) in Hit trials, where the magnitude of rank change was not identical between input and output measures. We have condensed the figures to only show averaged rank results, and the colormap was updated to better convey the message.

      We did explore dimensionality reduction applied to the ranking data. However, the results were not intuitive either and required additional interpretation without bringing more insight. Still, we acknowledge that other analysis approaches might provide complementary insights.

      Reviewer #3 (Public review):

      Weaknesses:

      The weakness is also related to the strength provided by the method. It is demonstrated in the original method that this approach in principle can track individual units for four months (Luan et al, 2017). The authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording across multiple days during learning. Many studies have achieved acute recording across learning using similar tasks. These studies have recorded units from a few brain areas or even across brain-wide areas.

      We appreciate the reviewer’s important point. We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses. Concentrating probes in fewer regions would allow us to obtain enough units tracked across learning in future studies to fully exploit the advantages of this method.

      Another weakness is that major results are based on analyses of functional connectivity that is calculated using the cross-correlation score of spiking activity (TSPE algorithm). Functional connection strength across areas is then ranked 1-10 based on relative strength. Without ground truth data, it is hard to judge the underlying caveats. I'd strongly advise the authors to use complementary methods to verify the functional connectivity and to evaluate the mesoscale change in subnetworks. Perhaps the authors can use one key piece of anatomical information, i.e. the cortex projects to the striatum, while the striatum does not directly affect other brain structures recorded in this manuscript.

      We agree that the functional connectivity measured in this study relies on statistical correlations rather than direct anatomical connections. We plan to test the functional connection data with shorter cross-correlation delay criteria to see whether the results are consistent with anatomical connections and whether the original findings still hold.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The small number of mice, each contributing many sessions, complicates the  interpretation of the data. It is unclear how statistical analyses accounted for the small  sample size, repeated measures, and non-independence across sessions, or whether  multiple comparisons were adequately controlled.

      We recognize the limitation imposed by the small number of animals; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size. We agree that a larger sample size would strengthen the robustness of the findings; however, as noted below, the current dataset also has inherent limitations in both the scope of recorded regions and the behavioral paradigm.

      Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      (2) The ranking approach, although intuitive for visualizing relative changes in  connectivity, is fundamentally descriptive and does not reflect the magnitude or  reliability of the connections. Converting raw measures into ordinal ranks may obscure  meaningful differences in strength and can inflate apparent effects when the underlying  signal is weak.

      We agree with this important point. As stated in the manuscript, our motivation for the ranking approach was that differences in firing rates might bias cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions. We acknowledge, however, that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We have added the limitations of the ranking approach to the discussion section and emphasized the need in future studies for better analysis approaches that could provide a more accurate assessment of functional connection dynamics without bias from firing rates.

      (3) The absolute response onset latencies also appear quite slow for sensory-guided  behavior in mice, and it remains unclear whether this reflects the method used to  determine onset timing or factors such as task design, sensorimotor demands, or  internal state. The approach for estimating onset latency by comparing firing rates in  short windows to baseline using a t-test raises concerns about robustness, as it may  be sensitive to trial-to-trial variability and yield spurious detections.

      We agree this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).
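
      The onset criterion described above can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function name `onset_latency_ms` and its inputs are hypothetical, the windows are assumed to be non-overlapping, and the per-window p-values and the direction of the rate change are assumed to have been computed beforehand (for example, by a t-test of each 25-ms window against baseline).

```python
from typing import Optional, Sequence


def onset_latency_ms(
    p_values: Sequence[float],
    above_baseline: Sequence[bool],
    window_ms: float = 25.0,
    alpha: float = 0.05,
    n_consecutive: int = 3,
) -> Optional[float]:
    """Start time (ms after stimulus onset) of the first run of
    `n_consecutive` windows that are significantly above baseline.

    Window i is assumed to cover [i * window_ms, (i + 1) * window_ms).
    Returns None if no such run exists.
    """
    run_length = 0
    for i, (p, above) in enumerate(zip(p_values, above_baseline)):
        if above and p < alpha:
            run_length += 1
            if run_length == n_consecutive:
                # Report the start of the first window in the run.
                first_window = i - n_consecutive + 1
                return first_window * window_ms
        else:
            # Any non-significant window resets the run.
            run_length = 0
    return None


# Hypothetical p-values for eight consecutive 25-ms windows.
p_vals = [0.40, 0.03, 0.20, 0.01, 0.02, 0.004, 0.30, 0.01]
above = [True] * len(p_vals)
latency = onset_latency_ms(p_vals, above)
```

      In this toy input, the isolated significant window starting at 25 ms is ignored because it is not followed by two further significant windows, which illustrates why a consecutive-window criterion tends to give later onset estimates than first-spike or half-max measures.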

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation that a compressed activity sequence formed in CR trials during learning.

      (4) Details on spike sorting are very limited. For example, defining single units only by  an interspike interval threshold above one millisecond may not sufficiently rule out  contamination or overlapping clusters. How exactly were neurons tracked across days  (Figure 7B)?

      We have added more details on spike sorting, including the processing steps and important parameters used in the automated sorting algorithm. Only the clusters well isolated in feature space were accepted in manual curation.

      We attempted to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      This is now stated more clearly in the discussion section.

      (5) The optogenetic experiments, while designed to test the functional relevance of  rank-increasing regions, also raise questions. The physiological impact of the inhibition  is not characterized, making it unclear how effectively the targeted circuits were  actually silenced. Without clearer evidence that the manipulations reliably altered local  activity, the interpretation of the observed or absent behavioral effects remains  uncertain.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy. 

      (6) The task itself is relatively simple, and the anatomical coverage does not include  midbrain or cerebellar regions, limiting how broadly the findings can be generalized to more flexible or ethologically relevant forms of decision-making.

      We appreciate this advice and have expanded the existing discussion to more explicitly state that the relatively simple task design and anatomical coverage might limit the generalizability of our findings.

      (7) The abstract would benefit from more consistent use of tense, as the current mix of  past and present can make the main findings harder to follow. In addition, terms like  "mesoscale network," "subnetwork," and "functional motif" are used interchangeably in  places; adopting clearer, consistent terminology would improve readability.

      We have changed several verbs in the abstract to the past tense, and we have adopted more consistent terminology by replacing “functional motif” with “subnetwork”. We still feel that “mesoscale network” and “subnetwork” emphasize different aspects of the results depending on the context, so these terms have been kept.

      (8) The discussion could better acknowledge that the observed network changes may  not reflect task-specific learning alone but could also arise from broader shifts in  arousal, attention, or motivation over repeated sessions.

      We have expanded the existing discussion to better acknowledge the possible effects from broader shifts in arousal, attention, or motivation over repeated sessions.

      (9) The figures would also benefit from clearer presentation, as several are dense and  not straightforward to interpret. For example, Figure S8 could be organized more  clearly to highlight the key comparisons and main message

      We have simplified the over-detailed brain plots in Figures 4-5, as well as the plots in Figure 6 and Figure S8 (now S10 in the revised manuscript).

      (10) Finally, while the manuscript notes that data and code are available upon request,  it would strengthen the study's transparency and reproducibility to provide open access  through a public repository, in line with best practices in the field.

      The spiking data, behavior data and code for the core analyses in the manuscript are now shared in a public repository (Dryad), and we have changed the description in the Data Availability section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (A) Introduction:

      (1) "Previous studies have implicated multiple cortical and subcortical regions in visual  task learning and decision-making". No references here, and also in the next sentence.

      The references appeared later in the introduction, and we have now added them here as well.

      We also added one review on cortical-subcortical neural correlates in goal-directed behavior (Cruz et al., 2023).

      (2) Intro: In general, the citation of previous literature is rather minimal, too minimal.  There is a lot of studies using large scale recordings during learning, not necessarily  visual tasks. An example for brain-wide learning study in subcortical areas is Sych et  al. 2022 (cell reports). And for wide-field imaging there are several papers from the  Helmchen lab and Komiyama labs, also for multi-area cortical imaging.

      We appreciate this advice. We included mainly visual task learning literature to keep the scope focused on the regions and task we actually explored in this study. We fear that if we expanded the intro to include all the large-scale imaging/recording studies in the learning field, the background might become too broad.

      We have included (Sych, Fomins, Novelli, & Helmchen, 2022) for its relevance and importance in the field.

      (3) In the intro, there is only a mention of a recording of 10 brain regions, with no  mention of which areas, along with their relevance to learning. This is mentioned in the  results, but it will be good in the intro.

      The area names have now been added to the intro.

      (B) Results:

      (1) Were you able to track the same neurons across the learning profile? This is not  stated clearly.

      We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      We now state this more clearly in the discussion section.

      (2) Figure 1 starts with 7 mice, but only 5 mice are in the last panel. Later it goes down  to 3 mice. This should be explained in the results and justified.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      (3) I can't see the electrode tracks in Figure 1d. If they are flexible, how can you make  sure they did not bend during insertion? I couldn't find a description of this in the  methods also.

      The electrode shanks were ultra-thin (1-1.5 µm), and it was usually difficult to recover observable tracks or electrodes in brain sections.

      The ultra-flexible probes could not penetrate the brain on their own (since they are flexible) and had to be shuttled into position by tungsten wires through holes designed at the tips of the array shanks. The tungsten wires were assembled onto the electrode array before implantation; this was described in the section on electrode array fabrication and assembly. We also included a description of the retraction of the guiding tungsten wires in the surgery section to avoid confusion.

      As a further attempt to verify the accuracy of implantation depth, we also measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end at slightly deeper locations in cortex (142.1 ± 55.2 μm, n = 7 shanks) and slightly shallower locations in subcortical structures (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      (4) In the spike raster in 1E, there seems to be ~20 cells in V2L, for example, but in 1F, the number of neurons doesn't go below 40. What is the difference here?

      We checked Figure 1F, and the plotted dots do go below 40, down to ~20. Perhaps the file that the reviewer received was not displaying correctly?

      (5) The authors focus mainly on CR, but during learning, the number of CR trials is  rather low (because they are not experts). This can also be seen in the noisier traces  in Figure 2a. Do the authors account for that (for example by taking equal trials from  each group)? 

      We accounted for this by constructing bootstrap-resampled datasets with only 5 trials per session in both the early stage and the expert stage. The mean trace of the 500 datasets again showed an overall decrease in CR trial firing rate during task learning, with temporal dynamics highly similar to those of the original data.

      The figure is now added to supplementary materials (as Figure S3 in the revised manuscript).
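
      The trial-matching control described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for Figure S3: the function name `bootstrap_mean_trace` is hypothetical, trials are assumed to be drawn with replacement within each session, and each trial is represented as a list of firing rates per time bin.

```python
import random
from statistics import fmean
from typing import List, Sequence

Trace = Sequence[float]  # firing rate per time bin for one trial


def bootstrap_mean_trace(
    sessions: Sequence[Sequence[Trace]],
    trials_per_session: int = 5,
    n_datasets: int = 500,
    seed: int = 0,
) -> List[float]:
    """Average firing-rate trace over bootstrap datasets in which every
    session contributes the same number of trials."""
    rng = random.Random(seed)
    n_bins = len(sessions[0][0])
    dataset_means: List[List[float]] = []
    for _ in range(n_datasets):
        sampled: List[Trace] = []
        for trials in sessions:
            # Assumption: sample with replacement, so that sessions with
            # few CR trials can still contribute `trials_per_session`.
            sampled.extend(rng.choices(trials, k=trials_per_session))
        dataset_means.append(
            [fmean(trial[b] for trial in sampled) for b in range(n_bins)]
        )
    # Mean across all bootstrap datasets, bin by bin.
    return [fmean(m[b] for m in dataset_means) for b in range(n_bins)]


# Toy data: session A has many trials, session B has few.
session_a = [[10.0, 5.0, 2.0]] * 20
session_b = [[6.0, 3.0, 2.0]] * 8
mean_trace = bootstrap_mean_trace([session_a, session_b])
```

      Because each session contributes exactly 5 trials to every resampled dataset, the toy result weights the two sessions equally even though session A has more trials, which is the property that makes early-stage and expert-stage traces comparable.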

      (6) From Figure 2a, it is evident that Hit trials increase response when mice become  experts in all brain areas. The authors have decided to focus on the response onset  differences in CRs, but the Hit responses display a strong difference between naïve  and expert cases.

      Judging from the learning curves in this task, the mice learned to inhibit their licking when the No-Go stimuli appeared, which is the main reason we focused on this type of trial.

      The movement effects and potential licking artefacts in Hit trials also restricted our interpretation of these trials.

      (7) Figure 3 is still a bit cumbersome. I wasn't 100% convinced of why there is a need  to rank the connection matrix. I mean when you convert to rank, essentially there could  be a meaningful general reduction in correlation, for example during licking, and this  will be invisible in the ranking system. Maybe show in the supp non-ranked data, or  clarify this somehow

      We agree with this important point. As stated in the manuscript and in our response to Reviewer #1, our motivation for the ranking approach was that differences in firing rates could bias cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions. We acknowledge, however, that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We have added the limitations of the ranking approach to the discussion section and emphasized the need in future studies for better analysis approaches that could provide a more accurate assessment of functional connection dynamics without bias from firing rates.

      (8) Figure 4a x label is in manuscript, which is different than previous time labels,  which were seconds.

      We have now changed all time labels from Figure 2 to milliseconds.

      (9) Figure 4 input and output rank look essentially the same.

      We have compressed the brain plots in Figures 4-5 to better convey the take-home message.

      (10) Also, what is the late and early stim period? Can you mark each period in panel A? Early stim period is confusing with early CR period. Same for early response and late response.

      The time periods were defined in the figure legends. We now mark each period to avoid confusion.

      (11) Looking at panel B, I don't see any differences between delta-rank in early stim,  late stim, early response, and late response. Same for panel c and output plots.

      The rankings were indeed relatively stable across time periods. The plots are now compressed and show a mean rank value.

      (12) Panels B and C are just overwhelming and hard to grasp. Colors are similar both  to regular rank values and delta-rank. I don't see any differences between all  conditions (in general). In the text, the authors report only M2 to have an increase in  rank during the response period. Late or early response? The figure does not go well  with the text. Consider minimizing this plot and moving stuff to supplementary.

      The colormaps have been changed to avoid confusion, and the brain plots are now compressed.

      (13) In terms of a statistical test for Figure 4, a two-way ANOVA was done, but over what? What are the statistics and p-values for the test? Is there a main effect of time also? Is there a significant interaction? Was this done on all mice together? How many mice? If I understand correctly, the post-hoc statistics are presented in the supplementary, but from the main figure, you cannot know what is significant and what is not.

      For these figures we were mainly concerned with the post-hoc statistics which described the changes in the rankings of each region across learning.

      We have changed the description to “t-test with Sidak correction” to avoid confusion.

      (14) In the legend of Figure 4, it is reported that 610 expert CR trials from 6 sessions,  instead of 7 sessions. Why was that? Also, like the previous point, why only 3 mice?

      Behavior data for all the sessions used were shown in Figure S1. Only 3 mice were used for the learning group because the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size.

      (15) Body movement analysis: was this done in a different cohort of mice? Only now  do I understand why there was a division into early and late stim periods. In supp 4,  there should be a trace of each body part in CR expert versus naïve. This should also  be done for Hit trials as a sanity check. I am not sure that the brightness difference  between consecutive frames is the best measure. Rather try to calculate frame-to frame correlation. In general, body movement analysis is super important and should  be carefully analyzed.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion to the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (16) For Hit trials, in the striatum, there is an increase in input rank around the  response period, and from Figure S6 it is clear that this is lick-related. Other than that,  the authors report other significant changes across learning and point out to Figure 5b,c. I couldn't see which areas and when it occurred.

      We did naturally expect the activity in striatum to be strongly related to movement.

      With Figure S6 (now S7) we wished to show that the observed rank increase for striatum could not simply be attributed to changes in time of lick initiation.

      As some readers may argue that during learning the mice might have learned to only intensely lick after response signal onset, causing the observed rise of input rank after response signal, we realigned the spikes in each trial to the time of the first lick, and a strong difference could still be observed between early training stage and expert training stage.

      We still cannot fully rule out effects from more subtle movement changes, as the face motion energy did increase in the early response period. This result and related discussion have been added to the results section of the revised manuscript.

      (17) Figure 6, again, is rather hard to grasp. There are 16 panels, spread over 4 areas,  input and output, stim and response. What is the take home message of all this?  Visually, it's hard to differentiate between each panel. For me, it seems like all the  panels indicate that for all 4 areas, both in output and input, frontal areas increase in  rank. This take-home message can be visually conveyed in much less tedious ways.  This simpler approach is actually conveyed better in the text than in the figures  themselves. Also, the whole explanation on how this analysis was done, was not clear  from the text. If I understand it, you just divided and ranked the general input (or  output) into individual connections? If so, then this should be better explained.

      We appreciate this advice and have compressed the figures to better convey the main message. The rankings for Figure 6 and Figure S8 (now Figure S9) were explained in the left panel of Figure 3C. Each non-zero element in the connection matrix was ranked to a value from 1 to 10, with a value of 10 representing the 10% strongest non-zero elements in the matrix.

      We have updated the figure legends of Figure 3, and we have also updated the description in methods (Connection rank analyses) to give a clearer description of how the analyses were applied in subsequent figures.
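
      The ranking step described above (each non-zero element mapped to a value from 1 to 10, with 10 marking the strongest 10% of non-zero elements) can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the function name `rank_connections` is hypothetical, elements are ranked by their raw value, and tied values are assumed to share the higher rank.

```python
import bisect
from typing import List


def rank_connections(
    matrix: List[List[float]], n_levels: int = 10
) -> List[List[int]]:
    """Map each non-zero element to an integer rank in 1..n_levels based
    on its position among all non-zero elements; zeros stay 0."""
    nonzero = sorted(v for row in matrix for v in row if v != 0)
    n = len(nonzero)
    ranks = [[0] * len(row) for row in matrix]
    if n == 0:
        return ranks
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                # Number of non-zero elements less than or equal to v.
                n_le = bisect.bisect_right(nonzero, v)
                # Integer ceiling of n_levels * n_le / n.
                ranks[r][c] = (n_levels * n_le + n - 1) // n
    return ranks


# Toy connection matrix with exactly ten non-zero elements.
connections = [
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.0, 0.5, 0.6],
    [0.7, 0.8, 0.0, 0.9],
    [1.0, 0.0, 0.0, 0.0],
]
ranked = rank_connections(connections)
```

      Note that the ranks depend only on the ordering of the non-zero elements: multiplying every connection strength by a constant leaves the output unchanged, which is the property both reviewers point to when they ask whether a general change in correlation would be visible after ranking.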

      (18) Figure 7: Here, the authors perform a ROC analysis between go and no-go stimuli. They balance between choice, but there is still an essential difference between a hit and a FA in terms of movement and licks. That is maybe why there is a big difference in selective units during the response period. For example, during a Hit trial the mouse licks and gets a reward, resulting in more licking and excitement. In FAs, the mouse licks, but gets punished, which causes a reduction in additional licking and movements. This could be a simple explanation why the ROC was good in the late response period. Body movement analysis of Hit and FA should be done as in Figure S4.

      We appreciate this insightful advice.

      Though we balanced the numbers of basic trial types, we could not rule out an intrinsic difference in the amount of movement between FA trials and Hit trials, which is likely the reason for the large proportion of encoding neurons in the response period.

      We have added this discussion to both the results section and the discussion section, along with the need for a more carefully designed behavioral paradigm to disentangle task information.

      (19) The authors also find selective neurons before stimulus onset, and refer to trial  history effects. This can be directly checked, that is if neurons decode trial history.

      We attempted encoding analyses on trial history, but regrettably for our dataset we could not find enough trials to construct a dataset with fully balanced trial history, visual stimulus and behavior choice.

      (20) Figure 7e. What is the interpretation for these results? That areas which peaked  earlier had more input and output with other areas? So, these areas are initiating  hubs? Would be nice to see ACC vs Str traces from B superimposed on each other.  Having said this, the Str is the only area to show significant differences in the early  stim period. But is also has the latest peak time. This is a bit of a discrepancy.

      We appreciate this important point.

      The limited anatomical coverage of brain regions restricted our interpretation of these findings. These areas could be initiating hubs, or early receivers of input from true initiating hubs that were not monitored in our study.

      The Str trace was in fact above the ACC trace, especially in the response period. This could be explained by point 18 above: since we could not rule out an intrinsic difference in the amount of movement between FA trials and Hit trials, and considering that striatal activity is strongly related to movement, the Str trace may largely reflect the movement-related spike count difference between FA trials and Hit trials rather than a difference related to the visual stimulus.

      This further shows the need for a more carefully designed behavioral paradigm to disentangle task information.

      The striatum trace also did not in fact show a true double-peak form like the traces in other regions; it ramped up in the stimulus period and only peaked in the response period. This description is now added to the results section.

      In the early stim period, the Striatum did show significant differences in average percent of encoding neurons, as the encoding neurons were stably high in expert stage. The striatum activity is more directly affected Still the percentage of neurons only reached peak in late stimulus period.

      (21) For the optogenetic silencing experiments, how many mice were trained for each  group? This is not mentioned in the results section but only in the legend of Figure 8. This part is rather convincing in terms of the necessity for OFC and V2M

      We have included the numbers of mice in the results section as well.

      (C) Discussion

      (1) There are several studies linking sensory areas to frontal networks that should be mentioned, for example, Esmaeili et al., 2022, Matteucci et al., 2022, Guo et al., 2014, Gallero Salas et al., 2021, Jerry Chen et al., 2015. Sonja Hofer papers, maybe. Probably more.

      We appreciate this advice. We have now included one of the mentioned papers (Esmaeili et al., 2022) in the results section and discussion section for its direct characterization of the enhanced coupling between the somatosensory region and the frontal (motor) region during sensory learning. The other studies mentioned here seem to focus more on the differences in encoding properties between regions along specific cortical pathways, rather than functional connection or interregional activity correlation, and we feel they are not directly related to the observations discussed.

      (2) The reported reorganization of brain-wide networks with shifts in time is best described also in Sych et al. 2021.

      We regret that we did not include this important study, and we have now cited it in the discussion section.

      (3) Regarding the discussion about more widespread stimulus encoding after learning,  the results indicate that the striatum emerges first in decoding abilities (Figure 7c left  panel), but this is not discussed at all.

      We briefly discussed this in the result section. We tend to attribute this to trial history signal in striatum, but since the structure of our data could not support a direct encoding analysis on trial history, we felt it might be inappropriate to over-interpret the results.

      (4) An important issue which is not discussed is the contribution of movement which  was shown to have a strong effect on brain-wide dynamics (Steinmetz et al 2019;  Musall et al 2019; Stringer et al 2019; Gilad et al 2018) The authors do have some movement analysis, but this is not enough. At least a discussion of the possible effects of movement on learning-related dynamics should be added.

      We have included these studies in discussion section accordingly. Since the movement analyses were done in a separate cohort of mice, we have made our limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (D) Methods

      (1) How was the light delivery of the optogenetic experiments done? Via fiber  implantation in the OFC? And for V2M? If the red laser was on the skull, how did it get  to the OFC?

      The fibers were placed on the cortical surface for the V2M group and were implanted above OFC for the OFC manipulation group. These were described in the viral injection part of the methods section.

      (2) No data given on how electrode tracking was done post hoc

      As noted in our response to point 3 of the results section, the electrode shanks were ultra-thin (1-1.5 µm), and it was usually difficult to recover observable tracks or electrodes in brain sections.

      As an attempt to verify the accuracy of implantation depth, we measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end at slightly deeper locations in cortex (142.1 ± 55.2 μm, n = 7 shanks) and slightly shallower locations in subcortical structures (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      Reviewer #3 (Recommendations for the authors):

      (1) The manuscript uses decision-making in the title, abstract and introduction.  However, nothing is related to decision learning in the results section. Mice simply  learned to suppress licking in no-go trials. This type of task is typically used to study behavioral inhibition. And consistent with this, the authors mainly identified changes  related to network on no-go trials. I really think the title and main message is  misleading. It is better to rephrase it as visual discrimination learning. In the  introduction, the authors also reviewed multiple related studies that are based on  learning of visual discrimination tasks.

      We do view the Go/No-Go task as a specific type of decision-making task, as the literature has discussed it as a decision-making task under the framework of signal detection theory or the updating of item values (Carandini & Churchland, 2013; Veling, Becker, Liu, Quandt, & Holland, 2022).

      We do acknowledge the essential differences between the Go/No-Go task and the tasks that require the animal to choose between alternatives, and since we have now realized some readers may not accept this task as a decision task, we have changed the title to visual discrimination task as advised.

      (2) Learning induced a faster onset on CR trials. As the no-go stimulus was not  presented to mice during early stages of training, this change might reflect the  perceptual learning of relevant visual stimulus after repeated presentation. This further  confirms my speculation, and the decision-making used in the title is misleading. 

      We have changed the title to visual discrimination task accordingly.

      (3) Figure 1E, show one hit trial. If the second 'no-go stimulus' is correct, that trial  might be a false alarm trial as mice licked briefly. I'd like to see whether continuous  licking can cause motion artifacts in recording. 

      We appreciate this important point. There were indeed licking artifacts with continuous licking in Hit trials, which was part of the reason we focused our analyses on CR trials. Opto-based lick detectors may help to reduce the artefacts in future studies.

      (4) What is the rationale for using a threshold of d' < 2 as the early-stage data and d'>3  as expert stage data?

      The thresholds were chosen as a trade-off between the practical need to gather enough CR trials in the early training stage and the need to keep performance in that stage relatively low.

      Assuming the mice licked in 95% of Go stimulus trials, d' < 2 corresponds to a performance level at which the mouse correctly rejected less than 63.9% of No-Go stimulus trials, and d' > 3 corresponds to a performance level at which the mouse correctly rejected more than 91.2% of No-Go stimulus trials.
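
      The correspondence between the d' thresholds and the correct rejection rates quoted above can be checked with the standard signal detection formula d' = z(hit rate) - z(false alarm rate). The sketch below is a worked check rather than the authors' code; the function names are hypothetical, and the 95% hit rate is the assumption stated in the response.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """d' = z(hit rate) - z(false alarm rate)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_alarm_rate)


def correct_rejection_rate(d_prime_value: float, hit_rate: float) -> float:
    """Correct rejection rate implied by a given d' and hit rate."""
    z_false_alarm = _STD_NORMAL.inv_cdf(hit_rate) - d_prime_value
    false_alarm_rate = _STD_NORMAL.cdf(z_false_alarm)
    return 1.0 - false_alarm_rate


cr_early = correct_rejection_rate(2.0, hit_rate=0.95)   # about 0.639
cr_expert = correct_rejection_rate(3.0, hit_rate=0.95)  # about 0.912
```

      With a 95% hit rate, the two thresholds therefore separate sessions in which roughly a third or more of No-Go trials ended in a false alarm from sessions in which fewer than about 9% did.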

      (5) Figure 2A, there is a change in baseline firing rates in V2M, MDTh, and Str. There  is no discussion. But what can cause this change? Recording instability, problem in  spiking sorting, or learning?

      It is highly possible that the firing rates before visual stimulus onset are affected by the previous reward history and task engagement state of the mice. Notably, although Hit and CR trials were recorded simultaneously in the same sessions, the changes in baseline firing rates seen in CR trials in the V2M region were not observed in Hit trials.

      Thus, though we cannot completely rule out the possibility of recording instability, we see this as evidence of the effects on firing rates from changes in trial history or task engagement during learning.

      References:

      Carandini, M., & Churchland, A. K. (2013). Probing perceptual decisions in rodents. Nat Neurosci, 16(7), 824-831. doi:10.1038/nn.3410.

      Cruz, K. G., Leow, Y. N., Le, N. M., Adam, E., Huda, R., & Sur, M. (2023). Cortical-subcortical interactions in goal-directed behavior. Physiol Rev, 103(1), 347-389. doi:10.1152/physrev.00048.2021

      Esmaeili, V., Oryshchuk, A., Asri, R., Tamura, K., Foustoukos, G., Liu, Y., Guiet, R., Crochet, S., & Petersen, C. C. H. (2022). Learning-related congruent and incongruent changes of excitation and inhibition in distinct cortical areas. PLOS Biology, 20(5), e3001667. doi:10.1371/journal.pbio.3001667

      Goldbach, H. C., Akitake, B., Leedy, C. E., & Histed, M. H. (2021). Performance in even a simple perceptual task depends on mouse secondary visual areas. Elife, 10, e62156. doi:10.7554/eLife.62156.

      Siegle, J. H., Jia, X., Durand, S., Gale, S., Bennett, C., Graddis, N., Heller, G., Ramirez, T. K., Choi, H., Luviano, J. A., Groblewski, P. A., Ahmed, R., Arkhipov, A., Bernard, A., Billeh, Y. N., Brown, D., Buice, M. A., Cain, N., Caldejon, S., Casal, L., Cho, A., Chvilicek, M., Cox, T. C., Dai, K., Denman, D. J., de Vries, S. E. J., Dietzman, R., Esposito, L., Farrell, C., Feng, D., Galbraith, J., Garrett, M., Gelfand, E. C., Hancock, N., Harris, J. A., Howard, R., Hu, B., Hytnen, R., Iyer, R., Jessett, E., Johnson, K., Kato, I., Kiggins, J., Lambert, S., Lecoq, J., Ledochowitsch, P., Lee, J. H., Leon, A., Li, Y., Liang, E., Long, F., Mace, K., Melchior, J., Millman, D., Mollenkopf, T., Nayan, C., Ng, L., Ngo, K., Nguyen, T., Nicovich, P. R., North, K., Ocker, G. K., Ollerenshaw, D., Oliver, M., Pachitariu, M., Perkins, J., Reding, M., Reid, D., Robertson, M., Ronellenfitch, K., Seid, S., Slaughterbeck, C., Stoecklin, M., Sullivan, D., Sutton, B., Swapp, J., Thompson, C., Turner, K., Wakeman, W., Whitesell, J. D., Williams, D., Williford, A., Young, R., Zeng, H., Naylor, S., Phillips, J. W., Reid, R. C., Mihalas, S., Olsen, S. R., & Koch, C. (2021). Survey of spiking in the mouse visual system reveals functional hierarchy. Nature, 592(7852), 86-92. doi:10.1038/s41586-020-03171-x

      Sych, Y., Fomins, A., Novelli, L., & Helmchen, F. (2022). Dynamic reorganization of the cortico-basal ganglia-thalamo-cortical network during task learning. Cell Rep, 40(12), 111394. doi:10.1016/j.celrep.2022.111394

      Veling, H., Becker, D., Liu, H., Quandt, J., & Holland, R. W. (2022). How go/no-go training changes behavior: A value-based decision-making perspective. Current Opinion in Behavioral Sciences, 47, 101206. doi:10.1016/j.cobeha.2022.101206

    1. eLife Assessment

      This study provides valuable insight into the role of actin protrusions in mediating early pre-endocytic steps of human papillomavirus entry at the cell surface. Using state-of-the-art microscopy in an immortalized keratinocyte model, the authors present mostly solid evidence that filopodia actively promote the transfer of heparan sulfate-coated virions from the extracellular matrix to the viral entry factor CD151. Remaining gaps in the mechanistic model could be further supported by including a more expansive analysis of the fixed microscopy samples and live cell imaging to distinguish virion transfer from direct binding.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      Note on revisions:

      The authors did an excellent job in their revision to include data from the effect of proteolytic priming on their observed virion transfer to the cell body. All other minor issues were addressed adequately.

      The work could be especially critical to understanding the process of in vivo infection.

    3. Reviewer #2 (Public review):

      The study design involves infecting HaCaT cells (immortalised keratinocytes mimicking basal cells of a target tissue) and observing virus localization with and without actin polymerization inhibition by cytochalasin D (cytoD) to analyze virion transfer from the ECM to the cell via filopodial structures, using cellular proteins as markers.

      In the context of the model system, the authors stress in the revised version the importance of using HaCaT cells as a relevant 'polarized' cell model for infection. The term 'polarized' is used in the cell biological literature for epithelial cells to describe a strict apical vs. basolateral demarcation of the plasma membrane with an established diffusion barrier of the tight junction. However, HaCaT cells do not form tight junctions. In squamous epithelia, such barriers are only found in granular layers of the epithelium. The published work cited in support of their claims either does not refer to polarity or does so only in the context of other cells such as CaCo-2 cells.

      Overall, the matter of polarity would be important, if indeed the virus could only access cell-associated HSPGs as primary binding receptor, or the elusive secondary receptor via the ECM in the used model system (HaCaT cells), if they would locate exclusively basolaterally. This is at least not the case for binding, as observed in several previous publications (just two examples: Becker et al, 2018, Smith et al., 2008). With only a rather weak attempt at experimental verification of their model system with regards to polarity of binding, the authors then go on to base their conclusions on this unverified assumption.

      This is one example of several in the manuscript, where claims for foundational premises, observations, and/or conclusions remain undocumented or not supported by experimental data.

      Another such example is the assumption of transfer of the virus from ECM to the tetraspanin CD151. Here, the conclusions are based on the poorly documented inability of the virus to bind to the cell body, which is in stark contrast to several previous publications and raises questions. Thus, association with CD151 likely occurs both from ECM-derived virus AND virus that binds to cells, so that any conclusion on the mode of association is possible only with live cell data (which are not provided). Overall, their proposed model thus remains largely unsubstantiated with regard to receptor switching.

      There are a number of important additional issues with the manuscript:

      First, none of the inhibitors has been tested in their system for efficacy and specificity; instead, the authors rely on published work in other cell types. This considerably weakens confidence in the conclusions drawn by the authors.

      Second, the authors aim to study transfer from ECM to the cell body and effects thereof. However, there are still substantial amounts of viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells. This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. This remains an issue despite the added Supplementary Figure 1, where again only subcellular regions are displayed. As a consequence, the data obtained from time point experiments are skewed and remain for the most part unconvincing, largely because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting the association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      Third, the use of fixed images in a time course series does not allow one to assess a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin, or of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions; this may introduce a bias and skew the analysis.

      Fourth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body and half of the image showing ECM, it is clear that association with cell membrane structures will only be significant in the original. But given the high density of objects on the plasma membrane, I am not convinced that doing the same by flipping only the plasma membrane will not also yield numbers similar to the original.
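
      The concern about ROI composition can be illustrated with a toy simulation. The following is a minimal sketch and not the authors' analysis pipeline: it assumes that "flipping" means mirroring one channel horizontally before computing the Pearson correlation coefficient (PCC), and it uses synthetic binary images in which virions and a membrane marker are both confined to the cell-body half but are otherwise placed independently of each other. All names and parameters are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pcc_original_and_flipped(virions, marker):
    """PCC of the original image pair and of the pair with the marker mirrored."""
    flat_v = [p for row in virions for p in row]
    flat_m = [p for row in marker for p in row]
    flat_m_flipped = [p for row in marker for p in reversed(row)]
    return pearson(flat_v, flat_m), pearson(flat_v, flat_m_flipped)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
HEIGHT, WIDTH = 40, 40
CELL_COLS = WIDTH // 2  # left half = cell body, right half = ECM

# Both channels occur only on the cell body and are independent within it
virions = [[int(c < CELL_COLS and rng.random() < 0.2) for c in range(WIDTH)]
           for _ in range(HEIGHT)]
marker = [[int(c < CELL_COLS and rng.random() < 0.5) for c in range(WIDTH)]
          for _ in range(HEIGHT)]

# ROI 1: mixed field of view (cell body plus ECM)
mixed_orig, mixed_flip = pcc_original_and_flipped(virions, marker)

# ROI 2: cell body only, so object density is homogeneous across the ROI
cell_virions = [row[:CELL_COLS] for row in virions]
cell_marker = [row[:CELL_COLS] for row in marker]
cell_orig, cell_flip = pcc_original_and_flipped(cell_virions, cell_marker)

print(f"mixed ROI:     original {mixed_orig:+.2f}, flipped {mixed_flip:+.2f}")
print(f"cell-body ROI: original {cell_orig:+.2f}, flipped {cell_flip:+.2f}")
```

      In this toy setting, the mixed ROI separates the original from the flipped image purely because of the cell/ECM geometry, whereas the cell-body-only ROI gives values near zero in both cases. This is why the flipped control is informative only when the ROI has a roughly uniform object density.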

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      We stated in the introduction on line 65/66: ‘Two release mechanisms are discussed, that mutually are not exclusive’. This implies that we do not consider the shedding model to be ‘the accepted model’. Nor do we state in the discussion that the shedding model is the preferred one. However, we referred to the shedding model in the discussion because we find HS associated with transferred PsVs, which is in line with this model.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      Our findings are compatible with both models; we aim neither to verify the shedding model nor to disprove the priming model. However, as we understand it, the referee would like the priming model to be more visible. Therefore, using inhibitors previously used in the field, we tested whether inhibition of KLK8 or furin reduces PsV translocation to the cell body (after CytD wash-off). Leupeptin blocks transport, while Furin inhibitor I still allows some initial translocation. We incorporated these new data as Figure 2 (line 265): ‘…we would expect that inhibition of L1 processing during the CytD incubation prevents the recovery of PsV translocation from the ECM to the cell body (Figure 2A and D). To test for this possibility, as employed in earlier studies, the protease inhibitor leupeptin was used to inhibit proteases including KLK8 which is required for L1 cleavage (Cerqueira et al. 2015). Employing this inhibitor, the PCC between PsV-L1 and F-actin staining remains negative after CytD removal, showing that for translocation indeed the action of proteases is required (Figure 2B and D). In contrast, inhibition of L2 cleavage by a furin specific inhibitor has no effect on the PCC (Figure 2C and D). However, it should be noted that we occasionally observe PsVs not completely translocating but accumulating at the border of the F-actin stained area (for example see Figure 2C (60 min)). This results in an increase of the PCC almost equal to complete translocation, explaining why the PCC remains unaffected despite a furin inhibitory effect. Hence, furin inhibition may have some effect on translocation that, however, is undetected in this type of analysis.’

      Moreover, we have added a paragraph discussing how our data integrates into the established model of the HPV infection cascade (line 604): ‘HPV infection is the result of several steps, starting with the initial binding of virions via electrostatic and polar interactions (Dasgupta et al. 2011) to the primary attachment site HS (Richards et al. 2013), which induces capsid modification (Feng et al. 2024; Cerqueira et al. 2015) and HS cleavage (Surviladze et al. 2015), enabling the virion to be released from the ECM or the glycocalyx. Next, virions bind to the cell surface to a secondary receptor complex that forms over time, and become internalized via endocytosis, before they are trafficked to the nucleus (Ozbun and Campos 2021; Mikuličić et al. 2021). Regarding the transition from the primary attachment site to cell surface binding, as already outlined in the introduction, two models are discussed. In one model, proteases cleave the capsid proteins. After priming, the capsids are structurally modified and the virion can dissociate from its HS attachment site. It has been suggested that capsid priming is mediated by KLK8 (Cerqueira et al. 2015) and furin (Richards et al. 2006). In our system, KLK8 inhibition blocks PsV transport, while furin inhibition has some effect that, however, cannot be detected in this analysis (Figure 2) suggesting furin engagement at later steps in the infection cascade. This is in line with earlier in vitro studies on the role of cell surface furin (Surviladze et al. 2015; Day et al. 2008; Day and Schiller 2009). In any case, our results align with both models of ECM detachment: one involving HS cleavage (HS co-transfer) and another involving capsid modification (by e.g., KLK8).’

      The model should be fitted into established entry events,…

      Please see our reply above.

      or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowed secondary receptor engagement in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      We do not see any discrepancies; our observations are compatible with aspects of both the shedding and the priming model. That PsVs carry HS-cleavage products doesn't imply that HS cleavage is sufficient or required for infection, or that the priming model would be wrong. We do not view our data as being in conflict with the priming model. Most of the above-mentioned papers are now cited.

      Altogether, we acknowledge that the study gains importance by directly testing the priming model within our experimental system. We are thankful for the above comments and addressed this issue.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., PLoS Pathog., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., PLoS Pathog., 2018), ‘Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.’ Moreover, the paper shows that HaCaT cell ECM binding of PsVs increases the infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      The reviewing editor speculated that HaCaT cells may be a model system in which the in vivo relevant binding to the ECM can be better studied than in non-polarized cell types. This is because binding to the ECM cannot be bypassed by direct cell surface binding. The observation that only a few PsVs bind to the basal cell membrane indeed suggests restricted diffusional access of PsVs to binding receptors of the basal membrane. The reviewing editor asked for an experiment showing that more PsVs bind after cell detachment. We performed this experiment and indeed find more PsVs binding to the cell surface of detached cells. This point is very important for the understanding of the study, and we now mention it in several sections of the manuscript, as outlined in the following.

      Line 125: ‘Many PsVs that bind to the ECM may locate distal from the cell surface and are thus unable to establish direct contact with entry receptors. However, they are capable of migrating by an actin-dependent transport along cell protrusions towards the cell body (Smith et al. 2008; Schelhaas et al. 2008). We aimed for blocking this transport in HaCaT cells, a cell line that is widely used as a cell culture model for HPV infection. HaCaT cells closely resemble primary keratinocytes in key aspects: they are not virally transformed and produce large amounts of ECM that facilitates infection (Bienkowska-Haba et al. 2018; Gilson et al. 2020). In addition, HaCaT cells exhibit cellular polarity that enforces binding of virus particles to the ECM, as the virions cannot bind to receptors/entry components, such as CD151, Itgα6 and HSPGs that co-distribute on the basolateral membrane of polarized keratinocytes (Sterk et al. 2000; Cowin et al. 2006; Mertens et al. 1996), making them inaccessible by diffusion.’

      Line 205: ‘During the CytD incubation, PsVs bind to HSPGs of the basolateral membrane for 5 h. Still, in the cell body area hardly any PsVs are present (0.14 PsV/µm<sup>2</sup>, Supplementary Figure 1B). In the control, the PsV density is several-fold larger (Supplementary Figure 1B). This is expected, as the PsVs bind to the ECM and translocate to the cell body. We wondered whether there are more binding sites at the basal membrane that remain inaccessible to PsVs by diffusion because of the insufficient space between glass-coverslip and basolateral membrane. For clarification, we incubated EDTA detached HaCaT cells in suspension with PsVs for 1 h at 4 °C, followed by re-attachment for 1 h. Under these conditions, we find a PsV density 12.4-fold larger than after 5 h of CytD incubation of adhered cells (Supplementary Figure 1B and D). However, it should be noted that these values cannot be directly compared. Aside from the different treatments, another difference lies in the size of the basal membrane, as re-attachment of cells is not complete after only 1 h (compare size of adhered membranes in Supplementary Figure 1A and C). Therefore, the imaged membranes are likely strongly ruffled, which results in the underestimation of the size of the adhered membrane. As a result, we overestimate the PsVs per µm<sup>2</sup> (please note that we cannot re-attach cells for longer times as we would then lose PsVs due to endocytosis). On the other hand, we would underestimate the PsV density at the basal membrane if after re-attachment we image in part also some apical membrane. In any case, the experiment suggests that PsVs bind more efficiently if membrane surface receptors are accessible by diffusion. This is in support of the above notion that the basal membrane may provide more entry receptors than one would expect from the low density of PsVs bound after 5 h CytD (Supplementary Figure 1B). 
      This suggests that under our assay conditions, PsVs cannot easily bypass the translocation from the ECM to the cell body by diffusing directly to the basal membrane. Hence, the large majority of PsVs that enter the cell were previously bound to the ECM. Therefore, HaCaT cells serve as an ideal model for studying the transfer of ECM bound HPV particles to the cell surface, which is similar to in vivo infection of basal keratinocytes after binding to the basement membrane (Day and Schelhaas 2014; Kines et al. 2009; Schiller et al. 2010; Bienkowska-Haba et al. 2018).’

      Line 529: ‘Filopodia usage not only facilitates infection but also increases the likelihood of virions to reach their target cells during wound healing, namely the filopodia-rich basal dividing cells. In fact, several types of viruses exploit filopodia during virus entry (Chang et al. 2016), hinting at the possibility that for HPV and other types of viruses actin-driven virion transport may play a more important role than it is currently assumed. If this is the case, sub-confluent HaCaT cells, or even better single HaCaT cells, would be an ideal model system for the study of these very early infection steps that involve ECM attachment and subsequent filopodia-dependent transport. As shown in Supplementary Figure 1, HaCaT cells have many binding sites for the HPV16 PsVs. However, as they are polarized and the binding receptors are only at the basal membrane, they remain relatively inaccessible by diffusion. Therefore, the ECM binding that is also observed in vivo (Day and Schelhaas 2014) and subsequent transport via filopodia are used upon infection of HaCaT cells that locate at the periphery of cell patches. Here, PsVs bind to the ECM which strongly enhances infection of primary keratinocytes (Bienkowska-Haba et al. 2018). In contrast, HPV can readily bind to HSPGs on the cell surface of nonpolarized cells, and by this bypasses ECM mediated virus priming and the filopodia dependency. We propose that HaCaT cells are a valuable system for studying the very early events in HPV infection that allows for dissecting capsid interaction with ECM resident priming factors and cell surface receptors.’

      Finally, please note that in the previous version of the manuscript, we did not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated on line 59: ‘While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).’

      We hope that after adding the above explanations and the experiment requested by the reviewing editor it is now clear why only few PsVs bind directly (not via the ECM) to the cell surface. We appreciate the reviewer’s and the reviewing editor’s input that has significantly improved the manuscript.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      There is staining. However, as the staining at the periphery is stronger and images are shown at the same settings of brightness and contrast, the impression is given that the cell surface is not stained. We have added more images showing HS cell surface staining.

      (i) Supplementary Figure 4C shows an enlarged view of the CytD/0 min cell shown in Figure 6A. In the area stained by Itgα6, that marks the cell body, HS staining is present, although less abundant in comparison to the ECM.

      (ii) In Figure 8, CytD/30 min, a cell is shown with abundant HS in the cell body region (compare cyan and green LUT).

      (iii) In newly added Figure 3A, lower panel, another cell with HS in the cell body region is shown.

      Please note that the staining is highly variable. We indicate this by stating on Line 373: ‘The pattern of the HS staining (cyan LUT) and the overlap of HS with PsVs and Itgα6 are highly variable (Figure 6A).’

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (lines 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The transient increase in the PCC at CytD/30 min can be interpreted as PsV/HS co-transport or as direct binding of PsVs to cell surface HSPGs. However, two arguments support co-transport.

      First, we find that CytD/PsVs increases the HS intensity (see newly added Figure 3, confirming old Figure 5 that is now Figure 6). We state on line 290: ‘… that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      Second, the distance between HS and Itgα6 (the cell body marker) decreases over time after CytD removal, which suggests movement of HS to the cell body (Supplementary Figure 8D). We state on line 422: ‘The movement of HS towards the cell body after removal of CytD, which indirectly demonstrates that PsVs are coated with HS, is suggested by a shortening of the HS-Itgα6 distance over time (Supplementary Figure 8D).’

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      Some areas are covered with confluent cells, to which hardly any PsVs are bound, because accessing their basolateral membrane is nearly impossible and PsVs do not bind to the exposed apical membrane either. We assume this is a major difference from cultures of unpolarized cells, where PsVs should distribute more or less equally over cells. This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect, in theory, several hundred PsVs, many more than expected from the 50 vge/cell.

      We state on line 135: ‘Frequently, we observe patches of confluent cells which are common to HaCaT cells. Cells at the center of these patches are dismissed during imaging, because there are no anterogradely migrating PsVs at these cells. A second reason for our dismissal of these cells is that hardly any PsVs are bound to them, possibly because their basal membranes are inaccessible by diffusion. Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, indicating that PsVs are unequally distributed between the cells.’

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they retain shed HS. Without PsVs, the shed HS is washed off from the ECM. We have reproduced the observation made in old Figure 5 (now Figure 6) in the newly added Figure 3, which also shows that PsVs alone have no effect on the HS intensity, only when present together with CytD. We state on line 277: ‘As outlined above, during the 5 h incubation with CytD, proteases in the ECM are expected to cleave HS chains. These cleavage products should be able to diffuse out of the ECM, unless they remain associated with nontranslocating PsVs. In the control, PsV associated HS cleavage products would leave the ECM through PsV translocation…. Using an antibody that reacts with an epitope in native heparan sulfate chains, only after CytD and if PsVs are present, the level of HS staining is significantly increased (Figure 3B). As shown in Figure 3A, stronger HS staining at PsVs (open arrows) and as well in PsV free areas (closed arrows) was observed… Collectively, our findings indicate that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      We tested this antibody but obtained only very weak staining (Supplementary Figure 2), which does not allow us to differentiate between an increase in the cell periphery and one in the cell body area. We still include the experiment as it suggests that CytD has no effect on HS processing. We state on line 286: ‘As additional control and shown in Supplementary Figure 2, we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains (Yokoyama et al. 1999; for details see methods). This neo-epitope staining is independent of the presence of CytD and the incubation time, suggesting that CytD does not directly affect HS processing.’

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the-art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      The study identifies a rapid translocation step from the ECM to CD151 assemblies. We have no data that demonstrates a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having raised the impression that PsVs would bind directly to CD151 and have modified the model Figure accordingly. In the new model figure (Figure 9), the first contact established is to a CD151 free receptor.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      As we focus on early events, we are not concerned about CytD also blocking late steps in the infection cascade, like endocytosis. However, we agree that a comparison between CytD and blebbistatin would be very interesting. We added Figure 8, showing that blebbistatin only partially stops migration.

      Line 429: ‘Actin retrograde transport, which underlies the here observed virion transport, is the integrative result of three components (Smith et al. 2008; Schelhaas et al. 2008)…. As CytD broadly interferes with F-actin dependent processes, we investigated the effects upon inhibition of only one of the three components, namely the myosin II mediated retrograde movement towards the cell body. Instead of CytD, we employed in the 5 h preincubation the myosin II inhibitor blebbistatin. For the control (0 min), we show in Figure 8A one example of a cell with comparatively many PsVs at the periphery (as mentioned above, the PsV pattern is highly variable) to better illustrate the difference to the PsV pattern occasionally seen with blebbistatin. After blebbistatin treatment (0 min), PsVs are still distal to the cell body but less dispersed than after CytD treatment, seemingly as if translocation started but stopped in the midst of the pathway (Figure 8A, blebbistatin). The PCC between PsVs and HS, like after CytD (Figure 6C), is elevated after blebbistatin, albeit the effect is not significant (Figure 8C). The cell body PCC, is not at 30 min (CytD) but already at 0 min elevated (compare Figure 6D to Figure 8D), which can be explained by partial translocation. This is further supported by the fact that only 8% of PsVs are closely associated with HS (Figure 8E; blebbistatin, 0 min) compared to 15% after CytD treatment (Figure 6E; 0 min). Furthermore, after 0 min PsV incubation with blebbistatin we observe no effect on the HS intensity (compare Figure 8B to Figure 3B and Figure 6B). Hence, in contrast to CytD, blebbistatin does not trap the PsVs in the ECM where they associate with HS, but ongoing actin polymerization pushes actin filaments along with PsVs towards the cell body.’

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      Please see our detailed reply to referee #1, who raised the same issue. In brief, we agree that in multiple cell culture systems viruses bind preferentially to the cell surface directly. However, in HaCaT cells, the majority of PsVs do not bind directly to the basal membrane but get there after initial binding to the ECM. Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, as also speculated by the reviewing editor, who suggested an experiment showing that more PsVs bind to detached cells (please see above).

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the obtained data from time point experiments is skewed, and remains for the most part unconvincing due to the fact that the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As already stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and do bind directly to the cell surface. In other cellular systems, cells may hardly secrete ECM, are not polarized, and therefore virions can easily bypass ECM binding. Therefore, it is reasonable to assume that in HaCaT cells the large majority of PsVs found on the cell body originates from the ECM.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout.

      The newly added blebbistatin experiment suggests that the initial translocation is exclusively dependent on retrograde actin flow. However, we agree that we are not able to unravel more details regarding the different possible contributions to the movement. Importantly, the lack of PCC increase after CytD/leupeptin removal (Figure 2D) suggests there is not much cell spreading into the area of accumulated PsVs. Please see our more detailed reply to the same issue raised by the same referee in the recommendations for the authors.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      The dye TMA-DPH stains exclusively cellular membranes and not the ECM. The stain is actually used to delineate the cell body from the ECM area (please see Figure 1).

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established.

      We agree that the way randomization is done is very important. Regarding the association of PsVs with CD151 and HS, we corrected for random background association, which is now explained in more detail in the Figure legend of Supplementary Figure 7: “On flipped images, we often find values more than half of the values of the original images, demonstrating that many PsVs have a distance ≤ 80 nm to CD151 merely by chance (background association)… (C) Each time point in (A) and (B) obtained from flipped images is the average of three biological replicates. We use these altogether 24 data points, plotting the fraction of closely associated PsVs against the CD151 maxima density. The fraction increases with the maxima density, as the chance of random association increases with the maxima density. The fitted linear regression line describes the dependence of the background association from the maxima density. As a result, the background association (y) can be calculated for any maxima density (x) in original images with the equation y = 2.04x. Please note that the CytD/0 min may be overcorrected as we subtract background association with reference to the CD151 maxima density of the entire ROI (for an example ROI see Supplementary Figure 6A), although the local maxima density at distal PsVs is lower. On the other hand, PsVs at the cell border may have a larger local CD151 maxima density and consequently are undercorrected.”
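      The background correction described in this legend can be illustrated with a minimal sketch. It assumes that the regression line is forced through the origin (as the form y = 2.04x suggests) and that the correction is a simple subtraction; the data points, units, and function names below are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def fit_background_slope(density, fraction):
    """Least-squares slope of a line through the origin: fraction ~ slope * density."""
    density = np.asarray(density, dtype=float)
    fraction = np.asarray(fraction, dtype=float)
    return float(np.sum(density * fraction) / np.sum(density ** 2))

def correct_fraction(observed_fraction, density, slope):
    """Subtract the chance (background) association expected at this maxima density."""
    return observed_fraction - slope * density

# Hypothetical values standing in for the flipped-image data points
# (maxima density on x, fraction of closely associated PsVs on y).
flipped_density = np.array([1.0, 2.0, 3.0, 4.0])
flipped_fraction = np.array([2.1, 4.0, 6.2, 8.1])

slope = fit_background_slope(flipped_density, flipped_fraction)

# Correct a hypothetical measurement from an original (non-flipped) image.
corrected = correct_fraction(observed_fraction=12.0, density=3.0, slope=slope)
print(f"background slope = {slope:.2f}, corrected fraction = {corrected:.2f}")
```

      Because the slope is applied to the maxima density of the whole ROI, regions whose local density differs from the ROI average are over- or undercorrected, which is the caveat raised in the legend.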

      For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original.

      We are aware of this problem. For instance, it would produce ‘artificially’ low PCCs after flipping images of PsV/HS stainings (please see negative PCC value after flipping in Supplementary Figure 8). In this case, we do not use as an argument that in flipped images the PCC is lower. Instead, we would argue that over time the PCC changes in the original images. We still provide the PCC values of flipped images, as additional information, showing that in most cases we obtain after flipping a PCC of zero, as expected.

      Hence, we fully agree that careful controls in image analysis are required, and used the above-described method for the correction of background association when the fraction of closely associated PsVs is analyzed. We do not use a lower PCC value in flipped images as an argument if not appropriate.
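      For readers unfamiliar with the flipping control, the following sketch shows how a Pearson correlation coefficient (PCC) between two image channels can be compared with the PCC obtained after mirroring one channel. The flip axis, the synthetic images, and the function names are assumptions made for illustration only; they do not reproduce the analysis pipeline used in the study.

```python
import numpy as np

def pearson_cc(a, b):
    """Pearson correlation coefficient between two equally sized images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    # A constant image has no variance, so the PCC is undefined; return 0 here.
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def pcc_with_flip_control(channel_1, channel_2):
    """Return (PCC of original pair, PCC after mirroring channel_2 horizontally)."""
    original = pearson_cc(channel_1, channel_2)
    flipped = pearson_cc(channel_1, np.flip(channel_2, axis=1))
    return original, flipped

# Synthetic example: two noisy channels that share the same underlying signal.
rng = np.random.default_rng(0)
shared_signal = rng.random((64, 64))
channel_1 = shared_signal + 0.1 * rng.random((64, 64))
channel_2 = shared_signal + 0.1 * rng.random((64, 64))

original_pcc, flipped_pcc = pcc_with_flip_control(channel_1, channel_2)
print(f"original PCC = {original_pcc:.2f}, flipped PCC = {flipped_pcc:.2f}")
```

      In this spatially homogeneous example the flipped PCC is close to zero. If half of the ROI were cell body and half ECM, mirroring would move one compartment onto the other and could give a misleadingly low or negative value, which is the concern discussed above.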

      I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figure 6D and 8D show the PCC specifically of the cell body (only of plasma membrane ROIs). In flipped images (not shown in the previous version for clarity), we obtain significantly lower PCCs (Supplementary Figure 8F/G and Supplementary Figure 10C/D). We propose that in this case it would be appropriate to use a lower PCC of flipped images as argument for specific association. Still, also in this experiment we argue with a change in the PCC over time, and not with a PCC of zero after flipping. As above, we still provide the PCC values of flipped images as additional information.

      Also, there should be a higher n for the measurements.

      One replicate is based on the average of 14-15 cells for each condition (more for figure 4). Hence, in a typical experiment (Control and CytD with 4 time points) about 120 cells are analyzed, which is a broad basis for the averages of one replicate.

      We realize that with three biological replicates we find significant effects only if we have strong effects or moderate effects with very low variance.

      Recommendations for the authors:

      Reviewing Editor:

      The focus on the events of HPV infection between ECM binding and keratinocyte-specific receptor binding is unique and interesting. However, I agree with the reviewers that some of the conclusions could use more experimental support, as detailed in their comments. The failure to detect direct binding of the PsV to HSPGs on the cell surface in in vitro assays contradicts much of the published literature. For example, others have found that HPV capsids bind cultured cell lines in suspension, i.e., in the absence of ECM. Do EDTA-suspended HaCaT cells bind PsV? Is the binding HSPG dependent? If the authors think that failure to detect direct cell binding of HaCaTs is an unusual feature of these cell lines or culture conditions, then it would be helpful to provide an explanation. However, it is worth noting that an in vitro system where the cells do not directly bind capsids through HSPG interactions would be a much better model for studying the stages of HPV infection that are the focus of this study, since there is no direct binding of keratinocytes in vivo.

      We are thankful for this comment that had a strong influence on the revision. The suggested experiment has been incorporated as new Supplementary Figure 1. It shows that many more PsVs bind to the cell surface of cells in suspension than to adhered cells. As suggested by the reviewing editor, we explain now that HaCaT cells are a suitable model system for studying the in vivo transport from the ECM to the cell body that in these cells, due to their polarization, cannot be bypassed (for more details please see our replies above addressing these issues).

      Because conclusions drawn regarding HS interactions are largely based on experiments using a single HS mAb, it is important that the specificity of this mAb is described in more detail, either based on the literature or further experimentation.

      We provide now detailed information about the HS antibodies used in the study. We state on line 282 ‘Using an antibody that reacts with an epitope in native heparan sulfate chains…’ and on line 286 ‘we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains…’ and in the methods section ‘For Heparan sulfate (HS) a mouse IgM monoclonal antibody (1:200) (amsbio, cat# 370255-S) was used that reacts with an epitope in native heparan sulfate chains and not with hyaluronate, chondroitin or DNA, and poorly with heparin (mAb 10E4 (David et al., 1992)). For HS neo-epitope (Yokoyama et al., 1999) detection, a mouse monoclonal antibody (1:200) (amsbio, cat#370260-S) was used that reacts only with heparitinase-treated heparan sulfate chains, proteoglycans, or tissue sections, and not with heparinase treated HSPGs. The antibody recognizes desaturated uronic acid residues (mAb 3G10 (David et al., 1992)).’

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "tight association" or similar is repeatedly used and is not acceptable for microscopic studies; use "close association", which has no affinity connotations.

      Has been changed as suggested by the referee.

      (2) Why are lysine-coated coverslips used for microscopy? HaCaT cells adhere tightly to untreated glass, and this coating could affect the distribution of ECM and extracellular PsV.

      We believe a tight association of the basal cell membrane to its substrate, as in vivo, where the basal membrane is tightly adhered to other cells, is important in these experiments. In weakly adherent cells more PsVs may bind to the cell surface, bypassing the transport step. Hence, although HaCaT cells may not require the coat and would be able to adhere to glass, the association may not be tight enough to mimic in vivo conditions.

      (3) What is the reason to use detection of the pseudogenome for some of the experiments instead of L1 detection throughout? The process of EdU detection is sufficiently denaturing to affect some protein epitopes. The introduction of this potential artifact doesn't seem warranted for capsid detection experiments.

      The L1 and the Itgα6 antibody are from the same species, wherefore we have used in Figures 4 and 6 click-labeling of the reporter plasmid. We do not disagree with the notion of the referee, that EdU detection may denature the epitope of some proteins. For instance, we have observed a different staining pattern for CD151; for Itgα6 and HS we saw no obvious difference in the staining patterns. In double staining experiments using L1 antibody and click-labeling, both staining patterns overlapped very well, indicating that click-labeling is suitable to visualize PsVs.

      (4) What concentration of TMA-DPH was used?

      TMA-DPH is a poorly water-soluble dye that becomes strongly fluorescent upon insertion into a membrane. Because of its poor water solubility, a precise concentration cannot be given. We added 50 µl of a saturated TMA-DPH solution in PBS to 1 ml of PBS in the imaging chamber. We state this now in the methods section.

      (5) Line 419: This statement is misleading. Although PsV interaction with HSPG on the ECM is crucial for infectious transfer to cells, the majority of the PsV binding on the ECM has been attributed to interaction with laminin 332. Treatment of PsV with heparin causes sequestration to the ECM.

      We are sorry for the confusion and have removed the misleading statement.

      (6) Some reference choices are poor:

      Line 54: Ozbun and Campos, this is not the correct reference

      The introduction of the review we cited states that PsVs establish infection via a break in the epithelial barrier. However, we have replaced this reference by a review that focuses more on epithelial wounding: ‘Ozbun, Michelle A. (2019): Extracellular events impacting human papillomavirus infections: Epithelial wounding to cell signaling involved in virus entry. In Papillomavirus research (Amsterdam, Netherlands) 7, pp. 188–192. DOI: 10.1016/j.pvr.2019.04.009.’

      Line 2012: Doorbar et al., this is not the correct reference.

      Thank you for pointing this out (we assume the referee refers to line 104 and not line 2012). We have noticed this error during revision. As it is difficult to get a specialized review on this topic, we now cite Ozbun and Campos, 2021, which states that PsVs are ‘structurally and immunologically indistinguishable from lesion- and tissue-derived HPVs.’

      Minor issues:

      (1) It is difficult to appreciate the ECM and cell surface binding pattern from the provided images, which do not even contain an entire cell. We need to see a few representative field views with the ECM delineated with laminin 332 staining, as HS antibodies stain both the ECM and cell surface.

      We now provide overview images in Supplementary Figure 4. The only experiment requiring a clear delineation between ECM and cell surface is the experiment of Figure 4. Here, we do not use the HS as a reference staining because it stains both the ECM and the cell surface.

      (2) For Figure 1E, the cells were only infected for 24 hours. The half-time for infectious internalization of HaCaT cells was shown to be 8 hours for cell-associated PsV and closer to 20 hours for PsV that was associated with the ECM prior to cell association (Becker et al., 2018). Why was such a short infection time chosen?

      During assay establishment it has been observed that after 24 h the luciferase activity is optimal.

      (3) Figure 5, the staining of uninfected cells +/- cyto treatment needs to be included.

      Now visible in new Figure 3.

      I am confused by lines 54-57. It seems as if the authors are claiming that HSPGs are not present on the ECM. This sentence, as written, is misleading.

      We agree, and state now on line 58 ‘Here, virions bind to the linear polysaccharide heparan sulfate (HS) that is present in the extracellular matrix (ECM) but as well on the plasma membrane surface. HS is attached to proteins forming so called heparan sulfate proteoglycans (HSPGs).’

      Reviewer #2 (Recommendations for the authors):

      There are further issues that are not pertaining to the study design that I find important.

      (1) It remains speculative whether the virions that are transferred from the ECM are actually structurally modified.

      The newly added Figure 2, showing that leupeptin blocks infection in our assay, suggests that virions indeed are primed.

      (2) The origin of HS correlated with virions on the cell body after transfer is also not clear: does the virus associate with cell surface HS, or does it bring HS from the ECM? Simply staining HS against N-sulfated moieties does not allow such conclusions.

      This issue has been already raised in the public review to which we replied above. In brief, we agree that the transient increase of the PCC between PsVs and HS in the cell body region can be also explained by PsVs coming from the ECM without HS and binding to cell surface HS, or from PsVs binding directly (not via the ECM) to cell surface HSPGs. However, there are two more arguments indicating that PsVs are coated with HS. Please see our detailed reply above.

      (3) Figure 1: There are few, if any, filopodia in untreated cells. It would be good to quantify their abundance to substantiate that resting HaCaT cells are indeed a good model for filopodial transport vs. membrane retraction / spreading. In HaCaT ECM, the virus also binds to laminin-332 for a good part. Would this not also confound the analysis?

      At first glance, the number of filopodia appears to be too low to account for such an efficient transport. However, please note that the formation of filopodia is very dynamic, and that they can form and disappear within minutes (see below). We also often observe many PsVs aligned at one filopodium. Moreover, not every cell periphery exhibits large accumulations of PsVs. Therefore, we believe it is in principle possible that filopodia are largely responsible for the transport. We cannot exclude that we overestimate the transport rate due to partial cell spreading after CytD removal, which, however, we consider as rather unlikely as in Figure 2 we observe no increase in the PCC when leupeptin was present during the CytD incubation. Under these conditions, PsVs do not translocate but cells could spread, and this would increase the PCC between PsVs and F-actin if cells would spread into the area of accumulated PsVs.

      We now state on line 304: ‘This suggests that the half-time of PsV translocation from the periphery to the cell body is about 15 min. In fact, the half-time may be longer, as we cannot exclude that cell spreading after CytD removal contributes to less PsVs measured in the cell periphery.’ and on line 477 ‘As mentioned above, the half-time could be longer if cell spreading is in part responsible for the translocation of PsVs onto the cell body. However, we assume that this is rather unlikely, as cell spreading would increase the PCC between PsVs and F-actin under a condition where filopodia mediated transport is blocked but not cell spreading, which is not the case (Figure 2B and D, CytD/leupeptin).’

      (4) Figure 2: This would benefit from live cell analysis. There are considerable amounts of virions on the cell body, which partially contradicts statements from Figure 1.

      Does the referee refer to the images shown in Figure 4 (old Figure 2)? Please note that at CytD/0 min there are hardly any PsVs in the cell body region, the fluorescence (magenta LUT) is autofluorescence (this is explained in the results section). Only at later time points PsVs are in the cell body region.

      The fast transfer to the cell body after cyto D washout is based on the assumption that filopodia formation and transport along them (and not membrane extension) occur quickly. Is this reasonable?

      We are no experts on filopodia, but one finds references suggesting that they grow at rates of several µm per minute and have lifetimes between a few seconds and several minutes. Hence, within the 15 min we determine for the transport, cells may need a few minutes to recover from CytD, a few minutes to form filopodia that reach out into the ECM, and a few minutes for the transport itself. However, we agree that we cannot exclude membrane extension contributing to our observed transport, although we consider this as rather unlikely (see above).

      (5) Figure 3: The rationale of claiming the existence of 'endocytic structures' needs to be better explained and quantified in the corresponding supplementary figure.

      We now state in the legend ‘We propose that the agglomerated CD151 maxima close to PsVs feature the characteristics of endocytic structures, as CD151 has been shown to co-internalize with PsVs (Scheffer et al. 2013), and as these structures invaginate into the cell, like PsV filled tubular organelles previously described by electron microscopy (Schelhaas et al. 2012).’ For a proper quantification of these highly variable structures a much larger sample would be required.

      The formation of virus-filled tubules upon cytoD treatment has been previously reported. Are these viruses that come from the cell body or from the ECM?

      With the new data and explanations that have been added to the manuscript, it should be clear that it is reasonable to assume that they come largely from the ECM.

      (6) Figure 4: How are the subcellular ROIs chosen? Is there not a bias by not studying a full cell?

      We now explain better how we chose cells for analysis. We state on line 138 ‘Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, as PsVs are unequally distributed between the cells. Moreover, these PsVs usually are not homogenously distributed around the cell but concentrate at one region. We investigate the translocation of PsVs from these regions, defining ROIs for analysis that cover PsVs at the periphery and the cell body (see Supplementary Figures 6A and 8A).’

      (7) Figure 5/6: The data needs a better analysis on correlation by using randomisation as explained above.

      Please see our reply to the same point of the public review raised by the same referee.

      (8) Figure 7: This model involves CD151 being a mediator in transfer, but this has not been functionally shown. There are HaCaT CD151 KO cells available (from the Sonnenberg lab), it would be good to use those to test the model and whether transfer indeed involves CD151.

      As already stated above, we are sorry for having raised the impression that PsVs bind directly to CD151. The model Figure has been modified. Please see our reply above.

      (9) The manuscript would benefit from a number of experiments addressing the most crucial issues:

      (a) As mentioned before, the use of blebbistatin, which blocks myosin II function and arrests actin retrograde flow within seconds of addition, would be a good inhibitor to control for transfer in at least some of the most crucial experiments.

      In Figure 8 we have tested blebbistatin. Please see our reply above.

      (b) Live cell analysis would allow for monitoring of whether membrane retraction upon cytoD treatment would have to be taken into account for the analysis of the data. The same is true for the cytoD washouts, upon which most cells exhibit pronounced membrane spreading. The latter is important to support filopodial transport rather than membrane ruffling and spreading, leading to the clearance of extracellular virions from the ECM.

      We agree that this would be desirable. As replied above, we now discuss the issue of possible membrane spreading and reason why we consider it as rather unlikely.

      (c) To rid oneself of the issue of plasma membrane-bound virions as a confounding factor, one could use cells treated by sodium chlorate, which leads to undersulfation of HS on the cell surface, and seed them onto ECM with functional HSPGs. This would then indeed establish that the HS and virus are transferred together.

      We agree that this would be a smart experiment. As the main focus of our study is not clarifying whether PsVs are coated with HS or not, we gave other experiments priority.

      (10) The manuscript is, while carefully and thoughtfully worded on the issue of microscopy analysis, for a good part, extrapolating too strongly from the authors' data and unsubstantiated assumptions to conclude on their model. It would be good if the authors would support their claims with previous or their own experimental work. Just two examples of several: the assumption that cell-bound virions are negligible should be substantiated, as the literature would indicate otherwise.

      We determined the PsV density in adhered, CytD treated cells, and find around 0.14 per µm<sup>2</sup> (Supplementary figure 1B), which is 4 to 5-fold less when compared to the PsV density quantified in an area covering the cell body and the periphery (Figure 1B, see line 174 for PsVs/µm<sup>2</sup> values). Quantifying the PsV density only in the periphery would yield a severalfold larger difference. However, due to the limited resolution of the microscope we would strongly underestimate the PsV density in the accumulations. We prefer not to discuss this in detail, as exact numbers are difficult to obtain.

      Line 129: Cyto D should not inhibit the enzymes modifying HS or proteins (including virions). This is true, but cytoD may limit their secretion and abundance.

      We show in Figure 3 that CytD does not reduce HS staining (e.g., by limiting HS secretion, as suggested by the referee), suggesting that it rather does not limit secretion.

      We thank the referees and the reviewing editor for their helpful comments!

    1. eLife Assessment

      This valuable work advances our understanding of the relationship between multimodal magnetic resonance imaging (MRI) measures, cognition, and mental health. Compelling use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities the authors used a so-called stacking approach, which employs two levels of machine learning. First, they build a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they use predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
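      The two-level stacking idea summarised above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the ridge learner, the 5-fold out-of-fold scheme, the synthetic data, and all function names are hypothetical choices made here for a self-contained example.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def ridge_predict(model, X):
    coef, intercept = model
    return X @ coef + intercept

def out_of_fold_predictions(X, y, n_folds=5, alpha=1.0):
    """Predict every sample with a model that was not trained on it."""
    preds = np.empty(len(y))
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = ridge_predict(model, X[test_idx])
    return preds

def fit_stacked(feature_sets, y, alpha=1.0):
    """Level 1: one model per phenotype. Level 2: a model on the level-1 predictions."""
    level1_features = np.column_stack(
        [out_of_fold_predictions(X, y, alpha=alpha) for X in feature_sets]
    )
    level1_models = [ridge_fit(X, y, alpha) for X in feature_sets]
    level2_model = ridge_fit(level1_features, y, alpha)
    return level1_models, level2_model

def predict_stacked(stacked, feature_sets):
    level1_models, level2_model = stacked
    level1_preds = np.column_stack(
        [ridge_predict(m, X) for m, X in zip(level1_models, feature_sets)]
    )
    return ridge_predict(level2_model, level1_preds)

# Synthetic stand-in for three "phenotypes" that each carry part of the target.
rng = np.random.default_rng(1)
n_train, n_test = 300, 100
phenotypes = [rng.standard_normal((n_train + n_test, 5)) for _ in range(3)]
target = sum(X[:, 0] for X in phenotypes) + 0.5 * rng.standard_normal(n_train + n_test)

train_sets = [X[:n_train] for X in phenotypes]
test_sets = [X[n_train:] for X in phenotypes]
stacked = fit_stacked(train_sets, target[:n_train])

r_stacked = np.corrcoef(predict_stacked(stacked, test_sets), target[n_train:])[0, 1]
single = ridge_fit(train_sets[0], target[:n_train])
r_single = np.corrcoef(ridge_predict(single, test_sets[0]), target[n_train:])[0, 1]
print(f"single phenotype r = {r_single:.2f}, stacked r = {r_stacked:.2f}")
```

      Out-of-fold predictions are used as level-2 features so that the second-level model is not trained on predictions that have already seen the corresponding target values.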

      Strengths:

      (1) Big study population (UK Biobank with 14000 subjects)

      (2) Description of methods (including Figure 1) is helpful in understanding the approach

      (3) Final manuscript improved after revision

      Weaknesses:

      (1) The relevance of the question is now better described, but the impact of the work is more of conceptual value than of direct clinical value.

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is now further explained, but remains a bit counterintuitive.

      Note: the computational aspects of the methods fall beyond my expertise.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation. I have reviewed the paper before and the authors addressed my comments very well.

      Strengths:

      Large sample (UK biobank data) and clear description of advanced analyses.

      Weaknesses:

      My main concern in my previous review was that it was not completely clear to me what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
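      The commonality analysis mentioned in this summary partitions explained variance into unique and shared parts. The sketch below shows the standard two-predictor-set decomposition using ordinary least squares; it is a simplified illustration (the study also considers age and sex as further sets), and the synthetic variables and function names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)

def commonality_two_sets(A, B, y):
    """Split the R^2 of the joint model into unique and common components."""
    r2_a, r2_b = r_squared(A, y), r_squared(B, y)
    r2_ab = r_squared(np.column_stack([A, B]), y)
    return {
        "unique_a": r2_ab - r2_b,       # explained by A beyond B
        "unique_b": r2_ab - r2_a,       # explained by B beyond A
        "common": r2_a + r2_b - r2_ab,  # explained by A and B jointly
        "total": r2_ab,
    }

# Synthetic example: A and B both reflect a shared latent factor that drives y.
rng = np.random.default_rng(2)
n = 2000
latent = rng.standard_normal(n)
mental_health_score = latent + 0.5 * rng.standard_normal(n)   # stands in for set A
neural_marker = latent + 0.5 * rng.standard_normal(n)         # stands in for set B
cognition = latent + 0.5 * rng.standard_normal(n)

parts = commonality_two_sets(mental_health_score, neural_marker, cognition)
share_captured = parts["common"] / r_squared(mental_health_score, cognition)
print({k: round(v, 3) for k, v in parts.items()}, round(share_captured, 3))
```

      Under this decomposition, one plausible reading of a statement such as "captured 48% of the predictive relationship" is the common component divided by the variance that mental health explains on its own; whether the authors normalise in exactly this way is an assumption here.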

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      Thank you for this insightful observation. We agree that establishing the real-world significance of fundamental research is paramount, and we have revised our manuscript to better articulate this relevance.

      For clinicians, our work (a) corroborates the link between cognition and mental health, confirming the transdiagnostic role of cognition, and (b) demonstrates that current neuroimaging tools can capture the neurobiology underlying this relationship. These findings offer several implications for clinical practice. First, they support the development of interventions aimed at enhancing cognitive functioning as a pathway to improving mental health. Second, our work introduces neuroimaging as a potential tool for assessing the neurobiological basis of the cognition–mental health connection. With further research, clinicians may be able to use neuroimaging to track cognitive changes at the neural level, which could help monitor treatment efficacy for interventions (e.g., stimulant medications for ADHD) designed to boost cognitive functioning.

      Following your suggestions, we have expanded the Discussion (Line 684) to include future directions and clinical perspectives on the findings.

      Line 684: “Neuroimaging offers a unique window into the biological mechanisms underlying cognition–mental health overlap – insights unattainable from behavioural data alone. Our findings validate brain-based neural markers as a core unit of analysis for cognitive functioning, advancing mental health research through the lens of cognition. Beyond this conceptual contribution, the study has clinical implications. First, by demonstrating a transdiagnostic link between cognition and mental health, we support interventions that enhance cognition as a pathway to improving mental health. Second, we show neuroimaging as an effective tool for assessing the neurobiological basis of this link. Quantifying neuroimaging’s capacity to capture this relationship is essential for future research integrating imaging with cognitive testing to monitor treatment-related neural changes. Such work could enable personalised interventions, using neuroimaging to track cognitive changes and treatment efficacy (e.g., stimulant medications for ADHD) aimed at boosting cognitive functioning.”

      (2) The discussion on the interpretation of the positive and negative PLSR loadings is not very convincing, and the findings are partly counterintuitive. For example: (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      Thank you for pointing this out. We appreciate your concern regarding the interpretation of positive and negative PLSR loadings. To clarify:

      (1) The directions of PLSR loadings are broadly consistent with univariate correlations, suggesting that the somewhat counterintuitive relationships mentioned appear even when we apply simple univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. It constructs new components – linear combinations of predictors – that simultaneously explain variance in the predictors and their covariance with the response.

      (2) The positive loading of distress likely reflects cohort-specific questionnaire design in the UK Biobank, where feeling of distress was tied to seeking medical help. Individuals with higher cognition and socioeconomic status may be more likely to seek professional support, which explains the counterintuitive direction.

      (3) The negative loadings of wellbeing and happiness may also reflect cohort-specific effects, such as older age, and align with prior work linking excessive optimism to poorer reasoning and cognitive performance. This suggests that realism or pessimism may sometimes be associated with better cognition, particularly in older adults.

      These points are discussed in detail in the manuscript (Lines 493–545). We have emphasised that some of these findings may be cohort-specific and cited supporting literature, as seen below.

      (1) How to explain that distress has a positive loading and anxiety/trauma has a negative loading?

      Line 493: “The directions of PLSR loadings were broadly consistent with univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. Consistently, both univariate correlations and factor loadings derived from the PLSR model indicated that scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.”

      Line 529: “Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].”

      (2) How to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma?

      Line 545: “Finally, both negative PLSR loadings and corresponding univariate correlations for features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

      Thank you for pointing this out. We acknowledge that the analysis plan was not preregistered, as our approach was primarily data‑driven rather than hypothesis‑driven. We essentially applied the machine learning approach to quantify the strength of the cognition-mental health relationship in relation to neuroimaging. To ensure transparency and reproducibility, we have made all analysis code and intermediate outputs publicly available on our GitHub repository (https://github.com/HAM-lab-Otago-University/UKBiobank/) within the constraints of UK Biobank’s ethical policy and provided a detailed description of each methodological step in the Supplementary Materials.

      Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

      Thank you for your positive feedback and for recognizing the strengths of our work. We appreciate your comments and are happy that the revisions addressed your concerns.

    1. eLife Assessment

      This study offers a valuable methodological advance by introducing a gene panel selection approach that captures combinatorial specificity to define cell identity. The findings address key limitations of current single-gene marker methods. The evidence is compelling, but would be strengthened by further validation of rare cell states and unexpected marker categories.

    2. Joint Public Review:

      In this study, the authors introduce CellCover, a gene panel selection algorithm that leverages a minimal covering approach to identify compact sets of genes with high combinatorial specificity for defining cell identities and states. This framework addresses a key limitation in existing marker selection strategies, which often emphasize individually strong markers while neglecting the informative power of gene combinations. The authors demonstrate the utility of CellCover through benchmarking analyses and biological applications, particularly in uncovering previously unresolved cell states and lineage transitions during neocorticogenesis.

      The major strengths of the work include the conceptual shift toward combinatorial marker selection, a clear mathematical formulation of the minimal covering strategy, and biologically relevant applications that underscore the method's power to resolve subtle cell-type differences. The authors' analysis of the Telley et al. dataset highlights intriguing cases of ribosomal, mitochondrial, and tRNA gene usage in specific cortical cell types, suggesting previously underappreciated molecular signatures in neurodevelopment. Additionally, the observation that outer radial glia markers emerge prior to gliogenic progenitors in primates offers novel insights into the temporal dynamics of cortical lineage specification.

      However, several aspects of the study would benefit from further elaboration. First, the interpretability of gene panels containing individually lowly expressed genes but high combinatorial specificity could be improved by providing clearer guidelines or illustrative examples. Second, the utility of CellCover in identifying rare or transient cell states should be more thoroughly quantified, especially under noisy conditions typical of single-cell datasets. Third, while the findings on unexpected gene categories are provocative, they require further validation - either through independent transcriptomic datasets or orthogonal methods such as immunostaining or single-molecule FISH - to confirm their cell-type-specific expression patterns.

      Specifically, the manuscript would benefit from further clarification and additional validation in the following areas:

      • A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

      • Further quantification of CellCover's sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

      • It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

      • The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets - such as the one published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).

      Summary:

      Overall, this work provides a conceptually innovative and practically useful method for cell type classification that will be valuable to the single-cell and developmental biology communities. Its impact will likely grow as more researchers seek scalable, interpretable, and biologically informed gene panels for multimodal assays, diagnostics, and perturbation studies.

    3. Author response:

      A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

      We appreciate the need to explain and demonstrate how to use the novel combinatorial gene marker sets that CellCover generates. To be clear, individual genes expressed at low levels and in small numbers of cells, in general, have high specificity (the ability to mark cells of a particular type without erroneously marking other cells as this type) and are often used in combinations by CellCover to achieve a panel of genes with high sensitivity (the ability to mark all cells of a particular type). Low or sparsely expressed genes of this type may represent poorly measured genes (i.e. zero inflation known to occur in single-cell data, where genes are measured as zero in cells which actually express the gene) or may represent genes which are truly expressed only in a subset of the annotated class. Because CellCover can borrow strength across genes, it can harness the true information in either class of genes, even if affected by zero inflation. Further investigation of structure within the cell class (and across other cell classes) using the CellCover gene marker panel, as well as other genes, is necessary to clarify this issue in any particular analysis. In the manuscript, we evaluate the expression of individual genes within and across classes in this manner to understand deeper structure in Figures 1A, S6 and S8.

      To demonstrate how CellCover selects individual genes with high specificity and low sensitivity, but which are complementary to one another, in order to achieve high collective sensitivity, here we consider a hypothetical dataset of many cells where we focus on one cell class that contains 100 cells composed of four subtypes.

      - Subtype A: cells 1–20

      - Subtype B: cells 21–30

      - Subtype C: cells 31–50

      - Subtype D: cells 51–100

      To illustrate how CellCover evaluates marker gene panels, in this example, the genes under investigation have very different weights (i.e. the ratio of a gene’s expression in the cell class of interest versus its expression in other cells). Suppose we have two candidate marker panels:

      Panel 1 (coarse markers).

      - Gene A: covers cells 1–30 (weight = 0.4)

      - Gene B: covers cells 30–60 (weight = 0.3)

      - Gene C: covers cells 60–100 (weight = 0.2)

      Each gene in this panel covers a relatively large portion of the population (at least 30%), but their weights are comparatively high, indicating limited specificity to the focal cell type. Although the panel {A,B,C} attains full coverage, its markers are coarse and nonspecific.

      Panel 2 (fine-grained, combinatorial markers).

      - Gene A’: covers cells 1–20 (weight = 0.05)

      - Gene B’: covers cells 20–30 (weight = 0.10)

      - Gene C’: covers cells 30–50 (weight = 0.05)

      - Gene D’: covers cells 50–100 (weight = 0.10)

      Each marker is expressed in a smaller fraction of the population (individually low sensitivity), but the weights are substantially lower, reflecting strong subtype specificity. Importantly, these genes are complementary: their union covers all 100 cells (high combinatorial sensitivity), even though no single gene spans more than 20–50% of the cells.

      Under a strict covering requirement (e.g., α = 0, requiring 100% coverage, i.e. perfect sensitivity), both panels satisfy the constraint. However, CellCover selects the second panel because its total weight is smaller (that is, its genes are more specific). This preference reflects the design of the objective function: the method favors markers that are highly cell-type-specific, even if they individually cover only a subset of the population, as long as their complements yield full coverage. As a result, CellCover can reveal refined subtype structure within what appears to be a single cell population.
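
      The comparison of the two hypothetical panels can be checked with a short sketch. It only compares two fixed candidate panels by coverage and total weight; it is not the CellCover optimisation itself, and the helper names (cells, coverage, total_weight, pick_panel) are illustrative assumptions rather than part of any CellCover API.

```python
def cells(first, last):
    """Return the set of cell indices first..last (inclusive)."""
    return set(range(first, last + 1))


CELL_CLASS = cells(1, 100)

# Each gene maps to (cells it marks, weight); values taken from the example.
PANEL_1 = {
    "A": (cells(1, 30), 0.4),
    "B": (cells(30, 60), 0.3),
    "C": (cells(60, 100), 0.2),
}
PANEL_2 = {
    "A'": (cells(1, 20), 0.05),
    "B'": (cells(20, 30), 0.10),
    "C'": (cells(30, 50), 0.05),
    "D'": (cells(50, 100), 0.10),
}


def coverage(panel, target):
    """Fraction of target cells marked by at least one gene in the panel."""
    covered = set().union(*(marked for marked, _ in panel.values()))
    return len(covered & target) / len(target)


def total_weight(panel):
    """Sum of gene weights; lower means more specific in this example."""
    return sum(weight for _, weight in panel.values())


def pick_panel(panels, target, alpha=0.0):
    """Among panels covering at least (1 - alpha) of the target cells,
    return the name of the one with the smallest total weight."""
    feasible = {
        name: panel
        for name, panel in panels.items()
        if coverage(panel, target) >= 1 - alpha
    }
    return min(feasible, key=lambda name: total_weight(feasible[name]))


chosen = pick_panel({"panel_1": PANEL_1, "panel_2": PANEL_2}, CELL_CLASS)
```

      Both panels reach full coverage, so the choice is decided by total weight alone (about 0.9 for Panel 1 versus about 0.3 for Panel 2), which is why the second panel is preferred.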

      Interpretation guidelines. We explicitly note that CellCover marker panels should be interpreted as combinatorial signatures:

      - Individual genes may show localized, subtype-restricted expression.

      - The union of their expression defines the target cell type.

      - Low-weight genes are more specific; CellCover therefore prioritizes them whenever they provide complementary coverage.

      - The resulting panel may highlight latent heterogeneity or subpopulations within the cell type that express different subsets of the markers.

      In addition to these technical guidelines for interpreting gene panels, throughout the manuscript we use the transfer of CellCover marker gene panels to related datasets to assess the biological function of the gene sets. We propose this as a general tool in the examination of gene lists and have implemented methods to visualize the expression of any gene list (including gene lists uploaded by users) using the Projection Tool within NeMO Analytics.

      Further quantification of CellCover’s sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

      While CellCover is a method to define marker gene panels for cell classes that are already defined in a dataset, its performance on rare cell classes, small numbers of cells and low read depths is still a relevant issue. The analyses in the paper can speak to some of these concerns: The Telley dataset, which we use throughout the manuscript, used FlashTag labeling of cells prior to sequencing in order to ascertain the time since terminal division for each cell. This unique metadata linked to each cell’s expression data enabled many of the analyses we performed in the paper, but also limited the number of cells that were sequenced. For this reason, the number of cells in this dataset (total cells = 2756) is much lower than that seen in the vast majority of other single-cell sequencing studies, including those we use for the transfer of marker gene sets defined by CellCover in the Telley data. As a result, the cell classes for which we define marker gene panels in the paper contain relatively small numbers of cells. This is especially true in the 12-class analysis in Figures 4 and 5 where CellCover successfully defines gene panels for all 12 classes which transfer well to other datasets. Total cells per class range from 134 to 301. Figure S6 shows that the discriminative power of the 12 gene panels varied widely, with the most highly discriminative panel being from the E12.1H condition (with only 189 cells).

      In addition, we note that the behavior of CellCover on rare (or any) cell classes can be characterized deterministically under a mild condition. For a fixed cell class and a required covering rate of 1, a depth-k covering gene panel exists if and only if every cell in the class expresses at least k genes. Under this condition, CellCover is guaranteed to find a covering panel of depth k. Importantly, this guarantee does not impose any restriction on the panel size. Consequently, the compactness of the resulting panel reflects intrinsic properties of the data rather than algorithmic limitations: a small panel indicates that a subset of genes is robustly and consistently expressed across most cells in the class, even if the class itself is rare, whereas a large panel suggests highly heterogeneous expression patterns, where different genes are expressed in different cells. In this sense, the feasibility and structure of a covering panel are determined by the biological and technical characteristics of the dataset (e.g., read depth, expression sparsity, and the specificity of gene expression in the defined cell classes), rather than by the performance of CellCover itself.
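
      The existence condition stated above can be sketched on toy data. The cell and gene names are invented for illustration, and the two helper functions are assumptions written for this example; they check the stated condition and a given panel, and do not search for a minimal panel as CellCover does.

```python
def depth_k_cover_exists(expressed, k):
    """True if every cell in the class expresses at least k genes,
    which is the stated condition for a depth-k cover at covering rate 1."""
    return all(len(genes) >= k for genes in expressed.values())


def is_depth_k_cover(panel, expressed, k):
    """True if every cell expresses at least k genes from the panel."""
    return all(len(panel & genes) >= k for genes in expressed.values())


# Toy cell class: cell name -> set of genes detected in that cell.
expressed = {
    "cell_1": {"g1", "g2"},
    "cell_2": {"g2", "g3"},
    "cell_3": {"g1", "g3", "g4"},
}

exists_depth_2 = depth_k_cover_exists(expressed, k=2)  # every cell has >= 2 genes
exists_depth_3 = depth_k_cover_exists(expressed, k=3)  # cell_1 and cell_2 have only 2

# A three-gene panel reaches depth 2; dropping g3 leaves cell_2 with one marker.
panel_ok = is_depth_k_cover({"g1", "g2", "g3"}, expressed, k=2)
panel_short = is_depth_k_cover({"g1", "g2"}, expressed, k=2)
```

      When the condition holds, the full set of genes is trivially a depth-k cover, so the informative quantity is how small a panel can be while keeping that depth.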

      It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

      The main problem with such analysis is that most studies have omitted the expression of these genes (especially mitochondrial genes that are primarily viewed as QC metrics) from their datasets. We encourage researchers to retain the expression of these transcripts in their data so that their biological functions can be explored. Where available, the expression of these genes can be visualized in NeMO Analytics: in the mouse, the enrichment of Rps18-ps3 expression in radial glia can be seen in the Di Bella 2021 dataset, and in the human, the expression of mt-Tv can be seen in neurons in the Polioudakis 2019, Darmanis 2015, Camp 2015, and Liu 2016 datasets.

      Taking a broader perspective, a growing body of foundational work in developmental neurobiology supports the observation that mitochondrial state and metabolic programs undergo systematic changes during neuronal differentiation, consistent with our CellCover findings. For example, Khacho 2016 demonstrated that mitochondrial dynamics are essential regulators of neuronal fate commitment and that the maturation of the mitochondrial network is essential for the transition from the progenitor metabolic state to the neuronal state. Iwata 2020 further highlights cell-type-specific mitochondrial dynamics by showing that daughter cells with highly fragmented mitochondria tend to become neurons.

      The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets - such as the one published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).

      We have added the scRNA-seq data from Wang et al., as well as data from the Nano et al. 2025 meta-atlas, to the NeMO Analytics data collection. oRG markers from Liu et al. 2023 can now be visualized across the Wang, Nano and many more human in vivo datasets. In the Nano data, these oRG markers can be seen increasing in expression in the human neocortex from GW7-12, leading into peak neurogenesis and prior to gliogenesis. Although with lower age resolution, the peaking of oRG markers in the 2nd trimester (during peak neurogenesis) and their precipitous drop in the 3rd trimester (during peak gliogenesis) can also be seen in the Wang data. At NeMO Analytics, individual marker genes of oRGs can also be visualized in these datasets.

    1. eLife Assessment

      This manuscript presents a valuable methodological approach to investigating context-dependent activity of cis-regulatory activity within defined genomic loci. The authors combine a locus-specific massively parallel reporter assay, enabling unbiased and high-coverage profiling of enhancer activity across large genomic regions, with a degenerate reporter assay to identify nucleotides critical for enhancer function. The data supporting the conclusions are solid, highlighted by successful identification and characterization of both previously known and new regulatory elements across multiple developmental stages, cell types, and species. While the approach has inherent limitations in sensitivity, and indirect assignment of regulatory elements to target genes, it provides a flexible platform for nominating candidate cis-regulatory elements across defined loci.

    2. Reviewer #1 (Public review):

      MPRAs are a high-throughput and powerful tool for assaying the regulatory potential of genomic sequences. However, linking MPRA-nominated regulatory sequences to their endogenous target genes, and identifying the more specific functional regions within these sequences can be challenging. MPRAs that tile a genomic region, and saturation mutagenesis-based MPRAs can help to address these challenges. In this work, Tulloch et al. describe a streamlined MPRA system for the identification and investigation of the regulatory elements surrounding a gene of interest with high resolution. The use of BACs covering a locus of interest to generate MPRA libraries allows for an unbiased, and high-coverage assessment of a particular region. Follow-up degenerate MPRAs, where each nucleotide in the nominated sequences is systematically mutated, can then point to key motifs driving their regulatory activity. The authors present this MPRA platform as straightforward, easily customizable, and less time- and resource-intensive than traditional MPRA designs. They demonstrate the utility of their design in the context of the developing mouse retina, where they first use the LS-MPRA to identify active regulatory elements for select retinal genes, followed by d-MPRA which allowed them to dissect the functional regions within those elements and nominate important regulatory motifs. These assays were able to recapitulate some previously known cis-regulatory modules (CRMs), as well as identify some new potential regulatory regions. Follow-up experiments assessing co-localization of the gene of interest with the CRM-linked GFP reporter in the target cells, and CUT&RUN assays to confirm transcription factor binding to nominated motifs provided support linking these CRMs to the genes of interest. Overall, this method appears flexible and could be an easy-to-implement tool for other investigators aiming to study their locus of interest with high resolution.

      Strengths:

      (1) The method of fragmenting BACs allows for high, overlapping coverage of the region of interest.

      (2) The d-MPRA method was an efficient way to identify key functional transcription factor motifs, and nominate specific transcription factor-driven regulatory pathways that could be studied further.

      (3) Additional assays like co-expression analyses using the endogenous gene promoter, and use of the Notch inhibitor in the case of Olig2, helped correlate the activity of the CRMs to the expression of the gene of interest, and distinguish false positives from the initial MPRA.

      (4) The use of these assays across different time points, tissues, and even species demonstrated that they can be used across many contexts to identify both common and divergent regulatory mechanisms for the same gene.

      Weaknesses:

      (1) The LS-MPRA assay most strongly identified promoters, which are not usually novel regulatory elements you would try to discover, and the signal-to-noise ratio for more TSS-distal, non-promoter regulatory elements was usually low, making it difficult to discriminate lower activity CRMs, like enhancers, from the background. For example, NR2 and NR3 in Figure 3 have very minimal activity peaks (NR3 seems non-existent). The ex vivo data in Figure 2 is similarly noisy. Is there a particular metric or calculation that was or could be used to quantitatively or statistically call a peak above the background? The authors mention in the discussion some adjustments that could reduce the noise, such as increased sequencing depth, which I think is needed to make these initial LS-MPRA results and the benchmarking of this assay more convincing and impactful.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Tulloch et al. developed two modified massively parallel reporter assays (MPRAs) and applied them to identify cis-regulatory modules (CRMs) - genomic regions that activate gene expression - controlling retinal gene expression. These CRMs usually function at specific developmental stages and in distinct cell types to orchestrate retinal development. Studying them provides insights into how retinal progenitor cells give rise to various retinal cell types.

      The first assay, named locus-specific MPRA (LS-MPRA), tests all genomic regions within 150-300 kb of the gene of interest, rather than relying on previously predicted candidate regulatory elements. This approach reduces potential bias introduced during candidate selection, lowers the cost of synthesizing a library of candidate sequences, and simplifies library preparation. The LS-MPRA libraries were electroporated into mouse retinas in vivo or ex vivo. To benchmark the method, the authors first applied LS-MPRA near stably expressed retinal genes (e.g., Rho, Cabp5, Grm6, and Vsx2), and successfully identified both known and novel CRMs. They then used LS-MPRA to identify CRMs in embryonic mouse retinas, near Olig2 and Ngn2, genes expressed in subsets of retinal progenitor cells. Similar experiments were conducted in chick retinas and postnatal mouse retinas, revealing some CRMs with conserved activity across species and developmental stages.

      Although the study identified CRMs with robust reporter activity in Olig2+ or Ngn2+ cells, the data do not provide sufficient evidence to support the claims that these CRMs regulate Olig2 or Ngn2, rather than other nearby genes, in a cell type-specific manner. For example, the authors propose that three regions (NR1/2/3) regulate Olig2 specifically in retinal progenitor cells based on: 1) the three regions are close to Olig2, 2) increased Olig2 expression and NR1/2/3 activity upon Notch inhibition, and 3) reporter activity observed in Olig2+ cells (though also present in many Olig2- cells). While these are promising findings, they do not directly support the claims.

      The second assay, called degenerate MPRA (d-MPRA), introduces random point mutations into CRMs via error-prone PCR to assess the impact of sequence variations on regulatory activity. This approach was used on NR1/2/3 to identify mutations that alter CRM activity, potentially by influencing transcription factor binding. The authors inferred candidate transcription factors, such as Mybl1 and Otx2, through motif analysis, co-expression with Olig2 (based on single-cell RNA-seq), and CUT&RUN profiling. While some transcription factors identified in this way overlapped with the d-MPRA results, others did not. This raises questions about how well d-MPRA complements other methods for identifying TF binding sites.

      Strengths:

      The study introduces two technically robust MPRA protocols that offer advantages over standard methods, such as avoiding reliance on predefined candidate regions, reducing cost and labor, and minimizing selection bias.

      The identified regulatory elements and transcription factors contribute to our understanding of gene regulation in retinal development and may have translational potential for cell type-specific gene delivery into developing retinas.

      Weakness:

      Like other MPRA-based approaches, LS-MPRA mainly tests whether a sequence can drive expression of a reporter gene in given cell type(s). However, this type of assay generally does not show which endogenous gene the sequence regulates. In this study, the evidence supporting gene-specific CRMs is largely correlative. The evidence for cell-type-specific CRMs is also not fully supported (e.g., reporter expression is observed in the intended cell type as well as additional cell types). If further validation in the native genomic context (e.g., CRISPRi of the candidate element followed by RNA-seq across relevant cell types) is out of scope, the manuscript should avoid wording that implies definitive target gene assignment or cell-type specificity.

    4. Reviewer #3 (Public review):

      Summary:

      Use of reporter assays to understand the regulatory mechanisms controlling gene expression moves beyond simple correlations of cis-regulatory sequence accessibility, evolutionary sequence conservation, and epigenetic status with gene expression, instead quantifying regulatory sequence activity for individual elements. Tulloch et al. provide a systematic characterization of two new reporter assay techniques (LS-MPRA and d-MPRA) to comprehensively identify cis-regulatory sequences contained within genomic loci of interest during retinal development. The authors then apply LS-MPRA and d-MPRA to identify putative cis-regulatory sequences controlling Olig2 and Ngn2 expression, including potential regulatory motifs that known retinal transcription factors may bind. Transcription factor binding to regulatory sequences is then assessed via CUT&RUN. The broader utility of the techniques is then highlighted by performing the assays across development, across species, and across tissues.

      Strengths:

      The authors validate the reporter assays on retinal loci for which the regulatory sequences are known (Rho, Vsx2, Grm6, Cabp5) mostly confirming known regulatory sequence activity but highlighting either limitations of the current technology or discrepancies of previous reporter assays and known biology. The techniques are then applied to loci of interest (Olig2 and Ngn2) to better understand the regulatory sequences driving expression of these transcription factors across retinal development within subsets of retinal progenitor cells, identifying novel regulatory sequences through comprehensive profiling of the region.

      LS-MPRA provides broad coverage of loci of interest.

      d-MPRA identifies sequence features that are important for cis-regulatory sequence activity.

      The authors take into account transcript and protein stability when determining the correlation of putative enhancer sequence activity with target gene expression.

      Overall, the manuscript highlights the utility of the techniques to identify novel cis-regulatory sequence contributions to gene expression, including systematic characterizations of sequence motifs conferring activating or repressive functions.

      Limitations:

      Barcoding strategies have the potential to induce high collision rates (see Table S3) that may lead to misinterpretation of the data and/or high false positive/negative rates.

      There are limited robust methods to distinguish differentially active versus inactive CRMs in the LS-MPRA.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      MPRAs are a high-throughput and powerful tool for assaying the regulatory potential of genomic sequences. However, linking MPRA-nominated regulatory sequences to their endogenous target genes and identifying the more specific functional regions within these sequences can be challenging. MPRAs that tile a genomic region, and saturation mutagenesis-based MPRAs, can help to address these challenges. In this work, Tulloch et al. describe a streamlined MPRA system for the identification and investigation of the regulatory elements surrounding a gene of interest with high resolution. The use of BACs covering a locus of interest to generate MPRA libraries allows for an unbiased and high-coverage assessment of a particular region. Follow-up degenerate MPRAs, where each nucleotide in the nominated sequences is systematically mutated, can then point to key motifs driving their regulatory activity. The authors present this MPRA platform as straightforward, easily customizable, and less time- and resource-intensive than traditional MPRA designs. They demonstrate the utility of their design in the context of the developing mouse retina, where they first use the LS-MPRA to identify active regulatory elements for select retinal genes, followed by d-MPRA, which allowed them to dissect the functional regions within those elements and nominate important regulatory motifs. These assays were able to recapitulate some previously known cis-regulatory modules (CRMs), as well as identify some new potential regulatory regions. Follow-up experiments assessing co-localization of the gene of interest with the CRM-linked GFP reporter in the target cells, and CUT&RUN assays to confirm transcription factor binding to nominated motifs, provided support linking these CRMs to the genes of interest. Overall, this method appears flexible and could be an easy-to-implement tool for other investigators aiming to study their locus of interest with high resolution.

      Strengths:

      (1) The method of fragmenting BACs allows for high, overlapping coverage of the region of interest.

      (2) The d-MPRA method was an efficient way to identify key functional transcription factor motifs and nominate specific transcription factor-driven regulatory pathways that could be studied further.

      (3) Additional assays like co-expression analyses using the endogenous gene promoter, and use of the Notch inhibitor in the case of Olig2, helped correlate the activity of the CRMs to the expression of the gene of interest, and distinguish false positives from the initial MPRA.

      (4) The use of these assays across different time points, tissues, and even species demonstrated that they can be used across many contexts to identify both common and divergent regulatory mechanisms for the same gene.

      Weaknesses:

      The LS-MPRA assay most strongly identified promoters, which are not usually novel regulatory elements you would try to discover, and the signal-to-noise ratio for more TSS-distal, non-promoter regulatory elements was usually low, making it difficult to discriminate lower activity CRMs, like enhancers, from the background. For example, NR2 and NR3 in Figure 3 have very minimal activity peaks (NR3 seems non-existent). The ex vivo data in Figure 2 are similarly noisy. Is there a particular metric or calculation that was or could be used to quantitatively or statistically call a peak above the background? The authors mention in the discussion some adjustments that could reduce the noise, such as increased sequencing depth, which I think is needed to make these initial LS-MPRA results and the benchmarking of this assay more convincing and impactful.

      Much of the statistical and quantitative data asked for by the Reviewers has been provided in the Revision. However, it is important to note that the types of statistics using peak callers asked for regarding candidate choice will be of limited value. If one is testing a library in a single cell type in vitro, and/or running genome-wide assays, these statistics could aid in the choice of candidates. However, here we are electroporating a complex and dynamic set of cells, with each cell type present at what can be very different frequencies (e.g. Olig2-expressing cells are <2.4% of cells). This fact alone will give different apparent signal-to-noise values. In addition, at least for Olig2 and Ngn2, their expression is very transient, suggesting dynamic regulation by what are likely multiple positive and negative CRMs. An additional confound is that the level of expression of each gene that one might test is variable. All of these variables render a statistical prediction of candidates less valuable than one might hope, and might lead one to miss those CRMs of interest, particularly those active in a small subset of cells. Instead, we suggest that one use one’s own level of interest and knowledge in choosing CRM candidates. We provide several examples of experimental, rather than purely statistical, approaches that might help in one’s choice of candidates. We used a functional read-out of CRM activity (Notch perturbation), carried out in the context of the entire LS-MPRA library, as one method. Co-expression in single cells of candidate regulators identified by the d-MPRA is another. One can of course use chromatin structure and sequence conservation, as used in many studies of regulatory regions, as other ways to narrow down candidates. The d-MPRA predictions also can be viewed in light of previous genetic studies, i.e. mutations in TFs that affect the cell type of interest or the regulation of the gene of interest, as we were able to do here for CRMs predicted to be regulated by Otx2.

      Reviewer #2 (Public review):

      Summary:

      In this study, Tulloch et al. developed two modified massively parallel reporter assays (MPRAs) and applied them to identify cis-regulatory modules (CRMs) - genomic regions that activate gene expression - controlling retinal gene expression. These CRMs usually function at specific developmental stages and in distinct cell types to orchestrate retinal development. Studying them provides insights into how retinal progenitor cells give rise to various retinal cell types.

      The first assay, named locus-specific MPRA (LS-MPRA), tests all genomic regions within 150-300 kb of the gene of interest, rather than relying on previously predicted candidate regulatory elements. This approach reduces potential bias introduced during candidate selection, lowers the cost of synthesizing a library of candidate sequences, and simplifies library preparation. The LS-MPRA libraries were electroporated into mouse retinas in vivo or ex vivo. To benchmark the method, the authors first applied LS-MPRA near stably expressed retinal genes (e.g., Rho, Cabp5, Grm6, and Vsx2), and successfully identified both known and novel CRMs. They then used LS-MPRA to identify CRMs in embryonic mouse retinas, near Olig2 and Ngn2, genes expressed in subsets of retinal progenitor cells. Similar experiments were conducted in chick retinas and postnatal mouse retinas, revealing some CRMs with conserved activity across species and developmental stages.

      Although the study identified CRMs with robust reporter activity in Olig2+ or Ngn2+ cells, the data do not provide sufficient evidence to support the claims that these CRMs regulate Olig2 or Ngn2, rather than other nearby genes, in a cell-type-specific manner. For example, the authors propose that three regions (NR1/2/3) regulate Olig2 specifically in retinal progenitor cells based on: (1) the three regions are close to Olig2, (2) increased Olig2 expression and NR1/2/3 activity upon Notch inhibition, and (3) reporter activity observed in Olig2+ cells (though also present in many Olig2- cells). While these are promising findings, they do not directly support the claims.

      The second assay, called degenerate MPRA (d-MPRA), introduces random point mutations into CRMs via error-prone PCR to assess the impact of sequence variations on regulatory activity. This approach was used on NR1/2/3 to identify mutations that alter CRM activity, potentially by influencing transcription factor binding. The authors inferred candidate transcription factors, such as Mybl1 and Otx2, through motif analysis, co-expression with Olig2 (based on single-cell RNA-seq), and CUT&RUN profiling. While some transcription factors identified in this way overlapped with the d-MPRA results, others did not. This raises questions about how well d-MPRA complements other methods for identifying transcriptional regulators.

      Strengths:

      (1) The study introduces two technically robust MPRA protocols that offer advantages over standard methods, such as avoiding reliance on predefined candidate regions, reducing cost and labor, and minimizing selection bias.

      (2) The identified regulatory elements and transcription factors contribute to our understanding of gene regulation in retinal development and may have translational potential for cell-type-specific gene delivery into developing retinas.

      Weaknesses:

      (1) The claims for gene-specific and cell type-specific CRMs would benefit from further validation using complementary approaches, such as CRISPR interference or Prime editing.

      The methods that we developed were meant to provide candidates for regulatory elements for a gene of interest. These candidates could be used to further understand the regulation of a gene, a complex and difficult task, especially for dynamically regulated genes in the context of development. These candidates could also, or instead, be used to drive gene expression specifically in a target cell of interest for applications such as gene therapy or perturbations that need this type of specificity. In the first case, to use the candidates to understand the regulation of a gene, one would need to validate the candidates using the types of methods typically employed for this purpose, most rigorously in the in vivo genomic context. We did not pursue this level of validation as it would encompass a great deal of work outside the scope of the current study. However, by initially testing loci which have been studied by several groups (as cited in the manuscript, Rho, Grm6, Vsx2, and Cabp5), we were able to show that LS-MPRA can identify known CRMs. In the cases of Rho and Vsx2, previous data have shown the CRMs to be relevant in the genomic context in vivo. In addition, two Vsx2 CRMs identified by LS-MPRA are located at -37 kb and -17 kb, and the Grm6 CRM identified by LS-MPRA is at -8 kb. These are the same CRM locations identified previously using classical methods. These data show that the method is capable of identifying distal elements. When one has only one or a few loci of interest, i.e. one does not need to use genome-wide approaches, LS-MPRA is accurate enough to be worth the relatively small effort to identify potential CRMs, even those at some distance from the TSS. However, it is apparent that our methods are not perfect and that the LS-MPRA does not pick up all CRMs. We do not know of a method that has been shown to do so.

      Reviewer #3 (Public review):

      Summary:

      Use of reporter assays to understand the regulatory mechanisms controlling gene expression moves beyond simple correlations of cis-regulatory sequence accessibility, evolutionary sequence conservation, and epigenetic status with gene expression, instead quantifying regulatory sequence activity for individual elements. Tulloch et al., provide a systematic characterization of two new reporter assay techniques (LS-MPRA and d-MPRA) to comprehensively identify cis-regulatory sequences contained within genomic loci of interest during retinal development. The authors then apply LS-MPRA and d-MPRA to identify putative cis-regulatory sequences controlling Olig2 and Ngn2 expression, including potential regulatory motifs that known retinal transcription factors may bind. Transcription factor binding to regulatory sequences is then assessed via CUT&RUN. The broader utility of the techniques is then highlighted by performing the assays across development, across species, and across tissues.

      Strengths:

      (1) The authors validate the reporter assays on retinal loci for which the regulatory sequences are known (Rho, Vsx2, Grm6, Cabp5) mostly confirming known regulatory sequence activity but highlighting either limitations of the current technology or discrepancies of previous reporter assays and known biology. The techniques are then applied to loci of interest (Olig2 and Ngn2) to better understand the regulatory sequences driving expression of these transcription factors across retinal development within subsets of retinal progenitor cells, identifying novel regulatory sequences through comprehensive profiling of the region.

      (2) LS-MPRA provides broad coverage of loci of interest.

      (3) d-MPRA identifies sequence features that are important for cis-regulatory sequence activity.

      (4) The authors take into account transcript and protein stability when determining the correlation of putative enhancer sequence activity with target gene expression.

      Weaknesses:

      (1) In the manuscript's current form, many important controls that are standard for other MPRA experiments are not shown or not performed, limiting the interpretations of the utility of the techniques. This includes limited controls for basal-promoter activity, limited information about sequence saturation and reproducibility of individual fragments across different barcode sequences, limitations in cloning and assay delivery, and sequencing requirements. Additional quantitative metrics, including locus coverage and number of barcodes/fragments, would be beneficial throughout the manuscript.

      We thank the reviewer for these comments and have provided detailed responses to the additional analyses in the subsequent Recommendations section.

      (2) There are no statistical metrics for calling a region/sequence 'active'. This is especially important given that NR3 for Olig2 seems to have a small 'peak' and has non-significant activity in Figure 4.

      See comments about peak calling in our response to Reviewer #1.

      (3) The authors present correlational data for identified cis-regulatory sequences with target gene expression. Additionally, the significance of transcription factor binding to the putative regulatory sequences is not currently tested, only correlated based on previous single-cell RNA-sequencing data. While putative regulatory sequences with potential mechanisms of regulation are identified/proposed, the lack of validation (and discrepancies with previous literature) makes it hard to decipher the utility of the techniques.

      See comments about further validation in our response to Reviewer #2.

      (4) While the interpretations that Olig2 mRNA/protein expression is dynamically regulated improved the proportions of cells that co-expressed CRM-regulated GFP and Olig2, alternate explanations (some noted) are just as likely. First, the electroporation isn't specific to Olig2+ progenitors. Also, the tested, short CRM fragments may have activating signals outside of Olig2 neurogenic cells because chromatin conformation, histone modifications, and DNA methylation are not present on plasmids to precisely control plasmid activity. Alternatively, repressive elements that control Olig2 expression are not contained in the reporter vectors.

      The electroporation of Olig2 minus and plus cells is an excellent way to determine if a CRM is active in all cells, or only a specific subset, and we therefore consider this the best way to answer the question of specificity. We agree that we were unable to show that all CRM active cells were indeed Olig2-expressing cells. As noted by the Reviewer, we went to some lengths to quantify RNA and protein co-expression, including of endogenous Olig2 protein and RNA. Even with the endogenous RNA and protein, there was a mismatch wherein one infrequently saw the two together in the same cell, which could be predicted from the short half-lives of these molecules. Regarding chromatin, etc., we are intrigued by the proper regulation that we have observed for CRMs that we have previously discovered by plasmid electroporation (e.g. Kim et al. 2008, Matsuda and Cepko, 2004, Wang et al. 2014, Emerson et al. 2013). It is indeed interesting that plasmids can recapitulate proper regulation, without the proper genomic context or chromatin modifications. We have expanded our discussion of these points in the Discussion.

      (5) It is unclear as to why the d-MPRA uses a different barcoding strategy, placing a second copy of the cis-regulatory sequence in the 3' UTR. As acknowledged by the author, this will change the transcript stability by changing the 3' UTR sequence. Because of this, comparisons of sequence activity between the LS-MPRA and d-MPRA should not be performed as the experiments are not equivalent.

      We had provided a rationale for the different strategies of barcoding in the original submission, and believe it is at the discretion of the experimenter to utilize either strategy for their specific purposes. We agree that comparing activity between different techniques would not be appropriate. The analysis of mutated CRMs using d-MPRA does not utilize data from the LS-MPRA, but is an analysis of relative activity among all mutated d-MPRA constructs.

      (6) Furthermore, details of the mutational burden in d-MPRA experiments are not provided, limiting the interpretations of these results.

      We have provided detailed responses to the additional analyses in the subsequent Recommendations section and included details of the mutational burden in Supplemental Document A.

      (7) Many figures are IGV screenshots that suffer from low resolution. Many figures could be consolidated.

      We have increased the resolution of all IGV genome tracks, but believe the content within all figures remains appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improving the clarity of the results in the figures:

      (1) The pie charts used to show the percentage of overlapping cells in the colocalization analyses were not especially intuitive to read, and although the percentages and any statistical significance were often written in the text, it would've been helpful to have them written in the figures. I would suggest displaying the results in stacked bar plots, possibly like the one shown in Figure 6A, to demonstrate the data more clearly.

      We thank the reviewer for the suggestions. Though adding the percentages directly to the pie charts would make the relevant panels too confusing to interpret, we added supplemental tables (Tables S5-S9) with the percentages displayed in all pie charts for readers interested in the precise quantifications.

      (2) In the scRNA-seq UMAPs showing co-expression of Olig2 with the TFs of interest, it is very hard to see the cells that co-express. I would recommend either having a window zoomed in on the Olig2-expressing cell population to be able to see the co-expression more clearly visually, and/or including a graph demonstrating the percentages of co-expressing cells. These numbers were written in the text, but would be useful to see in the figure.

      The resolution of the scRNA-Seq plot has been improved for the visualization of co-expressing cells, which were also brought forward in all UMAP plots to improve clarity. Because of the higher quality images, insets should no longer be necessary. We have also included percentages of co-expression in the figures (Figs. 8 and 8S) and thank the reviewer for the suggestion.

      Other minor suggestions/corrections:

      (3) Figures 6B and 10S are missing the overlap quantification (in bar or pie charts) like in the other figures.

      The quantification for the image in 6B (i.e., GFP fluorescence and GFP RNA) is displayed in 6D for the four Olig2 CRM plasmid constructs. In Fig. 10S, the experiments in early chick ventral neural tube delivered constructs to a very limited number of cells, and quantification of cells would not necessarily represent an accurate number of cells with CRM activity. We therefore decided to show only representative images of CRM activity in this population of cells rather than present a biased count or increase the number of experiments/samples to obtain a robust quantification.

      (4) On the second-to-last line of page 10, in the sentence "The d-MPRA approach provided a robust, high resolution method for functionally relevant TF binding sites....", I think you're missing a word between "for" and "functionally". For example, it might be "for identifying..." or "for nominating...".

      We have revised the sentence accordingly.

      Reviewer #2 (Recommendations for the authors):

      Minor suggestions:

      (1) Please indicate which mouse reference genome (e.g., mm10) was used in plots such as Figure 2.

      We have added text to the relevant sections in the Results (the reference genome was already mentioned in Methods).

      (2) In Figures 2 and 2S, the CRMs discussed in the text are not labeled or highlighted, making it unclear which regions are being referenced.

      We have labeled peaks with Roman numerals in the figures, legends, and text for clarity and thank the reviewer for the suggestion.

      (3) Consider listing the genomic coordinates for the CRMs mentioned in the text, as this information would be especially useful for readers interested in exploring these regions further.

      This information was included in Table 2S in the original submission, with all relevant coordinates provided therein.

      (4) The d-MPRA plots (e.g., Figure 7C-E) do not clearly show the effects of different nucleotide substitutions. A more informative visualization style can be found in Kircher et al (PMID: 31395865, Fig. 1D) or Deng et al (PMID: 38781390, Fig. 5F).

      Displaying the precise nucleotide substitutions would be informative for visualizing the effects of specific changes. However, we were more interested in how any nucleotide substitution influenced the CRM activity, in order to home in on relevant TFBS. We therefore believe the current visualization is the most appropriate to accomplish this. However, for some types of future applications, a more informative visualization as noted would be a valuable addition.

      (5) It would be extremely helpful to the community if the LS-MPRA data were uploaded to the UCSC genome browser and made accessible via a link.

      We have uploaded all LS-MPRA genome tracks to a Track Hub in the UCSC genome browser and provided the appropriate link to access the Hub (https://github.com/cattapre/ALAS00) in the methods section.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should address the following metrics to showcase the utility of the techniques:

      We thank the reviewer for requesting the detailed metrics outlined below. We have addressed all inquiries and included the majority of metrics in the resubmission.

      (a) Library size

      This should be shown for each library that is generated. It is acknowledged that the complete size of the library is limited by sequencing, and the comprehensiveness of the library will change every time the library is re-prepped. However, metrics of this are not currently provided in a robust manner for each library. "Libraries of at least 7x10^6 and as many as 9x10^7 fragments are made" - vague - how was library complexity established since this seems to be an estimation, how many reads were utilized to estimate library complexity?

      We created a new supplemental table (Table S3) that displays the complexity based on sequencing rather than the estimated complexity based on the serial dilutions prior to 3D culture (which was used for the estimates listed in the results). We updated the complexity range in the text as well and thank the reviewer for the suggestion.

      Does library size scale proportionally to the BACs of different sizes?

      The fragmentation of different BACs with differing sizes does not necessarily alter the size of the library. Library size is primarily determined by the library creation pipeline, with the size selection step of the fragmented BAC and the cloning step that inserts adapter-ligated fragments into the barcoded expression vector being the primary determinants of complexity of plasmid libraries.

      (b) Sequence saturation

      Can the authors please provide evidence that the libraries have been sequenced to saturation or estimates of the degree of under-sequencing? How many reads does it take to discover a new barcode associated with a new regulatory sequence?

      We have provided library characteristics for this in Table S3 and have also generated Sequence Saturation Curves for each association library in Supplemental Document A.
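      For readers unfamiliar with the saturation analysis requested here, the idea can be sketched as follows: subsample the reads at increasing depths and count how many distinct barcodes have been seen at each depth. A curve that flattens suggests the library has been sequenced close to saturation, whereas a curve that is still climbing suggests under-sequencing. This is a minimal illustration on toy data, not the authors' pipeline; the function name `saturation_curve`, its parameters, and the example reads are all hypothetical.

```python
import random


def saturation_curve(reads, steps=5, seed=0):
    """Return (reads_sampled, unique_barcodes_seen) pairs at increasing depths.

    `reads` is a list of barcode strings, one entry per sequencing read.
    The reads are shuffled once (with a fixed seed) and then scanned
    cumulatively, so each point on the curve is a random subsample.
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)

    # Evenly spaced cumulative depths, ending at the full read count.
    checkpoints = [round(len(shuffled) * i / steps) for i in range(1, steps + 1)]

    curve = []
    seen = set()
    position = 0
    for checkpoint in checkpoints:
        seen.update(shuffled[position:checkpoint])
        position = checkpoint
        curve.append((checkpoint, len(seen)))
    return curve


# Toy data (hypothetical): 10 reads covering 4 distinct barcodes.
reads = ["AAAC", "AAAC", "GGTA", "AAAC", "CCGT",
         "GGTA", "AAAC", "TTAG", "CCGT", "AAAC"]
curve = saturation_curve(reads, steps=5)
for depth, n_unique in curve:
    print(depth, n_unique)
```

      On real data one would plot the returned pairs; the slope of the final segment gives a rough sense of how many additional reads are needed to discover one new barcode.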

      (c) Barcode saturation

      How many barcodes are present for each fragment in the libraries? Are most fragments only covered by 1 barcode? The barcoding strategy doesn't prevent the same barcode from being assigned to multiple different fragments, as barcodes are random. What is the incidence of barcode collisions?

      We have provided library characteristics for this in Table S3 and have also generated Barcode Saturation Curves for each association library in Supplemental Document A.

      Additionally, we tested whether the omission of barcode collisions would affect the output of our LS-MPRA. We reanalyzed one barcode abundance library (one replicate following 12 h of Notch inhibitor treatment) and filtered the barcodes so that only unique barcodes were analyzed. We were able to replicate all previously identified peaks. Although filtering out barcode collisions is not necessary, doing so may improve signal-to-noise when libraries are sequenced to sufficient depth (see Supplemental Document B).
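      The filtering step described above can be sketched as follows, assuming the association library is available as (barcode, fragment) pairs. This is an illustrative sketch under that assumption, not the authors' code; `split_collisions` and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def split_collisions(associations):
    """Separate barcodes tied to one fragment from barcodes tied to several.

    associations: iterable of (barcode, fragment_id) pairs (assumed format).
    Returns (unique, collided): a dict mapping each unambiguous barcode to
    its fragment, and the set of barcodes seen with more than one fragment.
    """
    fragments_by_barcode = defaultdict(set)
    for barcode, fragment in associations:
        fragments_by_barcode[barcode].add(fragment)

    unique = {}
    collided = set()
    for barcode, fragments in fragments_by_barcode.items():
        if len(fragments) == 1:
            unique[barcode] = next(iter(fragments))
        else:
            collided.add(barcode)
    return unique, collided


# Toy association table (illustrative only).
pairs = [
    ("AAAC", "frag1"),
    ("AAAC", "frag1"),  # same pair seen twice: not a collision
    ("GGTA", "frag2"),
    ("GGTA", "frag3"),  # same barcode, different fragment: collision
    ("CCTT", "frag3"),
]
unique, collided = split_collisions(pairs)
collision_rate = len(collided) / (len(unique) + len(collided))
```

      Restricting the abundance analysis to the keys of `unique` corresponds to the reanalysis with only unique barcodes; `collision_rate` gives the incidence the reviewer asks about.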

      (d) Normalization

      As performed, fragment activity is normalized by RNA expression compared to the presence of fragments in the library. While this is done for small libraries, for large libraries, this may not be appropriate. For large libraries, every sequence in the library will not be delivered to each cell, and many fragments contained in the library may not be electroporated at all. Ideally, the authors would have sequenced both the RNA and DNA from the electroporations to i) identify the fragment distribution of the library that was successfully electroporated and ii) provide an internal normalization factor across replicate samples. This is especially important if the libraries were ever re-prepped, as the jack-potting or asymmetries in fragment recovery can occur every time the library is re-derived.

      We agree with the reviewer’s comments about the variability in fragments delivered experimentally, though we also believe the normalization of the libraries is still appropriate. We never needed to re-prep the libraries as there was sufficient material for many more experiments than were performed. However, should one ever need to re-prep an LS-MPRA library, all experimental sequencing should be normalized to the respective sequenced association library to account for biased distributions, as the reviewer mentions.
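      For concreteness, one common way to express this kind of normalization is the ratio of a fragment's share of RNA reads to its share of reads in the sequenced association library. The sketch below assumes that form; the function `normalized_activity`, the `min_dna` threshold, and the counts are hypothetical and may differ from the calculation used in the manuscript.

```python
def normalized_activity(rna_counts, dna_counts, min_dna=1):
    """Return RNA/DNA activity per fragment using within-library fractions.

    rna_counts, dna_counts: dicts of fragment_id -> read count (assumed format).
    Fragments with fewer than min_dna reads in the association library are
    skipped because their ratio would be undefined or unstable.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for fragment, dna in dna_counts.items():
        if dna < min_dna:
            continue
        rna_fraction = rna_counts.get(fragment, 0) / rna_total
        dna_fraction = dna / dna_total
        activity[fragment] = rna_fraction / dna_fraction
    return activity


# Toy counts (illustrative only); frag3 is in the library but not in the RNA.
rna = {"frag1": 30, "frag2": 10}
dna = {"frag1": 10, "frag2": 10, "frag3": 20}
activity = normalized_activity(rna, dna)
profiled = sum(1 for f in dna if rna.get(f, 0) > 0) / len(dna)
```

      Here `profiled` is the fraction of library fragments detected in the RNA. If a library were re-prepped, `dna_counts` would come from the re-sequenced association library so that skewed fragment recovery is reflected in the denominator.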

      In the absence of these metrics (this would likely require the authors to repeat all experiments and is acknowledged to be outside the scope of revisions), the authors should provide information on the percentage of the library that is profiled in the RNA for each library.

      We have provided RNA profiles of all abundance libraries in Table S4. The overall fraction of fragments represented in the RNA pools was lower than that observed in other published MPRAs. This difference is expected given that most MPRA studies preselect fragments based on chromatin accessibility, transcription factor binding, sequence conservation, or bioinformatically predicted CRMs, thereby enriching for regulatory elements with high activity potential. Our locus-specific MPRA libraries, by contrast, include all fragments across the targeted genomic region, many of which are likely to be inactive in the tested context. Consequently, only a smaller proportion of fragments show measurable RNA expression.

      (e) Fragment sizes

      Please provide a density plot or something similar showcasing the size distribution of the libraries generated. Is there any correlation between sequence activity and the size of fragments?

      We have generated size distribution plots and correlations between fragment size and activity of all libraries and have included them in Supplemental Document A.

      (2) Questions about the statistical validity of results:

      (a) What threshold is utilized for calling a sequence as active? This is important as NR3 does not seem to be an element that has significant activity.

      See comments about peak calling in prior responses.

      (b) A Fisher's exact test using cells from single-cell RNA-sequencing as replicate samples is inappropriate as the cells are i) not from replicate experiments and ii) potentially in different cell states. The proportions of cells across replicate scRNA-seq datasets would be more appropriate.

      We thank the reviewer for raising this important point. While we agree that individual cells do not substitute for biological replicates, we believe Fisher’s exact test remains appropriate for testing whether gene expression is associated with Olig2 expression within a single scRNA-seq dataset. The test assesses co-occurrence at the level of individual cells, which is valid under the assumption that each cell represents an independent sampling of transcriptional states, even when it is possible that cells are in different states. We use this method as an exploratory tool to identify candidate genes associated with Olig2 expression in this dataset, and in the future, this could also be further validated by comparing the proportions of cells across replicate datasets, as the reviewer mentions.
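      To make the cell-level test explicit, the sketch below computes a two-sided Fisher's exact p-value from a 2x2 table of cell counts (Olig2-positive or Olig2-negative against gene-positive or gene-negative), summing the probabilities of all tables that are no more likely than the observed one. It uses only the standard library; the counts are hypothetical and are not taken from the dataset.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout (illustrative):
        a = Olig2+ gene+,  b = Olig2+ gene-
        c = Olig2- gene+,  d = Olig2- gene-
    Exact rational arithmetic avoids floating-point ties when comparing
    table probabilities.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene+ cells among the Olig2+ cells.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    return float(sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs))


# Toy table (illustrative only): 3 of 4 Olig2+ cells and 1 of 4 Olig2- cells
# express the gene.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

      Because each cell contributes one count, the p-value describes co-occurrence within this dataset only; it says nothing about reproducibility across replicate experiments.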

      (3) Discussion of the reporter/Olig2/Ngn2 RNA/protein disconnect needs to be expanded. Some simpler explanations for the presence of GFP in Olig2- and Ngn2- cells, as well as the presence of Olig2 or Ngn2 in GFP- cells, are that (i) these putative CRMs are being introduced to cells in plasmids, taking them out of their native genomic context where they may be inaccessible or repressed and allowing them to drive reporter expression even if their candidate target gene is not endogenously expressed, (ii) these putative CRMs may regulate genes besides just Olig2 or Ngn2, and (iii) Olig2 and Ngn2 are regulated by far more regulatory elements than the 3 or 4 being tested in each reporter assay, so their expression likely does not rely solely on the activity of the few putative CRMs tested.

      We have added these points in an expanded discussion in the text.

      (4) Problems with figures: Low resolution of many IGV genome tracks, pink 'co-expression' dots are completely indiscernible. Numbers should be listed with the pie charts. BFP expression should be shown since this is being quantified, especially since electroporation efficiency can change across age and/or tissue samples.

      We have reconfigured the IGV tracks so that they are higher resolution and have included supplemental tables for the numbers pertaining to the pie charts. For electroporation controls (BFP and RFP), BFP expression is shown in Figs 5S, 6, and 10S and the RFP electroporation control is shown in Fig. 11. Though BFP is sometimes used as a qualifier in the denominator of some of the quantification, displaying its expression, particularly in combination with three other signals that are already included in most images, provides limited utility.

      (5) More information is required to understand the utility of the d-MPRA. Detailed quantification of the number of mutations/fragments needs to be ascertained. When multiple mutations are present, how are the authors controlling for which mutation is affecting activity? What is the coverage of the loci of interest for mutational burden (ie, is every base pair mutated in at least one fragment?). For mutations that increase the activity of the element, are there specific sequence features that increase activity (new motifs generated)?

      The d-MPRA platform is a high-throughput assay that seeks to identify putative sub-regions within CRMs nominated by the LS-MPRA, or any other assay. It relies on deep mutational coverage to determine positive and negative regulatory sub-regions of the CRMs. While many reads have multiple mutations, they are broadly co-occurring across the entire fragment (see Supplemental Document A) so as not to create a false linkage between the sites. Every individual site is mutated many times with roughly even coverage across each fragment (see Supplemental Document A), thus allowing us to assess the requirement of each base in contributing to a putative CRM’s activity. Comparing d-MPRA plots using bulk fragments or fragments with singleton mutations (Supplemental Document A) yielded almost identical plots for two libraries, and similar plots for the third library. Any differences between the analyses of fragments with one or more mutations are likely a result of either sequencing depth or the requirement of multiple bases for binding or CRM activation. Follow-up experiments investigating intra-CRM interactions would elucidate such variability. Whether new motifs are generated for any specific substitution is an interesting question, which could be followed up for a CRM of interest. The d-MPRA data that we provide would serve as the starting point for such follow-up experiments.
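      The two quantities discussed here, per-base mutational coverage and the subset of fragments carrying a single mutation, can be sketched as below. The sketch assumes substitution-only variants of the same length as the reference, which is a simplification; `mutation_coverage` and the toy sequences are hypothetical.

```python
def mutation_coverage(reference, reads):
    """Count mutations per reference position and collect singleton reads.

    Assumes every read has the same length as the reference and differs
    from it only by substitutions (no insertions or deletions).
    Returns (coverage, singletons): coverage[i] is the number of reads
    mutated at position i; singletons are reads with exactly one mismatch.
    """
    coverage = [0] * len(reference)
    singletons = []
    for read in reads:
        mismatches = [
            i for i, (ref_base, read_base) in enumerate(zip(reference, read))
            if ref_base != read_base
        ]
        for i in mismatches:
            coverage[i] += 1
        if len(mismatches) == 1:
            singletons.append(read)
    return coverage, singletons


# Toy example (illustrative only).
reference = "ACGT"
reads = ["ACGA", "TCGT", "TCGA", "ACGT"]
coverage, singletons = mutation_coverage(reference, reads)
uncovered = [i for i, n in enumerate(coverage) if n == 0]
```

      An empty `uncovered` list would indicate that every base is mutated in at least one fragment; running the activity analysis on `singletons` alone corresponds to the singleton comparison.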

      (6) Transcription factors as regulators of CRM-activity.

      It is appreciated that the authors validated the binding of transcription factors to NR2. However, this correlative analysis should be further tested in follow-up experiments to highlight novel biology using systems already in place. Potential experiments that could be performed include the following (reagents in hand, or performed in a manner similar to experiments performed by the lab in previous publications):

      (a) over-expression of TF using LS-MPRA library.

      (b) over-expression of TF using d-MPRA library, showing that mutations in the putative TF binding site disrupt activity compared to non-mutated sequences.

      (c) performing TF over-expression using target CRMs, including sequences where the TF binding site is mutated (similar to a small MPRA).

      (d) the quantification of target gene expression when i) TF is over-expressed, ii) CRM is activated using CRISPRa, or iii) CRM is inhibited using CRISPRi.

      These are all valid follow-up experiments. Please see prior responses we have provided regarding further validation.

      Minor points

      (1) Please acknowledge that some distal regulatory sequences may be contained outside of the BAC regions. Also, the authors should emphasize the point that the assay is NOT cell-type-specific or specific to regulatory sequences for the gene of interest, but ALL regulatory sequences contained within the locus. The discussion of this with respect to Ift122 and Rpl32 is somewhat confusing.

      We have added a sentence in the Discussion addressing possible CRMs outside the BAC coverage. We believe it is implicitly understood that the assay only screens regulatory activity in the BAC, and believe we have addressed this in the manuscript.

      If one wishes to use a candidate CRM to drive gene expression in a targeted cell type, one needs to establish specificity. In particular, specificity needs to be established in the context of the vector that is being used. Non-integrated vs integrated vectors, different types of viral vectors with their own confounding regulatory sequences, different types of plasmids and methods of delivery, and copy number can all affect specificity. We provided a double in situ hybridization method for the examination of specificity for some of the novel candidate CRMs. It was quite difficult in the case of Olig2 and Ngn2 as their RNAs and proteins are unstable. We would need to provide further evidence should we wish to use these candidate CRMs for directing expression specifically in Olig2- or Ngn2-expressing cells. We suggest that an investigator can choose the vector and method for establishing specificity depending upon the goals of the application.

      (2) I am curious as to why low-resolution, pseudo-bulked single-nucleus ATAC was utilized instead of more comprehensive retina ATAC samples at similar time-points (for example, those in Al Diri et al., 2017, where E14, E17, P0, P3, P7, and P10 samples are all available).

      The use of pseudo-bulked single-nucleus ATAC-seq data provided a convenient and consistent comparison to our LS-MPRA results. We agree that incorporating higher-resolution datasets such as those from Al Diri et al. would be valuable for future analyses aimed at linking CRM activity with broader chromatin accessibility dynamics.

    1. eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodelling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

    2. Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      (7) The SRCAP depletion is insufficiently validated; i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

    4. Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found distinct distributions in Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

    5. Author response:

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z, although the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript, we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute further, as implicated by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, nor the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without the presence of DNA methylation (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences present in their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. For us, these practical concerns outweighed the need to maintain a strictly physiological sequence. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure. Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strove to test multiple sequences in each of our assays. 
We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they also found that methylation of the original Widom 601 sequence caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how the distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process than the one our manuscript focuses on, especially when considering the fact that H2A.Z is deposited onto preassembled H2A nucleosomes. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at the major groove-outward and -inward positions ((Chua et al., 2012) and our structure), deserves a space for discussion. On the other hand, positioning of mCpG on satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with an idea that DNA methylation repositions the CpG to the outward major grooves. 
As the potential contribution of how DNA methylation affects the nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle, DNA sequence-dependent intrinsic effect of DNA methylation, we observed SRCAP-dependent preferential H2A.Z deposition on unmethylated DNA over methylated DNA for both 601 and satellite II DNAs. In the revised manuscript, we will make clear the value of comparative studies on 601 and satellite II for these two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008) but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will revise the Abstract accordingly to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe this data rather emphasizes the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (below is response to (4) and (5) together)

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited on hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates) but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), the lack of a clear impact of DNA methylation on our 601L structures is likely due to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Due to its relatively low abundance and high molecular weight, SRCAP western blot signals are weak, making it challenging to quantify the degree of depletion. Given the points noted above, we also believe that the value of quantification in this context is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010), but we were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in this current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition, and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA and thus remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible compared to canonical H2A nucleosomes on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite, while others have obtained results similar to ours. We reported our findings on general H2A.Z stability in the hope of helping to clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogeneous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structural data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly clarify these possibilities, we favor an idea that DNA methylation specifically alters histone to DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that that region tends to exhibit the greatest differences under different physical influences, hence the focus on characterizing that area; flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution compared to its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker under the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

    1. eLife Assessment

      This study uses a valuable combination of functional magnetic resonance imaging and electroencephalography (EEG) to study brain activity related to prediction errors in relation to both sensorimotor and more complex cognitive functions. It provides incomplete evidence to suggest that prediction error minimisation drives brain activity across both types of processing and that elevated inter-regional functional coupling along a superior-inferior axis is associated with high prediction error, whereas coupling along a posterior-anterior axis is associated with low prediction error. The manuscript will be of interest to neuroscientists working on predictive coding and decision-making, but would benefit from more precise localisation of EEG sources and more rigorous statistical controls.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates whether prediction error extends beyond lower-order sensory-motor processes to include higher-order cognitive functions. Evidence is drawn from both task-based and resting-state fMRI, with addition of resting-state EEG-fMRI to examine power spectral correlates. The results partially support the existence of dissociable connectivity patterns: stronger ventral-dorsal connectivity is associated with high prediction error, while posterior-anterior connectivity is linked to low prediction error. Furthermore, spontaneous switching between these connectivity patterns was observed at rest and correlated with subtle intersubject behavioral variability.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Minor Weakness:

      The lack of spatial specificity of sensor-level EEG somewhat limits the inferences that can be obtained in terms of how the fMRI network processes and the EEG power fluctuations relate to each other. While the language no longer suggests a strong overlap of the source of the two signals, several scenarios remain open (e.g., the higher-order fMRI networks being the source of the EEG oscillations, or the networks controlling the EEG oscillations expressed in lower-order cortices, or a third process driving both the observations in fMRI networks and EEG oscillations...) and somewhat weaken interpretability of this section.

      Comments on revisions:

      My prior recommendations have been mostly addressed.

      Questions remaining about the NBS results:

      The authors write about the NBS cluster: "Visual examination of the cluster roughly points to the same four posterior-anterior and ventral-dorsal modules identified formally in main-text ". I think it might be good to add quantification, not just visual inspection. The size of the significant NBS cluster should be reported. What proportion of the edges that passed uncorrected threshold and entered NBS were part of the NBS cluster? Put simply, I don't think any edges beyond those passing NBS-based correction should be interpreted or used downstream in the manuscript.

      Also, NBS is not typically used by collapsing over effects in two effect directions, but the authors use NBS on the absolute value of Z. I understand the logic of the general manuscript focusing on strength rather than direction, but here I am wondering about the methodological validity. I believe that the editor who is an expert on the methodology may be able to comment on the validity of this approach (as opposed to running two separate NBS analyses for the two directions of effect).

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates putative networks associated with prediction errors in task-based and resting state fMRI. It attempts to test the idea that prediction error minimisation includes abstract cognitive functions, referred to as the global prediction error hypothesis, by establishing a parallel between networks found in task-based fMRI where prediction errors are elicited in a controlled manner and those networks that emerge during "resting state".

      Strengths:

      Clearly a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall well written with a couple of exceptions on clarity as per below and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed to that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded and what is the evidence for exclusion?

      The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      Comments on revisions:

      The authors have done a good job of addressing the issues raised during the review process. There is one issue remaining that still requires attention. In R2.4, when referring to "existing knowledge of prominent structural pathways among these quadrants", please cite the relevant literature.

    4. Reviewer #3 (Public review):

      Summary:

      Bogdan et al. present an intriguing investigation into the spontaneous dynamics of prediction error (PE)-related brain states. Using two independent fMRI tasks designed to elicit prediction and prediction error in separate participant samples, alongside both fMRI and EEG data, the authors identify convergent brain network patterns associated with high versus low PE. Notably, they further show that similar patterns can be detected during resting-state fMRI, suggesting that PE-related neural states may recur outside of explicit task demands.

      Strengths:

      The authors use a well-integrated analytic framework that combines multiple prediction tasks and brain imaging modalities. The inclusion of several datasets probing PE under different contexts strengthens the claim of generalizability across tasks and samples. The open sharing of code and data is commendable and will be valuable for future work seeking to build on this framework.

      Weaknesses:

      A central challenge of the manuscript lies in interpreting the functional significance of PE-related brain network states during rest. Demonstrating that a task-defined cognitive state recurs spontaneously is intriguing, but without clear links to behavior, individual traits, or experiential content during rest, it remains difficult to interpret what such spontaneous brain states tell us about the mind and brain. For example, it is unclear whether these states support future inference or learning, reflect offline predictive processing, or instead suggest state reinstatement due to a more general form of neural plasticity and circuit dynamics in the brain. Demonstrating any one of these downstream relationships would be valuable since it has the potential to inform our understanding of cognitive function or more general principles of neural organization.

      I appreciate the authors' position that establishing the existence of such states is a necessary first step, and that future work may clarify their behavioral relevance. However, the current form makes it challenging to assess the conceptual advance of the present work in isolation.

      Relatedly, in my previous review I raised questions about both across- and within-individual variability: for example, whether individuals who exhibit stronger or more distinct PE-related fluctuations at rest also show superior performance on prediction-related tasks (across-individual), or whether momentary increases in PE-network expression during tasks relate to faster or more accurate prediction (within-individual). The authors thoughtfully addressed this suggestion by conducting an individual-differences analysis correlating each participant's fluctuation amplitude with approximately 200 behavioral and trait measures from the HCP dataset.

      The reported findings (a negative association with age and card-sorting performance, alongside a positive association with age-adjusted picture sequence memory) are interesting but difficult to interpret within a coherent functional framework. As presented, these results do not clearly support the idea that spontaneous PE-state fluctuations are related to enhancement in prediction, inference, or broader cognitive function. Instead, they raise the possibility that fluctuation amplitude may reflect more general factors (e.g., age) rather than a functionally meaningful PE-related process.

      Overall, while the methodological contribution is strong, the manuscript would benefit from a clearer articulation of what functional conclusions can or cannot be drawn from the presence of spontaneous PE-related states, as well as a more cautious framing of their potential cognitive significance.

      Further comments:

      I appreciate that the authors took my earlier suggestions seriously and incorporated additional analyses examining behavioral relevance and permutation tests in the revision.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Reviewer structured their review such that their first two recommendations specifically concerned the two major weaknesses they viewed in the initial submission. For clarity and concision, we have copied their recommendations to be placed immediately following their corresponding points on weaknesses.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Weaknesses:

      Major:

      (R1.1) Lack of multiple comparisons correction for edge-wise contrast:

      The analysis of connectivity differences across three levels of prediction error was conducted separately for approximately 22,000 edges (derived from 210 regions), yet no correction for multiple comparisons appears to have been applied. Then, modularity was applied to the top 5% of these edges. I do not believe that this approach is viable without correction. It does not help that a completely separate approach using SVMs was FDR-corrected for 210 regions.

      [Later recommendation] Regarding the first major point: To address the issue of multiple comparisons in the edge-wise connectivity analysis, I recommend using the Network-Based Statistic (NBS; Zalesky et al., 2010). NBS is well-suited for identifying clusters (analogous to modules) of edges that show statistically significant differences across the three prediction error levels, while appropriately correcting for multiple comparisons.

      Thank you for bringing this up. We acknowledge that our modularity analysis does not evaluate statistical significance. Originally, the modularity analysis was meant to provide a connectome-wide summary of the connectivity effects, whereas the classification-based analysis was meant to address the need for statistical significance testing. However, as the reviewer points out, it would be better if significance were tested in a manner more analogous to the reported modules. As they suggest, we updated the Supplemental Materials (SM) to include the results of Network-Based Statistic analysis (SM p. 1-2):

      “(2.1) Network-Based Statistic

      Here, we evaluate whether PE significantly impacts connectivity at the network level using the Network-Based Statistic (NBS) approach.[1] NBS relied on the same regression data generated for the main-text analysis, whereby a regression is performed examining the effect of PE (Low = –1, Medium = 0, High = +1) on connectivity for each edge. This was done across the connectome, and for each edge, a z-score was computed. For NBS, we thresholded edges to |Z| > 3.0, which yielded one large network cluster, shown in Figure S3. The size of the cluster – i.e., number of edges – was significant (p < .05) per a permutation-test using 1,000 random shuffles of the condition data for each participant, as is standard.[1] These results demonstrate that the network-level effects of PE on connectivity are significant. The main-text modularity analysis converts this large cluster into four modules, which are more interpretable and open the door to further analyses”.
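
      The NBS procedure quoted above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the array layout, and the use of a per-subject slope followed by a one-sample t-like statistic (standing in for the reported z-score) are all assumptions made for the example.

```python
import numpy as np


def largest_component_size(n_nodes, edges):
    """Edge count of the largest connected component formed by `edges`."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    counts = {}
    for i, _ in edges:
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)


def edge_statistics(conn, pe_codes):
    """Group-level statistic per edge (assumed form, see lead-in).

    conn: array (n_subjects, n_conditions, n_edges) of connectivity values.
    pe_codes: array (n_conditions,), e.g. [-1, 0, 1] for Low/Medium/High PE.
    """
    pe = np.asarray(pe_codes, dtype=float)
    pe = pe - pe.mean()
    # Ordinary least-squares slope of connectivity on PE, per subject and edge.
    slopes = np.einsum("sce,c->se", conn, pe) / (pe @ pe)
    sem = slopes.std(axis=0, ddof=1) / np.sqrt(slopes.shape[0])
    return slopes.mean(axis=0) / sem


def nbs_cluster_test(conn, pe_codes, edge_index, n_nodes,
                     threshold=3.0, n_perm=1000, seed=0):
    """Return (observed cluster size in edges, permutation p-value)."""
    rng = np.random.default_rng(seed)
    n_cond = conn.shape[1]

    def cluster_size(data):
        stat = edge_statistics(data, pe_codes)
        # Two-sided threshold on |statistic|, as in the quoted |Z| > 3.0.
        supra = [edge_index[k] for k in np.flatnonzero(np.abs(stat) > threshold)]
        return largest_component_size(n_nodes, supra)

    observed = cluster_size(conn)
    null = np.empty(n_perm)
    for p in range(n_perm):
        # Shuffle condition labels independently within each participant.
        shuffled = np.stack([subj[rng.permutation(n_cond)] for subj in conn])
        null[p] = cluster_size(shuffled)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

      Thresholding on the absolute value pools positive and negative effects into one component, which is the design choice Reviewer #1 questions; running the same function separately on `stat > threshold` and `stat < -threshold` would give the two directional variants.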

      We updated the Results to mention these findings before describing the modularity analysis (p. 8-9):

      “After demonstrating that PE significantly influences brain-wide connectivity using Network-Based Statistic analysis (Supplemental Materials 2.1), we conducted a modularity analysis to study how specific groups of edges are all sensitive to high/low-PE information.”

      (R1.2) Lack of spatial information in EEG:

      The EEG data were not source-localized, and no connectivity analysis was performed. Instead, power fluctuations were averaged across a predefined set of electrodes based on a single prior study (reference 27), as well as across a broader set of electrodes. While the study correlates these EEG power fluctuations with fMRI network connectivity over time, such temporal correlations do not establish that the EEG oscillations originate from the corresponding network regions. For instance, the observed fronto-central theta power increases could plausibly originate from the dorsal anterior cingulate cortex (dACC), as consistently reported in the literature, rather than from a distributed network. The spatially agnostic nature of the EEG-fMRI correlation approach used here does not support interpretations tied to specific dorsal-ventral or anterior-posterior networks. Nonetheless, such interpretations are made throughout the manuscript, which overextends the conclusions that can be drawn from the data.

      [Later recommendation] Regarding the second major point: I suggest either adopting a source-localized EEG approach to assess electrophysiological connectivity or revising all related sections to avoid implying spatial specificity or direct correspondence with fMRI-derived networks. The current approach, which relies on electrode-level power fluctuations, does not support claims about the spatial origin of EEG signals or their alignment with specific connectivity networks.

      We thank the reviewer for this important point, which allows us to clarify the specific and distinct contributions of each imaging modality in our study. Our primary goal for Study 3 was to leverage the high temporal resolution of EEG to identify the characteristic frequency at which the fMRI-defined global connectivity states fluctuate. The study was not designed to infer the spatial origin of these EEG signals, a task for which fMRI is better suited and which we addressed in Studies 1 and 2.

      As the reviewer points out, fronto-central theta is generally associated with the dACC. We agree with this point entirely. We suspect that there is some process linking dACC activation to the identified network fluctuations – some type of relationship that does not manifest in our dynamic functional connectivity analyses – although this is only a hypothesis and one that is beyond the present scope.

      We updated the Discussion to mention these points and acknowledge the ambiguity regarding the correlation between network fluctuation amplitude (fMRI) and Delta/Theta power (EEG) (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism. Additionally, Theta is widely seen as a sign of dorsal anterior cingulate cortex activity,[3] and it is unclear how to reconcile this with our claims about network fluctuations. Nonetheless, as we show with simulations (Supplemental Materials 5), a correlation between slow fMRI network fluctuations and fast EEG Delta/Theta oscillations is also consistent with a common global neural process oscillating rapidly and eliciting both measures.”

      Regarding source-localization, several papers have described known limitations of this strategy for drawing precise anatomical inferences,[4–6] and this seems unnecessary given that our fMRI analyses already provide more robust anatomical precision. We intentionally used EEG in our study for what it measures most robustly: millisecond-level temporal dynamics.

      (R1.2a) Examples of problematic language include:

      Line 134: "detection of network oscillations at fast speeds" - the current EEG approach does not measure networks.

      This is an important issue. We acknowledge that our EEG approach does not directly measure fMRI-defined networks. Our claim is inferential, designed to estimate the temporal dynamics of the large-scale fMRI patterns we identified. The correlation between our fMRI-derived fluctuation amplitude (|PA – VD|) and 3-6 Hz EEG power provides suggestive evidence that the transitions between these network states occur at this frequency, rather than being a direct measurement of network oscillations.

      To support the validity of this inference, we performed two key analyses (now in Supplemental Materials). First, a simulation study provides a proof-of-concept, confirming our method can recover the frequency of a fast underlying oscillator from slow fMRI and fast EEG data. Second, a specificity analysis shows the EEG correlation is unique to our measure of fluctuation amplitude and not to simpler measures like overall connectivity strength. These analyses demonstrate that our interpretation is more plausible than alternative explanations.
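
      The logic of the specificity analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions: `pa`, `vd`, and `eeg_power` are hypothetical per-window time series (posterior-anterior connectivity, ventral-dorsal connectivity, and 3-6 Hz EEG power), and the authors' actual pipeline very likely involves additional steps that are not reproduced here.

```python
import numpy as np


def specificity_correlations(pa, vd, eeg_power):
    """Correlate EEG band power with three fMRI-derived summary measures.

    pa, vd: 1-D arrays of posterior-anterior and ventral-dorsal connectivity
            per time window (hypothetical inputs).
    eeg_power: 1-D array of EEG band power for the same windows.
    """
    pa = np.asarray(pa, dtype=float)
    vd = np.asarray(vd, dtype=float)
    eeg_power = np.asarray(eeg_power, dtype=float)
    measures = {
        # The measure of interest: unsigned difference |PA - VD|.
        "fluctuation_amplitude": np.abs(pa - vd),
        # Simpler comparison measures mentioned in the response.
        "mean_connectivity": (pa + vd) / 2,
        "signed_difference": pa - vd,
    }
    return {
        name: float(np.corrcoef(values, eeg_power)[0, 1])
        for name, values in measures.items()
    }
```

      A pattern in which only `fluctuation_amplitude` correlates with EEG power would match what the response describes, though it would remain correlational evidence.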

      Overall, we have revised the manuscript to be more conservative in the language employed, such as presenting alternative explanations to the interpretations put forth based on correlative/observational evidence (e.g., our modifications above described in our response to comment R1.2). In addition, we have made changes throughout the report to state the issues related to reverse inference more explicitly and to better communicate that the evidence is suggestive – please see our numerous changes described in our response to comment R3.1. For the statement that the reviewer specifically mentioned here, we revised it to be more cautious (p. 7):

      “Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.”

      (R1.2b) Line 148: "whether fluctuations between high- and low-PE networks occur sufficiently fast" - this implies spatial localization to networks that is not supported by the EEG analysis.

      Building on our changes described in our immediately prior response, we adjusted our text here to say our analyses searched for evidence consistent with the idea that the network fluctuations occur quickly rather than searching for decisive evidence favoring this idea (p. 7-8):

      “Finally, we examined rs-fMRI-EEG data to assess whether we find parallels consistent with the high/low-PE network fluctuations occurring at fast timescales suitable for the type of cognitive operations typically targeted by PE theories.”

      (R1.2c) Line 480: "how underlying neural oscillators can produce BOLD and EEG measurements" - no evidence is provided that the same neural sources underlie both modalities.

      As described above, these claims are based on the simulation study demonstrating that this is a possibility, and we have revised the manuscript overall to be clearer that this is our interpretation while providing alternative explanations.

      Reviewer #2 (Public review):

      Strengths:

      Clearly, a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall, well written with a couple of exceptions on clarity, as per below, and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      (R2.1) The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      We thank the reviewer for this comment. We agree that reverse inference is a fundamental challenge and that our central claim requires a particularly high bar of evidence. While no single analysis resolves this issue, our goal was to build a cumulative case that is compelling by converging on the same conclusion from multiple, independent lines of evidence.

      For our investigation, we initially established a task-general signature of prediction error (PE). By showing the same neural pattern represents PE in different contexts, we constrain the reverse inference, making it less likely that our findings are a task-specific artifact and more likely that they reflect the core, underlying process of PE. Building on this, our most compelling evidence comes from linking task and rest at the individual level. We didn't just find the same general network at rest; we showed that an individual’s unique anatomical pattern of PE-related connectivity during the task specifically predicts their own brain's fluctuation patterns at rest. This highly specific, person-by-person correspondence provides a direct bridge between an individual's task-evoked PE processing and their intrinsic, resting-state dynamics. Furthermore, these resting-state fluctuations correlate specifically with the 3-6 Hz theta rhythm—a well-established neural marker for PE.

      While reverse inference remains a fundamental limitation for many studies on resting-state cognition, the aspects mentioned above, we believe, provide suggestive evidence, favoring our PE interpretation. Nonetheless, we have made changes throughout the manuscript to be more conservative in the language we use to describe our results, to make it clear what claims are based on correlative/observational evidence, and to put forth alternative explanations for the identified effects. Please find our numerous changes detailed in our response to comment R3.1.

      (R2.2) Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed for that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      We (and some others) take a broad interpretation of PE and believe it is often more intuitive to think about PE minimization in terms of uncertainty rather than “surprise”; the word “surprise” usually implies a sudden emotive reaction from the violation of expectations, which is not useful here.

      When planning future actions, each step of the plan is spurred by the uncertainty of what is the appropriate action given the scenario set up by prior steps. Each planned step erases some of that uncertainty. For example, you may be mentally simulating a conversation, what you will say, and what another person will say. Each step of this creates uncertainty of “what is the appropriate response?” Each reasoning step addresses contingencies. While planning, you may also uncover more obvious forms of uncertainty, sparking memory retrieval to finish it. A resting-state participant may think to cook a frozen pizza when they arrive home, but be uncertain about whether they have any frozen pizzas left, prompting episodic memory retrieval to address this uncertainty. We argue that every planning step or memory retrieval can be productively understood as being sparked by uncertainty/surprise (PE), and the subsequent cognitive response minimizes this uncertainty.

      We updated the Introduction to include a paragraph near the start providing this explanation (p. 3-4):

      “PE minimization may broadly coordinate brain functions of all sorts, including abstract cognitive functions. This includes the types of cognitive processes at play even in the absence of stimuli (e.g., while daydreaming). While it may seem counterintuitive to associate this type of cognition with PE – a concept often tied to external surprises – it has been proposed that the brain's internal generative model is continuously active.[12–14] Spontaneous thought, such as planning a future event or replaying a memory, is not a passive, low-PE process. Rather, it can be seen as a dynamic cycle of generating and resolving internal uncertainty. While daydreaming, you may be reminded of a past conversation, where you wish you had said something different. This situation contains uncertainty about what would have been the best thing to say. Wondering about what you wish you said can be viewed as resolving this uncertainty, in principle, forming a plan if the same situation ever arises again in the future. Each iteration of the simulated conversation repeatedly sparks and then resolves this type of uncertainty.”

      (R2.3) The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      We thank the reviewer for this opportunity to clarify our method. A single correlation between the full, aggregated networks would be conceptually misaligned with what we aimed to assess. To test for a person-specific anatomical correspondence, it is necessary to examine the link between task and rest at a granular level. We therefore asked whether the specific parts of an individual's network most responsive to PE during the task are the same parts that show the strongest fluctuations at rest. Our analysis, performed iteratively across all 3,432 possible ROI subsets, was designed specifically to answer this question, which would be obscured by an aggregated network measure.
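
      As a small illustration of what iterating over "all possible ROI subsets" involves, the sketch below enumerates fixed-size subsets of a pool of ROIs. The response does not state the pool size or subset size; the values 14 and 7 are assumptions chosen only because choosing 7 of 14 items happens to give 3,432 combinations, and the ROI labels are hypothetical.

```python
from itertools import combinations
from math import comb

# Assumed values for illustration only (not stated in the response).
N_ROIS, SUBSET_SIZE = 14, 7
roi_labels = [f"roi_{i:02d}" for i in range(N_ROIS)]  # hypothetical labels

# Closed-form count of distinct subsets of the given size.
n_subsets = comb(N_ROIS, SUBSET_SIZE)

# Enumerate every subset; a per-subset task-rest analysis would run in this loop.
subsets = list(combinations(roi_labels, SUBSET_SIZE))

print(n_subsets, len(subsets))
```

      Enumerating the subsets explicitly, rather than sampling them, keeps the procedure deterministic, which is practical at this scale.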

      We appreciate the reviewer's concern about the modest effect size (r = .021). However, this must be contextualized, as the short task scan has very low reliability (.08), which imposes a severe statistical ceiling on any possible task-rest correlation. Finding a highly significant effect (p < .001) in the face of such noisy data, therefore, provides robust evidence for a genuine task-rest correspondence.
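
      One textbook way to express a reliability ceiling is the classical attenuation relation, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below applies that relation; it is not a calculation reported in the response. Only the task reliability (.08) comes from the text, and the rest-measure reliability is an assumed placeholder.

```python
from math import sqrt

def attenuated_r(rel_x: float, rel_y: float, true_r: float = 1.0) -> float:
    """Expected observed correlation under the classical attenuation relation."""
    return true_r * sqrt(rel_x * rel_y)

task_reliability = 0.08  # split-half reliability quoted in the response
rest_reliability = 1.0   # assumed best case; not reported in the response

# Largest correlation that could be observed even if the true correlation were 1.
ceiling = attenuated_r(task_reliability, rest_reliability)
print(round(ceiling, 3))
```

      Any rest reliability below 1 would lower this bound further, so the assumed best case gives an upper limit rather than an estimate.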

      We updated the Discussion to discuss this point (p. 22-23):

      “A key finding supporting our interpretation is the significant link between individual differences in task-evoked PE responses and resting-state fluctuations. One might initially view the effect size of this correspondence (r = .021) as modest. However, this interpretation must be contextualized by the considerable measurement noise inherent in short task-fMRI scans; the split-half reliability of the task contrast was only .08. This low reliability imposes a severe statistical ceiling on any possible task-rest correlation. Therefore, detecting a highly significant (p < .001) relationship despite this constraint provides robust evidence for a genuine link. Furthermore, our analytical approach, which iteratively examined thousands of ROI subsets rather than one aggregated network, was intentionally granular. The goal was not simply to correlate two global measures, but to test for a person-specific anatomical correspondence – that is, whether the specific parts of an individual's network most sensitive to PE during the task are the same parts that fluctuate most strongly at rest. An aggregate analysis would obscure this critical spatial specificity. Taken together, this granular analysis provides compelling evidence for an anatomically consistent fingerprint of PE processing that bridges task-evoked activity and spontaneous resting-state dynamics, strengthening our central claim.”

      (R2.4) Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded, and what is the evidence for exclusion?

      Our four-quadrant model is a principled simplification designed to distill the dominant, large-scale connectivity patterns from the complex modularity results. This approach focuses on coherent, well-documented anatomical streams while setting aside a few anatomically distant and disjoint ROIs that were less central to the main modules. This heuristic additionally unlocks more robust and novel analyses.

      The two low-PE posterior-anterior (PA) pathways are grounded in canonical processing streams. (i) The OC-ATL connection mirrors the ventral visual stream (the “what” pathway), which is fundamental for object recognition and is upregulated during the smooth processing of expected stimuli. (ii) The IPL-LPFC connection represents a core axis of the dorsal attention stream and the Fronto-Parietal Control Network (FPCN), reflecting the maintenance of top-down cognitive control when information is predictable; the IPL-LPFC module excludes ROIs in the middle temporal gyrus, which are often associated with the FPCN but are not covered here.

      In contrast, the two high-PE ventral-dorsal (VD) pathways reflect processes for resolving surprise and conflict. (i) The OC-IPL connection is a classic signature of attentional reorienting, where unexpected sensory input (high PE) triggers a necessary shift in attention; the OC-IPL module excludes some ROIs that are anterior to the occipital lobe and enter the fusiform gyrus and inferior temporal lobe. (ii) The ATL-LPFC connection aligns with mechanisms for semantic re-evaluation, engaging prefrontal control regions to update a mental model in the face of incongruent information.

      Beyond its functional/anatomical grounding, this simplification provides powerful methodological and statistical advantages. It establishes a symmetrical framework that makes our dynamic connectivity analyses tractable, such as our “cube” analysis of state transitions, which required overlapping modules. Critically, this model also offers a statistical safeguard. By ensuring each quadrant contributes to both low- and high-PE connectivity patterns, we eliminate confounds like region-specific signal variance or global connectivity. This design choice isolates the phenomenon to the pattern of connectivity itself (posterior-anterior vs. ventral-dorsal), making our interpretation more robust.

      We updated the end of the Study 1A results (p. 10-11):

      “Some ROIs appear in Figure 2C but are excluded from the four targeted quadrants (Figures 2C & 2D) – e.g., posterior inferior temporal lobe and fusiform ROIs are excluded from the OC-IPL module, and middle temporal gyrus ROIs are excluded from the IPL-LPFC modules. These exclusions, in favor of a four-quadrant interpretation, are motivated by existing knowledge of prominent structural pathways among these quadrants. This interpretation is also supported by classifier-based analyses showing connectivity within each quadrant is significantly influenced by PE (Supplemental Materials 2.2), along with analyses of single-region activity showing that these areas also respond to PE independently (Supplemental Materials 3). Hence, we proceeded with further analyses of these quadrants’ connections, which summarize PE’s global brain effects.

      “This four-quadrant setup also imparts analytical benefits. First, this simplified structure may better generalize across PE tasks, and Study 1B would aim to replicate these results with a different design. Second, the four quadrants mean that each ROI contributes to both the posterior-anterior and ventral-dorsal modules, which would benefit later analyses and rules out confounds such as PE eliciting increased/decreased connectivity between an ROI and the rest of the brain. An additional, less key benefit is that this setup allows more easily evaluating whether the same phenomena arise using a different atlas (Supplemental Materials Y).”

      (R2.5) The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower, while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      We thank the reviewer for raising this important point, which allows us to clarify the logic of our multimodal analysis. Our analysis does not claim that the fMRI BOLD signal itself oscillates at 3-6 Hz. Instead, it is based on the principle that the intensity of a fast neural process can be reflected in the magnitude of the slow BOLD response. It’s akin to using a long-exposure photograph to capture a fast-moving object; while the individual movements are blurred, the intensity of the blur in the photo serves as a proxy for the intensity of the underlying motion. In our case, the magnitude of the fMRI network difference (|PA – VD|) acts as the "blur," reflecting the intensity of the rapid fluctuations between states within that time window.

      Following this logic, we correlated this slow-moving fMRI metric with the power of the fast EEG rhythms, which reflects their amplitude. To bridge the different timescales, we averaged the EEG power over each fMRI time window and convolved it with the standard hemodynamic response function (HRF) – a crucial step to align the timing of the neural and metabolic signals. The resulting significant correlation specifically in the 3-6 Hz band demonstrates that when this rhythm is stronger, the fMRI data shows a greater divergence between network states. This allows us to infer the characteristic frequency of the underlying neural fluctuations without directly measuring them at that speed with fMRI, thus reconciling the two timescales.
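
      The alignment step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the EEG band power has already been averaged within each fMRI time window, uses a common double-gamma HRF shape with illustrative parameters, and assumes a 1-second TR. The function names are hypothetical.

```python
from math import exp, gamma

def double_gamma_hrf(t: float) -> float:
    """Double-gamma HRF shape (assumed parameters: peak ~5 s, undershoot ~15 s)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * exp(-t) / gamma(6)
    undershoot = t ** 15 * exp(-t) / gamma(16)
    return peak - undershoot / 6.0

def convolve_with_hrf(power_per_tr, tr_seconds, hrf_length_s=32.0):
    """Causally convolve a per-TR band-power series with the HRF.

    The output is truncated to the input length so that it stays aligned
    with the fMRI time windows.
    """
    n_kernel = int(hrf_length_s / tr_seconds)
    kernel = [double_gamma_hrf(k * tr_seconds) for k in range(n_kernel)]
    out = []
    for i in range(len(power_per_tr)):
        acc = 0.0
        for k in range(min(i + 1, n_kernel)):
            acc += kernel[k] * power_per_tr[i - k]
        out.append(acc)
    return out

# Toy input: a single burst of band power at the third time window.
impulse = [0.0] * 20
impulse[2] = 1.0
response = convolve_with_hrf(impulse, tr_seconds=1.0)

# Index of the largest predicted hemodynamic response.
peak_index = response.index(max(response))
print(peak_index)
```

      The convolved series, rather than the raw band power, is what would then be correlated with the window-by-window |PA – VD| measure, since the hemodynamic response lags the neural event by several seconds.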

      Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward predictive minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      (R3.1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis ("PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions") is more suitable for the discussion than as the main claim in the abstract, introduction, and throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      Thank you for bringing up this issue. We have made changes throughout the paper to address these points. First, we have omitted reference to a “global PE hypothesis” from the Abstract and Introduction, in favor of structuring the Introduction in terms of a falsifiable question (p. 4):

      “We pursued this goal using three studies (Figure 1) that collectively targeted a specific question: Do the task-defined connectivity signatures of high vs. low PE also recur during rest, and if so, how does the brain transition between exhibiting high/low signatures?”

      We made changes later in the Introduction to clarify that the investigation is based on correlative evidence and requires interpretations that may be debated (p. 5-7):

      “Although this does not entirely address the reverse inference dilemma and can only produce correlative evidence, the present research nonetheless investigates these widely speculated upon PE ideas more directly than any prior work.

      Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.

      Second, we examined the recruitment of these networks during rs-fMRI, and although the problems related to reverse inference are impossible to overcome fully, we engage with this issue by linking rs-fMRI data directly to task-fMRI data of the same participants, which can provide suggestive evidence that the same neural mechanisms are at play in both.”

      We made changes throughout the Results now better describing the results as consistent with a hypothesis rather than demonstrating it (p. 12-19):

      “In other words, we essentially asked whether resting-state participants are sometimes in low PE states and sometimes in high PE states, which would be consistent with spontaneous PE processing in the absence of stimuli.

      These emerging states overlap strikingly with the previous task effects of PE, suggesting that rs-fMRI scans exhibit fluctuations that resemble the signatures of low- and high-PE states. 

      To be clear, this does not entirely dissuade concerns about reverse inference, which would require a type of causal manipulation that is difficult (if not impossible) to perform in a resting state scan. Nonetheless, these results provide further evidence consistent with our interpretation that the resting brain spontaneously fluctuates between high/low PE network states.

      These patterns are most consistent with a characteristic timescale near 3–6 Hz for the amplitude of the putative high/low-PE fluctuations. This is notably consistent with established links between PE and Delta/Theta and is further consistent with an interpretation in which these fluctuations relate to PE-related processing during rest.”

      We have also made targeted edits to the Discussion to present the findings in a more cautious way, more clearly state what is our interpretation, and provide alternative explanations (p. 19-26):

      “The present research conducted task-fMRI, rs-fMRI, and rs-fMRI-EEG studies to clarify whether PE elicits global connectivity effects and whether the signatures of PE processing arise spontaneously during rest. This investigation carries implications for how PE minimization may characterize abstract task-general cognitive processes. […] Although there are different ways to interpret this correlation, it is consistent with high/low PE states generally fluctuating at 3-6 Hz during rest. Below, we discuss these three studies’ findings.

      Our rs-fMRI investigation examined whether resting dynamics resemble the task-defined connectivity signatures of high vs. low PE, independent of the type of stimulus encountered. The resting-state analyses indeed found that, even at rest, participants’ brains fluctuated between strong ventral-dorsal connectivity and strong posterior-anterior connectivity, consistent with shifts between states of high and low PE. This conclusion is based on correlative/observational evidence and so may be controversial as it relies on reverse inference.

      These patterns resemble global connectivity signatures seen in resting-state participants, and correlations between fMRI and EEG data yield associations, consistent with participants fluctuating between high-PE (ventral-dorsal) and low-PE (posterior-anterior) states at 3-6 Hz. Although definitively testing these ideas is challenging, given that rs-fMRI is defined by the absence of any causal manipulations, our results provide evidence consistent with PE minimization playing a role beyond stimulus processing.”

      (R3.2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance. It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Of the three terms the reviewer mentioned, “spontaneous” and “task-independent” are the most accurate descriptors. We conceptualize these fluctuations as a continuous background process that persists across all facets of cognition, without requiring a task explicitly designed to elicit prediction error – although we, along with other predictive coding papers, would argue that all cognitive tasks are fundamentally rooted in PE mechanisms and thus anything can be seen as a “prediction task” (see our response to comment R2.2 for our changes to the Introduction that provide more intuition for this point). The proposed interactions can be seen as analogous to cortico-basal-thalamic loops, which are engaged across a vast and diverse array of cognitive processes.

      The prior submission only used the word “intrinsic” in the title. We have since revised it to “spontaneous,” which is more specific than “intrinsic,” and we believe clearer for a title than “task-independent” (p. 1): “Spontaneous fluctuations in global connectivity reflect transitions between states of high and low prediction error”

      We have also made tweaks throughout the manuscript to now use “spontaneously” throughout (it now appears 8 times in the paper).

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      (R3.2a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      This is a good idea, but unfortunately difficult to test with our present data. The HCP gambling task used in our study was not designed to measure individual differences in prediction or inference and likely suffers from ceiling effects. Because the task outcomes are predetermined and not linked to participants' choices, there is very little meaningful behavioral variance in performance to correlate with our resting-state fluctuation measure.

      While we agree that exploring a different dataset with a more suitable task would be ideal, given the scope of the existing manuscript, this seems like it would be too much. Although these results would be informative, they would ultimately still not be a panacea for the reverse inference issues.

      Or even more broadly, does this variability in resting state PE state fluctuations predict general cognitive abilities like WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. This would test whether PE state fluctuations reflect something about general cognition, but also above and beyond these attentional or WM processes that we know are fluctuating.

      This is a helpful suggestion that motivated new analyses. We measured each participant’s resting-state fluctuation amplitude and correlated it with the individual-difference measures provided with the HCP data (e.g., measures of WM performance, intelligence, or personality). We computed each participant’s fluctuation amplitude as the average absolute difference between posterior-anterior and ventral-dorsal connectivity; this is the average of the TR-by-TR fMRI amplitude measure from Study 3. We then computed the Spearman correlation between mean fluctuation amplitude and each of the ~200 measures, correcting for multiple comparisons using the False Discovery Rate approach.[18]
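
      The two statistical ingredients named above, a rank-based correlation and False Discovery Rate correction, can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the FDR step uses the Benjamini-Hochberg procedure (the response does not say which FDR variant was used), and the input values are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up inputs for illustration only.
rho = spearman([1, 2, 3, 4], [10, 20, 30, 40])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(rho, adjusted)
```

      In practice each of the ~200 correlations would contribute one p-value to the list passed to the correction step, so that the adjusted values account for the full family of tests.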

      We found a robust negative association with age, where older participants tend to display weaker fluctuations (r = -.16, p < .001). We additionally found a positive association with the age-adjusted score on the picture sequence task (r = .12, p<sub>corrected</sub> = .03) and a negative association with performance in the card sort task (r = -.12, p<sub>corrected</sub> = .046). It is unclear how to interpret these associations without being speculative, given that fluctuation amplitude shows one positive association with performance and one negative association, albeit across entirely different tasks. We have added these correlation results as Supplemental Materials 8 (SM p. 11):

      “(8) Behavioral differences related to fluctuation amplitude 

      To investigate whether individual differences in the magnitude of resting-state PE-state fluctuations predict general cognitive abilities, we correlated our resting-state fluctuation measure with the cognitive and demographic variables provided in the HCP dataset.

      (8.1) Methods

      For each of the 1,000 participants, we calculated a single fluctuation amplitude score. This score was defined as the average absolute difference between the time-varying posterior-anterior (PA) and ventral-dorsal (VD) connectivity during the resting-state fMRI scan (the average of the TR-by-TR measure used for Study 3). We then computed the Spearman correlation between this score and each of the approximately 200 individual difference measures provided in the HCP dataset. We corrected for multiple comparisons using the False Discovery Rate (FDR) approach.

      (8.2) Results

      The correlations revealed a robust negative association between fluctuation amplitude and age, indicating that older participants tended to display weaker fluctuations (r = -.16, p<sub>corrected</sub> < .001). After correction, two significant correlations with cognitive performance emerged: (i) a positive association with the age-adjusted score on the Picture Sequence Memory Test (r = .12, p<sub>corrected</sub> = .03), (ii) a negative association with performance on the Card Sort Task (r = -.12, p<sub>corrected</sub> = .046). As greater fluctuation amplitude is linked to better performance on one task but worse performance on another, it is unclear how to interpret these findings.”

      We updated the main text Methods to direct readers to this content (p. 39-40):

      “(4.4.3) Links between network fluctuations and behavior

      We considered whether the extent of PE-related network expression states during resting-state is behaviorally relevant. We specifically investigated whether individual differences in the overall magnitude of resting-state fluctuations could predict individual difference measures, provided with the HCP dataset. This yielded a significant association with age, whereby older participants tended to display weaker fluctuations. However, associations with cognitive measures were limited. A full description of these analyses is provided in Supplemental Materials 8.”

      (R3.2b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      This is a good question that probes the direct behavioral relevance of these network states on a trial-by-trial basis. We agree with the reviewer's intuition; in principle, one would expect a stronger expression of the low-PE network state on trials where a participant correctly and quickly gives a high likelihood rating to a predictable stimulus.

      Following this suggestion, we performed a new analysis in Study 1A to test this. We found that network expression was indeed linked to participants’ likelihood ratings: higher likelihood ratings correspond to stronger posterior-anterior connectivity, whereas lower ratings correspond to stronger ventral-dorsal connectivity (Connectivity-Direction × likelihood, β [standardized] = .28, p = .02). Yet, this is not a strong test of the reviewer’s hypothesis, and exploratory analyses of response time yielded null results (p > .05). We suspect that the effect is too subtle for our statistical power to detect. A comparable analysis was not feasible for Study 1B, as its design does not provide an analogous behavioral measure of trial-by-trial prediction success.

      (R3.3) A priori Hypothesis for EEG Frequency Analysis.

      It's unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal Delta/Theta power, specifically in the 3-6 Hz range. However, in the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      This point gets to the heart of the challenge with reverse inference in resting-state fMRI. We agree that an interpretation based on general arousal or drowsiness is a potential alternative that must be considered. However, what makes a simple arousal interpretation challenging is the highly specific nature of our fMRI-EEG association. As shown in our confirmatory analyses (Supplemental Materials 6), the correlation with 3-6 Hz power was found exclusively with the absolute difference between our two PE-related network states (|PA – VD|)—a measure of fluctuation amplitude. We found no significant relationship with the signed difference (a bias toward one state) or the sum (the overall level of connectivity). This specificity presents a puzzle for a simple drowsiness account; it seems less plausible that drowsiness would manifest specifically as the intensity of fluctuation between two complex cognitive networks, rather than as a more straightforward change in overall connectivity. While we cannot definitively rule out contributions from arousal, the specificity of our finding provides stronger evidence for a structured cognitive process, like PE, than for a general, undifferentiated state. 

      We updated the Discussion to make the argument above and also to remind readers that alternative explanations, such as ones based on drowsiness, are possible (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism.”

      (R3.4) Significance Assessment

      The significance of the correlation above and all other correlation analyses should be assessed through a permutation test rather than a single parametric t-test against zero. There are a few reasons: a) EEG and fMRI time series are autocorrelated, violating the independence assumption of parametric tests;

      Standard t-tests can underestimate the true null distribution's variance, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false positives unless tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.

      This is an important statistical point to clarify, and the suggested analysis is valuable. The reviewer is correct that the raw fMRI and EEG time series are autocorrelated. However, because our statistical approach is a two-level analysis, we reasoned that non-independence at the correlation-level would not invalidate the higher-level t-test. The t-test’s assumption of independence applies to the individual participants' coefficients, which are independent across participants. Thus, we believe that our initial approach is broadly appropriate, and its simplicity allows it to be easily communicated.

      Nonetheless, the permutation-testing procedure that the Reviewer describes seems like an important analysis to test, given that permutation-testing is the gold standard for evaluating statistical significance, and it could guarantee that our above logic is correct. We thus computed the analysis as the reviewer described. For each participant, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This was done for each participant once per permutation. Each participant’s phase-randomized data was submitted to the analysis of each oscillatory power band as originally, generating one mean correlation for each band. This was done 1,000 times.
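
      The surrogate-generation step described above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation; the function name and the toy input series are hypothetical. It randomizes the Fourier phases while keeping the amplitude spectrum, and leaves the mean (and, for even lengths, the Nyquist term) untouched so that the inverse transform is real.

```python
import numpy as np

def phase_randomize(series, rng):
    """Return a real surrogate with the same amplitude spectrum but random phases."""
    x = np.asarray(series, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]          # keep the mean (DC term) as is
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]    # keep the real-valued Nyquist term
    return np.fft.irfft(randomized, n=n)

rng = np.random.default_rng(0)
# Toy autocorrelated series standing in for one run's |PA - VD| time course.
original = rng.standard_normal(128).cumsum()
surrogate = phase_randomize(original, rng)

same_spectrum = np.allclose(np.abs(np.fft.rfft(surrogate)),
                            np.abs(np.fft.rfft(original)))
print(same_spectrum)
```

      Because the power spectrum is unchanged, the surrogate keeps the slow drifts and autocorrelation of the original series while losing its moment-to-moment alignment with any other signal, which is the property the null distribution relies on.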

      Across the five bands, the grand mean correlation under the null is near zero (M<sub>r</sub> = .0006), and the 97.5<sup>th</sup> percentile critical value of the null distribution is r = ~.025. This 97.5<sup>th</sup> percentile corresponds to the upper end of a 95% confidence interval for a band’s correlation, and the threshold minimally differs across bands (.024 < rs < .026). Our original correlation coefficients for Delta (M<sub>r</sub> = .042) and Theta (M<sub>r</sub> = .041), which our conclusions focused on, remained significant (p ≤ .002). We can perform family-wise error-rate correction by taking the highest correlation across any band for a given permutation, and the Delta and Theta effects remain significant (p<sub>FWE-corrected</sub> ≤ .003). Reviewer comment R1.4c previously requested that we employ family-wise error correction.
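
      The family-wise correction described here, comparing each observed correlation against the per-permutation maximum across bands, can be sketched as below. The function name, the toy values, and the (1 + count) / (1 + N) convention for the p-value are assumptions for illustration; the response does not specify these details.

```python
def max_stat_p_values(observed, null_by_permutation):
    """One-sided family-wise corrected p-values via the max-statistic method.

    observed: dict mapping band name to its observed mean correlation.
    null_by_permutation: list of dicts, one per permutation, mapping band
        name to the mean correlation obtained from surrogate data.
    """
    # For each permutation, keep only the largest correlation across bands.
    null_max = [max(perm.values()) for perm in null_by_permutation]
    n = len(null_max)
    return {
        band: (1 + sum(m >= r for m in null_max)) / (1 + n)
        for band, r in observed.items()
    }

# Toy values for illustration only (two hypothetical bands, three permutations).
observed = {"band_a": 0.5, "band_b": 0.0}
null_by_permutation = [
    {"band_a": 0.1, "band_b": 0.2},
    {"band_a": 0.3, "band_b": -0.1},
    {"band_a": 0.0, "band_b": 0.05},
]
p_fwe = max_stat_p_values(observed, null_by_permutation)
print(p_fwe)
```

      Using the maximum across bands makes the threshold at least as strict as any single band's own null, which is what controls the error rate across the whole family of bands.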

      These correlations were previously reported in Table 1, and we updated the caption to note what effects remain significant when evaluated using permutation-testing and with family-wise error correction (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      We updated the Methods to describe the permutation-testing analysis (p. 43):

      “To confirm the significance of our fMRI-EEG correlations with a non-parametric approach, we performed a group-level permutation-test. For each of 1,000 permutations, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This procedure breaks the true temporal relationship between the fMRI and EEG data while preserving its structure. We then re-computed the mean Spearman correlation for each frequency band using this phase-randomized data. We evaluated significance using a family-wise error correction approach that accounts for us analyzing five oscillatory power bands. We thus create a null distribution composed of the maximum correlation value observed across all frequency bands from each permutation. Our observed correlations were then tested for significance against this distribution of maximums.”
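      The maximum-statistic correction described in this passage can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `fwe_corrected_p_values` is hypothetical, the test is one-sided (matching the use of the highest correlation per permutation), and the "+1" in the numerator and denominator is a common convention that the response does not confirm.

```python
def fwe_corrected_p_values(observed, null_by_permutation):
    """Family-wise corrected p-values from a max-statistic null distribution.

    observed: one mean correlation per frequency band.
    null_by_permutation: one list per permutation, holding the mean
        correlation of every band computed on the surrogate data.
    """
    # One value per permutation: the largest correlation across all bands.
    max_null = [max(band_values) for band_values in null_by_permutation]
    n_perm = len(max_null)
    # Proportion of permutation maxima at least as large as each observed value.
    # The +1 terms (an assumed convention) keep p-values above zero.
    return [
        (1 + sum(m >= obs for m in max_null)) / (1 + n_perm)
        for obs in observed
    ]


# Usage with toy numbers (assumed values, two bands, three permutations).
p_values = fwe_corrected_p_values(
    observed=[0.5, 0.1],
    null_by_permutation=[[0.0, 0.2], [0.1, 0.3], [0.05, 0.0]],
)
```

      Comparing every band against the same distribution of maxima is what controls the family-wise error rate across the five bands.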

      (R3.5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      We discuss this matter in our response to comment R2.(4).

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

      We thank the reviewer for this question, as it highlights a point in our methods that we could have explained more clearly. The reviewer is correct that the two measures must be aligned, and we can confirm that they were indeed perfectly matched.

      For the analysis in Section 2.2.2, both the task and resting-state measures were calculated on the exact same anatomical substrate for each comparison. The analysis iteratively selected a symmetrical subset of eight ROIs from our larger four quadrants. For each of these 3,432 iterations, we computed the task-fMRI PE effect (the Connectivity Direction × PE interaction) and the resting-state fluctuation amplitude (E[|PA – VD|]) using the identical set of eight ROIs. The goal of this analysis was precisely to test if the fine-grained anatomical pattern of these effects correlated within an individual across the task and rest states. We will revise the text in Section 2.2.2 to make this direct alignment of the two measures more explicit.

      Recommendations for authors:

      Reviewer #1 (Recommendations for authors):

      (R1.3) Several prior studies have described co-activation or connectivity "templates" that spontaneously alternate during rest and task states, and are linked to behavioral variability. While they are interpreted differently in terms of cognitive function (e.g., in terms of sustained attention: Monica Rosenberg; alertness: Catie Chang), the relationship between these previously reported templates and those identified in the current study warrants discussion. Are the current templates spatially compatible with prior findings while offering new functional interpretations beyond those already proposed in the literature? Or do they represent spatially novel patterns?

      Thank you for this suggestion. Broadly, we do not mean to propose spatially novel patterns but rather focus on how these are repurposed for PE processing. In the Discussion, we link our identified connectivity states to established networks (e.g., the FPCN). We updated this paragraph to mention that these patterns are largely not spatially novel (p. 20):

      “The connectivity patterns put forth are, for the most part, not spatially novel and instead overlap heavily with prior functional and anatomical findings.”

      Regarding the specific networks covered in the prior work by Rosenberg and Chang that the reviewer seems to be referring to,[7,8] this research has emphasized networks anchored heavily in sensorimotor, subcortical–cerebellar, and medial frontal circuits, which mostly do not overlap with the connectivity effects we put forth.

      (R1.4) Additional points:

      (R1.4a) I do not think that the logic for taking the absolute difference of fMRI connectivity is convincing. What happens if the sign of the difference is maintained?

      Thank you for pointing out this area that requires clarification. Our analysis targets the amplitude of the fluctuation between brain states, not the direction. We define high fluctuation amplitude as moments when the brain is strongly in either the PA state (PA > VD) or the VD state (VD > PA). The absolute difference |PA – VD| correctly quantifies this intensity, whereas a signed difference would conflate these two distinct high-amplitude moments. Our simulation study (Supplemental Materials, Section 5) provides the theoretical validation for this logic, showing how this absolute difference measure in slow fMRI data can track the amplitude of a fast underlying neural oscillator.
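      A toy example may make the distinction between the signed and absolute differences concrete. The values below are invented for illustration, and the function names are hypothetical; the sketch only shows that a series alternating strongly between the two states has a signed difference that averages out, whereas the absolute difference registers both states as high amplitude.

```python
def signed_difference(pa, vd):
    """PA minus VD connectivity at each time point (direction preserved)."""
    return [p - v for p, v in zip(pa, vd)]


def fluctuation_amplitude(pa, vd):
    """|PA - VD| at each time point (direction discarded, intensity kept)."""
    return [abs(p - v) for p, v in zip(pa, vd)]


def mean(values):
    return sum(values) / len(values)


# Toy series (assumed values): the brain alternates between a strong PA state
# and a strong VD state on successive time points.
pa = [0.8, 0.2, 0.8, 0.2]
vd = [0.2, 0.8, 0.2, 0.8]

mean_signed = mean(signed_difference(pa, vd))         # opposite signs cancel
mean_amplitude = mean(fluctuation_amplitude(pa, vd))  # both states count as high
```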

      When the analysis is run on the signed difference, as suggested by the Reviewer, the association between the fMRI data and EEG power is non-significant for each power band (ps<sub>uncorrected</sub> ≥ .47). We updated Supplemental Materials 6 to include these results. Previously, this section included the fluctuation amplitude (fMRI) × EEG power results while controlling for: (i) the signed difference between posterior-anterior and ventral-dorsal connectivity, (ii) the sum of posterior-anterior and ventral-dorsal connectivity, and (iii) the absolute value of the sum of posterior-anterior and ventral-dorsal connectivity. For completeness, we also now report the correlation between each EEG power band and each of those other three measures (SM, p. 9):

      “We additionally tested the relationship between each of those three measures and the five EEG oscillation bands. Across the 15 tests, there were no associations (ps<sub>uncorrected</sub>  ≥ .04); one uncorrected p-value was at p = .044, although this was expected given that there were 15 tests. Thus, the association between EEG oscillations and the fMRI measure is specific to the absolute difference (i.e., amplitude) measure.”

      (R1.4b) Reasoning of focus on frontal and theta band is weak, and described as "typical" (line 359) based on a single study.

      Sorry about this. There is a rich literature on the link between frontal theta and prediction error,[3,9–11] and we updated the Introduction to include more references to this work (p. 18): “The analysis was first done using power averaged across frontal electrodes, as these are the typical focus of PE research on oscillations.[3,9–11]”

      We have also updated the Methods to cite more studies that motivate our electrode choice (p. 41): “The analyses first targeted five midline frontal electrodes (F3, F1, Fz, F2, F4; BioSemi64 layout), given that this frontal row is typically the focus of executive-function PE research on oscillations.[9–11]”

      (R1.4c) No correction appears to have been applied for the association between EEG power and fMRI connectivity. Given that 100 frequency bins were collapsed into 5 canonical bands, a correction for 5 comparisons seems appropriate. Notably, the strongest effects in the delta and theta bands (particularly at fronto-central electrodes) may still survive correction, but this should be explicitly tested and reported.

      Thanks for this suggestion. We updated the Table 1 caption to mention what results survive family-wise error rate correction. As the reviewer suggests, the Delta/Theta effects would survive Bonferroni correction for five tests; however, per a later comment suggesting that we evaluate statistical significance with a permutation-testing approach (comment R3.4), we instead report family-wise error correction based on that approach. The revised caption is as follows (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      (R1.4d) Line 135. Not sure I understand what you mean by "moods". What is the overall point here?

      The overall argument is that the fluctuations occur rapidly rather than slowly. By slow “moods” we refer to how a participant could enter a high anxiety state of >10 seconds, linked to high PE fluctuations, and then shift into a low anxiety state, linked to low PE fluctuations. We argue that this is not occurring. Regardless, we recognize that referring to lengths of time as short as 10 seconds or so is not a typical use of the word “mood” and is potentially ambiguous, so we have omitted this statement, which was originally on page 6: “Identifying subsecond fluctuations would broaden the relevance of the present results, as they rule out that the PE states derive from various moods.”

      (R1.4e) Line 100. "Few prior PE studies have targeted PE, contrasting the hundreds that have targeted BOLD". I don't understand this sentence. It's presumably about connectivity vs activity?

      Yes, sorry about this typo. The reviewer is correct, and that sentence was meant to mention connectivity. We corrected (p. 5): “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R1.4f) Line 373: "0-0.5Hz" in the caption is probably "0-50Hz".

      Yes, this was another typo, thank you. We have corrected it (p. 19): “… every 0.5 Hz interval from 0-50 Hz.”

      Reviewer #2 (Recommendations for authors):

      (R2.6) (Page 3) When referring to the "limited" hypothesis of local PE, please clarify in what sense is it limited. That statement is unclear.

      Thank you for pointing out this text, which we now see is ambiguous. We originally used "limited" to refer to the hypothesis's constrained scope: namely, that PE is relevant to various low-level operations (e.g., sensory processing or rewards) but the minimization of PE does not guide more abstract cognitive processes. We edited this part of the Introduction to be clearer (p. 3):

      “It is generally agreed that the brain uses PE mechanisms at neuronal or regional levels,[15,16] and this idea has been useful in various low-level functional domains, including early vision [15] and dopaminergic reward processing.[17] Some theorists have further argued that PE propagates through perceptual pathways and can elicit downstream cognitive processes to minimize PE.”

      (R2.7) (Page 5) "Few prior PE have targeted PE"... this statement appears contradictory. Please clarify.

      Sorry about this typo, which we have corrected (p. 5):

      “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R2.8) What happened to the data of the medium PE condition in Study 1A?

      The medium PE condition data were not excluded. We modeled the effect of prediction error on connectivity using a linear regression across the three conditions, coding them as a continuous variable (Low = -1, Medium = 0, High = +1). This approach allowed us to identify brain connections that showed a linear increase or decrease in strength as a function of increasing PE. This linear contrast is a more specific and powerful way to isolate PE-related effects than a High vs. Low contrast. We updated the Results slightly to make this clearer (p. 8-9):

      “In the fMRI data, we compared the three PE conditions’ beta-series functional connectivity, aiming to identify network-level signatures of PE processing, from low to high. […] For the modularity analysis, we first defined a connectome matrix of beta values, wherein each edge’s value was the slope of a regression predicting that edge’s strength from PE (coded as Low = -1, Medium = 0, High = +1; Figure 2A).”
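      The per-edge slope described in this passage can be sketched with an ordinary least-squares fit. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the edge labels, and the connectivity values are all hypothetical, and only the condition coding (Low = -1, Medium = 0, High = +1) comes from the text.

```python
PE_CODES = {"low": -1.0, "medium": 0.0, "high": 1.0}  # coding stated in the text


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    return covariance / variance


def edge_slopes(strength_by_edge):
    """Map each edge to the slope of its strength across the three PE conditions."""
    conditions = ["low", "medium", "high"]
    codes = [PE_CODES[c] for c in conditions]
    return {
        edge: ols_slope(codes, [strengths[c] for c in conditions])
        for edge, strengths in strength_by_edge.items()
    }


# Usage with invented connectivity strengths for two hypothetical edges.
slopes = edge_slopes({
    ("roi_a", "roi_b"): {"low": 0.10, "medium": 0.20, "high": 0.30},
    ("roi_a", "roi_c"): {"low": 0.30, "medium": 0.25, "high": 0.10},
})
```

      A positive slope marks an edge whose strength rises with PE, and a negative slope marks one whose strength falls; collecting the slopes for all edges yields the kind of connectome matrix the passage describes.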

      (R2.9) (Page 15) The point about how the dots in 6H follow those in 6J better than those in 6I is a little subjective - can the authors provide an objective measure?

      Thank you for pointing out this issue. The visual comparison using Figure 6 was not meant as a formal analysis but rather to provide intuition. However, as the reviewer describes, this is difficult to convey. Our formal analysis is provided in Supplemental Materials 5, where we report correlation coefficients between a very large number of simulated fMRI data points and EEG data points corresponding to different frequencies. We updated this part of the Results to convey this (p. 16-17):

      “Notice how the dots in Figure 6H follow the dots in Figure 6J (3 Hz) better than the dots in Figure 6I (0.5 Hz) or Figure 6K (10 Hz); this visual comparison is intended for illustrative purposes only, and quantitative analyses are provided in Supplemental Materials 5.”

      References

      (1) Zalesky, A., Fornito, A. & Bullmore, E. T. Network-based statistic: identifying differences in brain networks. Neuroimage 53, 1197–1207 (2010).

      (2) Strijkstra, A. M., Beersma, D. G., Drayer, B., Halbesma, N. & Daan, S. Subjective sleepiness correlates negatively with global alpha (8–12 Hz) and positively with central frontal theta (4–8 Hz) frequencies in the human resting awake electroencephalogram. Neuroscience letters 340, 17–20 (2003).

      (3) Cavanagh, J. F. & Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends in cognitive sciences 18, 414–421 (2014).

      (4) Grech, R. et al. Review on solving the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilitation 5, 25 (2008).

      (5) Palva, J. M. et al. Ghost interactions in MEG/EEG source space: A note of caution on inter-areal coupling measures. Neuroimage 173, 632–643 (2018).

      (6) Koles, Z. J. Trends in EEG source localization. Electroencephalography and clinical Neurophysiology 106, 127–137 (1998).

      (7) Rosenberg, M. D. et al. A neuromarker of sustained attention from whole-brain functional connectivity. Nature neuroscience 19, 165–171 (2016).

      (8) Goodale, S. E. et al. fMRI-based detection of alertness predicts behavioral response variability. elife 10, e62376 (2021).

      (9) Cavanagh, J. F. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216 (2015).

      (10) Hoy, C. W., Steiner, S. C. & Knight, R. T. Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG. Communications Biology 4, 910 (2021).

      (11) Neo, P. S.-H., Shadli, S. M., McNaughton, N. & Sellbom, M. Midfrontal theta reactivity to conflict and error are linked to externalizing and internalizing respectively. Personality neuroscience 7, e8 (2024).

      (12) Friston, K. J. The free-energy principle: a unified brain theory? Nature reviews neuroscience 11, 127–138 (2010).

      (13) Feldman, H. & Friston, K. J. Attention, uncertainty, and free-energy. Frontiers in human neuroscience 4, 215 (2010).

      (14) Friston, K. J. et al. Active inference and epistemic value. Cognitive neuroscience 6, 187–214 (2015).

      (15) Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extraclassical receptive-field effects. Nature neuroscience 2, 79–87 (1999).

      (16) Walsh, K. S., McGovern, D. P., Clark, A. & O’Connell, R. G. Evaluating the neurophysiological evidence for predictive processing as a model of perception. Annals of the New York Academy of Sciences 1464, 242–268 (2020).

      (17) Niv, Y. & Schoenbaum, G. Dialogues on prediction errors. Trends in cognitive sciences 12, 265–272 (2008).

      (18) Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological) 57, 289–300 (1995).

    1. eLife Assessment

      This manuscript provides evidence that mouse germline cysts develop an asymmetric Golgi, ER, and microtubule-associated structure that resembles the fusome in Drosophila germline cysts. This fundamental study provides new evidence that fusome-like structures exist in germ cell cysts across species. Overall, the data are convincing and represent a significant advance in our understanding of germ cell biology.

    2. Reviewer #2 (Public review):

      This study identifies Visham, an asymmetric structure in developing mouse cysts resembling the Drosophila fusome, an organelle crucial for oocyte determination. Using immunofluorescence, electron microscopy, 3D reconstruction, and lineage labeling, the authors show that primordial germ cells (PGCs) and cysts, but not somatic cells, contain an EMA-rich, branching structure that they named Visham, which remains unbranched in male cysts. Visham accumulates in regions enriched in intercellular bridges, forming clusters reminiscent of fusome "rosettes." It is enriched in Golgi and endosomal vesicles and partially overlaps with the ER. During cell division, Visham localizes near centrosomes in interphase and early metaphase, disperses during metaphase, and reassembles at spindle poles during telophase before becoming asymmetric. Microtubule depolymerization disrupts its formation.

      Cyst fragmentation is shown to be non-random, correlating with microtubule gaps. The authors propose that 8-cell (or larger) cysts fragment into 6-cell and 2-cell cysts. Analysis of Pard3 (the mouse ortholog of Par3/Baz) reveals its colocalization with Visham during cyst asymmetry, suggesting that mammalian oocyte polarization depends on a conserved system involving Par genes, cyst formation, and a fusome-like structure.

      Transcriptomic profiling identifies genes linked to pluripotency and the unfolded protein response (UPR) during cyst formation and meiosis, supported by protein-level reporters monitoring Xbp1 splicing and 20S proteasome activity. Visham persists in meiotic germ cells at stage E17.5 and is later transferred to the oocyte at E18.5 along with mitochondria and Golgi vesicles, implicating it in organelle rejuvenation. In Dazl mutants, cysts form, but Visham dynamics, polarity, rejuvenation, and oocyte production are disrupted, highlighting its potential role in germ cell development.

      Overall, this is an interesting and comprehensive study of a conserved structure in the germline cells of both invertebrate and vertebrate species. Investigating these early stages of germ cell development in mice is particularly challenging. Although primarily descriptive, the study represents a remarkable technical achievement. The images are generally convincing, with only a few exceptions.

      Major comments:

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title "Mouse germline cysts contain a fusome-like structure that mediates oocyte development":

      The term "mediates" could be misleading, as the functional data on Visham (based on comparing its absence to wild-type) actually reflects either a microtubule defect or a Dazl mutant context. There is no specific loss-of-function of visham only.

      (1b) Result title, "Visham overlaps centrosomes and moves on microtubules":

      The term "moves" implies dynamic behavior, which would require live imaging data that are not described in the article.

      (1c) Result title, "Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation":

      The presented data show that the presence of Visham in the cyst coincides temporally with the expression and activity of the UPR response; the term "associates" is unclear in this context.

      (1d) Result title, "Visham participates in organelle rejuvenation during meiosis":

      The term "participates" suggests that Visham is required for this process, whereas the conclusion is actually drawn from the Dazl mutant context, not a specific loss-of-function of visham only.

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      Comments on revisions:

      The revised manuscript has been clearly improved, and the authors have addressed all of our comments. I would like to point out two minor issues:

      (1) As suggested by the reviewers, the authors now use the term fusome instead of visham. However, they also acknowledge that this structure lacks many components of the Drosophila fusome. It may therefore be more appropriate to refer to it as a "mouse fusome" or as a "fusome-like structure (FLS)," as used in Xenopus.

      (2) I agree with Reviewer 3 that co-localization between EMA and acTubulin on still images does not convincingly demonstrate that fusome vesicles move along microtubules (Figure S2E).

    3. Reviewer #3 (Public review):

      The manuscript provides evidence that mice have a fusome, a conserved structure most well studied in Drosophila that is important for oocyte specification. Overall, a myriad of evidence is presented demonstrating the existence of a mouse fusome. This work is important as it addresses a long-standing question in the field of whether mice have fusomes and sheds light on how oocytes are specified in mammals.

      Comments on revisions:

      Overall, the authors did a good job of responding to reviewer comments, which has improved the manuscript through higher-quality microscope images, text revised for clarity, and use of the term mouse fusome instead of a new term. However, two of the headings in the Results section that did not correspond to the data presented in that section still have not been revised, even though the authors stated in their response to reviewer comments that they were. The heading of the first section of the Results is "PGCs contain a Golgi-rich structure known as the EMA granule", even though no evidence in that section shows it is Golgi rich. The heading of the fifth section of the Results is "The mouse fusome associates with polarity and microtubule genes including pard3"; however, only evidence for pard3 is presented.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Summary

      We thank the reviewer for the constructive and thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential implications of our findings regarding UPR activation and proteasome activity in germ cells.

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      The apparent saturation was not present in the original images, but likely arose from image compression during PDF generation. While the EMA granule was still apparent, in the revised submission, we will provide high-resolution TIFF files to ensure accurate representation of fluorescence intensity and will carefully optimize image display settings to avoid any saturation artifacts.

      (2) The authors should ensure that all claims regarding enrichment/higher vs. lower values have indicated statistical tests.

      We fully agree. In the revised version, we will correct any quantitative comparisons where statistical tests were not already indicated, with a clear statement of the statistical tests used, including p-values in figure legends and text.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs. 6 cyst numbers?

      We acknowledge that the description was not sufficiently detailed. Indeed, the test did not compare 2- vs. 6-cell cyst numbers. Instead, it asked whether random fragmentation of 8-cell cysts (or the larger cysts studied) into two pieces would, by chance, produce 6-cell cysts in 13 of the 15 observed examples. We will expand the legend and main text to clarify that a binomial test was used to determine that the proportion of cysts producing 6-cell fragments differed very significantly from chance.

      Revised text:

      “A binomial test was used to assess whether the observed frequency of 6-cell cyst products differed from random cyst breakage. Production of 6-cell cysts was strongly preferred (13/15 cysts; ****p < 0.0001).”
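      The logic of such a binomial test can be sketched as an exact upper-tail probability. This is a hedged illustration only: the response does not state the chance probability of obtaining a 6-cell product under random breakage, so `p0` below is a hypothetical placeholder, and the function name is likewise an assumption.

```python
from math import comb


def binomial_upper_tail(successes, trials, p0):
    """Exact one-sided probability of observing at least `successes`
    out of `trials` when each trial succeeds with probability p0."""
    return sum(
        comb(trials, i) * p0**i * (1.0 - p0) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Observed counts come from the text (13 of 15 cysts gave a 6-cell product).
# p0 is NOT given in the response; 0.25 is an arbitrary placeholder.
hypothetical_p0 = 0.25
p_value = binomial_upper_tail(successes=13, trials=15, p0=hypothetical_p0)
```

      The resulting p-value depends strongly on the chance probability assumed for random fragmentation, which is why stating that null value alongside the test makes the result easier to evaluate.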

      (b) Figures 4d and 4e do not have a statistical test indicated.

      We will include the specific statistical test used and report the corresponding p-values directly in the figure legends.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      We will revise the abstract to better clarify the findings of this study. We will also replace the term Visham with mouse fusome to reflect its functional and structural analogy to the Drosophila and Xenopus fusomes, making the narrative more coherent and conclusive.

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      We appreciate this observation of a problem that occurred inadvertently when shortening an earlier version. We will add 3–4 relevant references to appropriately support this section.

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      We will ensure that all acronyms are spelled out at first mention (e.g., Unfolded Protein Response (UPR), Endosome and Golgi-Associated Degradation (EGAD)).

      (6) The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, balbiani body, etc) makes the narrative of the paper very difficult to follow.

      We are not jumping between topics, but following a narrative relevant to the central question of whether female mouse germ cells develop using a fusome. EMA, microtubule fragmentation, polarization proteins, ER, and the Balbiani body are all topics with a known connection to fusomes. This is explained in the general introduction and in relevant subsections. We appreciate this feedback that further explanations of these connections would be helpful. In the revised manuscript, use of the unified term mouse fusome will also help connect the narrative across sections. UPR/ERAD/EGAD are processes that have been studied in repair and maintenance of somatic cells and in yeast meiosis. We show that the major regulator Xbp1 is found in the fusome, and that the fusome and these rejuvenation pathway genes are expressed and maintained throughout oogenesis, rather than only during limited late stages as suggested in previous literature.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

      We believe this statement is accurate given the broad scope of the term "participates." It is supported by localization of the UPR regulator Xbp1 to the fusome. Xbp1 is the ortholog of Hac1, a key gene mediating UPR-mediated rejuvenation during yeast meiosis. We also showed that rejuvenation pathway genes are expressed throughout most of meiosis (not previously known) and expanded cytological evidence of stage-specific organelle rejuvenation later in meiosis, such as mitochondrial-ER docking, in regions enriched in fusome antigens. However, we recognize the current limitations of this evidence in the mouse, and want to appropriately convey this, without going to what we believe would be an unjustified extreme of saying there is no evidence.

      Reviewer #2 (Public review):

      We thank the reviewer for the comprehensive summary and for highlighting both the technical achievement and biological relevance of our study. We greatly appreciate the thoughtful suggestions that have helped us refine our presentation and terminology.

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title “Mouse germline cysts contain a fusome-like structure that mediates oocyte development”

      We will change the statement to: “Mouse germline cysts contain a fusome that supports germline cyst polarity and rejuvenation.”

      (1b) Result title “Visham overlaps centrosomes and moves on microtubules”

      We acknowledge that “moves” implies dynamics. We will include additional supplementary images showing small vesicular components of the mouse fusome on spindle-derived microtubule tracks.

      (1c) Result title “Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation”

      We will revise this title to: “The mouse fusome associates with the UPR regulatory protein Xbp1 beginning at the onset of cyst formation” to reflect the specific UPR protein that was immunolocalized.

      (1d) Result title “Visham participates in organelle rejuvenation during meiosis”

      We will revise this to: “The mouse fusome persists during organelle rejuvenation in meiosis.”

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      We appreciate the reviewer’s insightful comment. To maintain conceptual clarity and align with existing literature, we will refer to the structure as the mouse fusome throughout the manuscript, avoiding introduction of a new term.

      Reviewer #3 (Public review):

      We thank the reviewer for emphasizing the importance of our study and for providing constructive feedback that will help us clarify and strengthen our conclusions.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule"

      We agree that the enrichment of Golgi within the EMA PGCs was not shown until the next section. We will revise this heading to:

      “PGCs contain an asymmetric EMA granule.” 

      (2) Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      We will clarify that this identification is based on co-localization with Golgi markers (GM130 and GS28) and response to Brefeldin A treatment, which will be included as supplementary data. These findings support that the mouse fusome is Golgi-derived and can therefore be visualized by EM. The Golgi regions in E13.5 cyst cells move close together and associate with ring canals as visualized by EM (Figure 1E), the same as the mouse fusomes identified by EMA.

      (3) Line 106-107-states "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      Space is at a premium in these figures, but we have no limitation on data documenting this absolutely clear co-localization. We will replace the existing images with high-resolution, noncompressed versions for the final figures to clearly illustrate the co-staining patterns for GM130 and Rab11a1.

      (4) Line 132-133---while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      We will include additional supplementary data showing small mouse fusome vesicles aligned along microtubules.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      We agree and will revise the heading to: “Mouse fusome associates with the polarity protein Pard3.” We are adding data showing association of small fusome vesicles on microtubules.

      (6) Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      UPR genes are not upregulated in Dazl mutants, in the sense that we have never documented them increasing. We show that UPR genes during this time behave like pluripotency genes and normally decline, but in Dazl mutants their decline is slowed. We will rephrase the paragraph to clarify that Dazl mutation partially decouples developmental processes that are normally linked, which alters UPR gene expression relative to cyst development.

      (7) Line 257-259-wave 1 and 2 follicles need to be explained in the introduction, and how these fits with the observations here clarified.

      Follicle waves are too small a focus of the current study to explain in the introduction, but we will refer readers to the relevant cited literature (Yin and Spradling, 2025) for further details.

      We sincerely thank all reviewers for their insightful and constructive feedback. We believe that the planned revisions—particularly the refined terminology, improved image quality, clarified statistics, and restructured abstract—will substantially strengthen the manuscript and enhance clarity for readers.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1E: need to use some immuno-gold staining to identify the Visham. Just circling an area of cytoplasm that contains ER between germ cell pairs is not enough.

      We appreciate the reviewer’s insistence that the association between the mouse fusome and Golgi be clearly demonstrated. However, the EMA granule is a large structure discovered and defined by light microscopy, and presents no inherent challenge to documenting its Golgi association by immunofluorescence experiments, which we presented and have now further strengthened as described in the next paragraph. We believe that the suggested EM experiment would add little to the EM we already presented (Figure 1E, E'). Moreover, due to facility limitations, we are currently unable to perform immunogold staining.

      To strengthen previous immunolocalization experiments, we have now included additional immunostaining data showing the clear colocalization of the fusome region with the Golgi markers GM130 and GS28 (Figure S1H). We have also incorporated a new experiment using the Golgi-specific inhibitor Brefeldin A (BFA; see Figure S1I). Treatment of in vitro–cultured gonads with BFA disrupted EMA granule formation, demonstrating that EMA granules not only associate with Golgi, but require Golgi function to be maintained.

      Additionally, in Figure 2, we showed that the fusome overlaps with the peri-centriolar region, a characteristic locus for Golgi due to its movement on microtubules. We showed that the dynamic behavior of the fusome during the cell cycle parallels Golgi dispersal and reassembly. All these facts provide further strong support for the Golgi association of the EMA granule and fusome.

      (2) Figure 1F: is this image compressed?

      We have now substituted the image in Figure 1F with a better image and have avoided the compression of the image. 

      (3) In the figure legends, are the sample sizes individual animals or individual sections? Please ensure that all figure legends for each figure panel consistently contain the sample size.

      We have now included the number of measurements (N) in every figure legend. Each experiment was performed using samples from at least three different animals, and in most cases from more than three. This information has also been added to the Methods section under Statistics. In addition, N values are now consistently provided for each graph throughout the figures.

      (4) Figure 2b/c: seemly likely based on the snapshot of different stages of cytokinesis that the "newly formed" visham is accurate, but without live imaging, this claim of "newly formed" is putative/speculative. It is OK if it is labeled as "putative" in the figure panel.  

      The behavior of the Drosophila fusome during mitosis was deduced without live imaging (deCuevas et al. 1998). We clarified that the conversion of a single mouse germ cell with one round fusome to an interconnected pair of cells with two round fusomes of greater total volume following mitosis is the basis for deducing that new fusome formation occurs each cell cycle. However, we agree with the reviewer that the phrase "newly formed" in the original label on Figure 2c suggested a specific mechanism of fusome increase that was not intended and this phrase has been removed entirely.  

      (5) Figure 2e/e is extremely difficult to follow. In order to improve the readability of these figure panels, can individual panels with a single stain be shown? The 'gap' between YFP+ sister cells is not immediately obvious in panel e or e" with the current layout. Since this is a key aspect of the author's claim about cleavage of the cyst, it would be best to make this claim more robust by showing more convincing images. In Figure 2E, the staining pattern of EMA needs to be clarified and described more fully in the text.

      We mapped discontinuities in the microtubule connections, not in the fusome or YFP. YFP is the lineage marker indicating that the cells of a single cyst are being studied. Consequently, no gap in cytoplasmic YFP expression is expected, because only in the last example (Figure 2E'') has fragmentation already occurred (and here there is a YFP gap). The acetylated tubulin gap precedes fragmentation. The mitotic spindle remnants labeled by AcTub link the cells into two groups separated by a gap, which is clearly shown in the data images and in the third column, where only the relevant AcTub from the cyst itself is shown. In response to the reviewer's question about the fusome, which is not directly relevant to fragmentation, we have now provided images of the separate fusome channel and corresponding measurements for all three Figure 2E-E'' cysts in supplementary Figure S4H. We have improved the text regarding this important figure to make it easier to follow, and have also added a new example of a 10-cell cyst, also in S2H (lower panels). We also added movies allowing full 3D study of one of the 8-cell cysts and the new 10-cell cyst. We also suggest that the reviewer examine how the deduced mechanism of fragmentation explains previously published but not fully understood data on cyst fragmentation going back to 1998, as described in the expanded Discussion on this topic.

      (6) It would be best to support the proposed model in Figure 2G (4+4+4) with microscopy images of a 12-cell or 16-cell cyst? Would these 12-cell or 16-cell cysts be too large to technically recover in a section?

      Unfortunately, the reviewer's suggestion that 12- or 16-cell cysts are too large to recover and present convincingly is correct. Because our analysis depends on capturing lineage-labeled cysts specifically at telophase with acetylated-tubulin connections, the likelihood of obtaining the correct stage is very low. In addition, the dense packing of germ cells in the mouse gonad further limits our ability to fully reconstruct all the cells in large cysts, with difficulty increasing as cyst size grows.

      However, as noted, we added a well-resolved 10-cell cyst—the largest size we could confidently analyze—in a 3D video in Supplementary Figure S2H (lower panel), which shows a 6 + 4 breakage pattern.

      (7) We did not find a reference in the text for Figure 2G.

      We have now provided a reference to Figure 2G in the text and in the Discussion section.

      (8) Line 189: ERAD is used as an acronym, but is not defined until the discussion.

      We have now provided the full form of the acronym at its first usage in the text.

      (9) Fig 3i/i': the increase of UPR pathway components, increasing expression during zygotene, is interesting to note, but is not commented enough in the text of the paper.

      We have discussed this issue in the discussion section with specific reference to figure 3I. Please find the detailed discussion under the heading “Germ cell rejuvenation is highly active during cyst formation.”

      (10) Please quantify DNMT3A expression levels in WT control vs Dazl KO germ cells in Figure 4a.

      We have now quantified DNMT3A expression levels in WT control vs Dazl KO germ cells and have added the data to Figure 4A.

      (11) Please introduce the rationale behind selecting DazL KO for studying cyst formation (text in line 197). This comes out of nowhere.

      True.  We significantly expanded our discussion of Dazl and citations of previous work, including evidence that it can affect cyst structures like ring canals, in the Introduction.  

      (12) It would be best to stain WT control vs DazL KO oogonia in Figure 4a with 5mC antibodies to support their claim that DNA methylation might be affected in the mutants.

      We respectfully disagree that this additional experiment is necessary within the scope of the current study. At the developmental stage examined (E12.5), germ cells in the Dazl mutant are clearly in an arrested and hypomethylated state, as supported by previous evidence (Haston et al. 2009). This initial experiment was designed to show that, in our hands, Dazl mutants show this known pluripotency delay. However, the effects of Dazl mutation on female germline cyst development as it relates to polarity or the fusome had not been studied before, and that is what the paper addresses, building on previous work.

      Because our study does not focus on germ-cell epigenetic modifications but rather on the consequences of Dazl loss on germ cell cyst development, adding 5mC immunostaining would not substantially advance the main conclusions. The existing data and previous published work already provide sufficient background.

      (13) Figure 4c: a very interesting figure, it would be best to quantify developmental pseudotime (perhaps using monocle3 analysis) and compare more rigorously the developmental stage of WT control vs DazL KO.

      Developmental pseudotime, such as through Monocle3 analysis, might sometimes be valuable but involves assumptions that, when possible, are better addressed by direct experimental examination. Our conclusions regarding cyst developmental stage are supported by straightforward evidence to which computational trajectory inference would add little. Specifically, we have performed analysis of germ-cell methylation state, ring canal formation, pluripotency markers, UPR pathway activity (Xbp1 and proteomic assays), Golgi stress, and Pard3, which collectively document the developmental status of the WT and Dazl KO germ cells. These empirical data demonstrate the same developmental pattern reflected in Figure 4c, making the less reliable pseudotime-based computational method superfluous.

      (14) Figure 4d has two panels labeled as "d".

      We have now corrected the labelling of the figure.

      (15) Color coding in 4d, d', d" is confusing; please harmonize some visual presentation here.

      We have now harmonized the visual representation of all the graphs in Figure 4.

      (16) Fig 4e' is labeled as DazL +/- but is this really a typo?

      Thank you for pointing it out. We have now corrected the typo.

      (17) Figure F': typo labeled as E3.5, which is E13.5?

      Thank you for pointing it out. We have now corrected the typo.

      (18) Figure F': was DazL KO mutant but no WT control.

      The WT control was not provided, to avoid redundancy. Please refer to the earlier Figure 3A-B, Figure S3C and D, and videos S3A and S3B for the WT control at every stage.

      (19) Figure G: unusual choice in punctuation marks for cartoon schematic. No key to guide the reader for color-coded structures would be helpful to have something similar to 4h.

      We have now provided a key to guide readers in Figure 4G.

      (20) The authors use WGA and EMA as interchangeable markers (Figure 5a) without fully explaining why they have switched markers.

      Because it is germ cell specific, we used EMA as a fusome marker during the time when it is found, up through E13.5. After that point we used WGA, which is still usable but also labels somatic cells. This rationale is explicitly described at the end of the section “Fusome is highly enriched in Golgi and vesicles”, where we state:

      “EMA staining disappears from germ cells at E14.5 (Figure 1I). However, very similar (but non–germ-cell-specific) staining continued with wheat germ agglutinin (WGA) at later stages (Figure 1G, G’; Figure S1G).”

      To ensure this is fully clear to readers, we have now added an additional statement at the start of the text section discussing Figure 5:

      “For the reasons explained previously (see text for Figure 1G), WGA was used as a fusome marker beyond stage E14.5.”

      (21) Figure 5b' is compressed.

      We have now decompressed the image.

      (22) Line 267, Balbiani body is misspelled.  

      We have now corrected the spelling.

      (23) The explanation of why the authors switch focus from DazL KO to DazL +/- is not adequately described. The authors should also explain the phenotype of the DazL +/- animals or reference a paper citing the hets are sterile or subfertile.

      We have now added the explanation of why Dazl KO is used to our Introduction section, where we mention the phenotype of Dazl homozygous and heterozygous mice.

      (24) Is Figure 5i actually DazL +/-? It is not labeled clearly in the text, the figure legend, or the figure panel. 

      We have now labelled the figure correctly in the figure panel and in the legend.

      (25) The paper ends abruptly at line 275 with no context or summary.

      The manuscript does not end at line 275; the apparent interruption is due to a page break occurring immediately before the beginning of the Discussion section. We hope that the continuation is fully visible in the reviewer's version of the PDF.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 93: Fig. 1B: DDX4 marks germ cells; do all the red and yellow cells in the NE inset originate from the same PGC? There are only 2 cells marked in yellow among the group of red cells. Is it a z-projection issue? Or do they come from different PGCs?

      This experiment used vasa staining to identify all germ cells, which are produced by multiple PGCs. Green labeling is a lineage marker derived from a single PGC (due to the low frequency of tamoxifen-activated labeling). Consequently, the two yellow cells observed in the NE inset of Fig. 1B represent YFP-labeled germ cells (YFP + DDX4 double-positive) that have arisen from a single, lineage-traced PGC. This approach, introduced in 2013, is described in the Methods, and represents the field's single largest technical advance that has made it possible to analyze mouse germ cell development at single cell resolution.

      To ensure clarity, we have added a brief explanatory note to the figure legend indicating that yellow cells represent the lineage-traced progeny of a single PGC, while the red staining marks all germ cells.

      (2) Line 96: Figure 1C vs 1C'. The difference between female and male Visham is not obvious, although quantification shows a clear difference. How was the quantification made? Manual or automatic thresholding? Would it be possible to show only the Visham channel?

      We thank the reviewer for pointing out this problem. We have now more clearly described in the text that the female fusome increases in some cells with close attachments to other cells (future oocytes) and decreases in distant nurse cells. It branches due to rosette formation. In males, the fusome remains much like the initial EMA granules present in early germ cells, with only fine and difficult-to-see connections. The quantification shown in Figures 1C and 1C′ was performed manually, based on the presence of either (i) fused, branched EMA-positive fusome structures or (ii) dispersed, punctate EMA granules. This assessment was carried out across multiple E13.5 male and female gonad samples to ensure robustness. To facilitate independent evaluation, we have already provided supplementary videos S3B1 and S3B2, which display the EMA-stained E13.5 male and female gonads in three dimensions. These videos allow the structural differences to be examined more clearly than in static images.

      In response to the reviewer’s request, we now additionally include the single-channel fusome image in Supplementary Figure S1E′. This presentation highlights the fusome signal alone and further clarifies the morphological differences underlying the quantification.

      (3) L118: Figure 2A, third row = 2-cell cyst? Please specify PCNT in the legend.

      We appreciate the reviewer’s observation. In Figure 2A (third row), the cells were not specifically labeled as a 2-cell cyst; rather, the intention was to illustrate the presence of two distinct centrosomes positioned on a fused fusome structure, a configuration we frequently observe.

      We have now updated the figure legend to explicitly define PCNT.

      (4) L169: Missing reference to S3B and video S3B1?

      We have now included the reference to S3B1 and S3B2 in the text and in the legend.

      (5) L170: Please describe the graph in the Figure 3D legend.

      We have now described the graph in the legend.

      (6) L171: Would it be possible to have a close-up showing both Pard3 and Visham in a ringlike pattern related to RACGAP (RC) staining? The images are too small.

      It is difficult to capture this relationship perfectly in a two-dimensional picture. The images represent the maximum close-up possible that still includes enough relevant area for the necessary conclusions. We have now provided three additional close-up images exclusively for the ring canal and Pard3 association in supplementary Figure S3C for further clarity. However, we also note that the quality of the images permits the reader of a PDF to zoom in and visualize them in great detail.

      (7) L181: Wrong reference, should be 3 then 3I.

      Thank you for pointing it out, we have now corrected the reference.

      (8) L199: In Figure S4B, was DNMT3 staining quantified? Red intensity differs globally between images; use the somatic red level as a reference? Note: EMA seems higher in Dazl- vs. WT?

      We have now performed quantification of DNMT3 staining, which is presented in Figure 4A. While the red intensity (DNMT3 or EMA) can appear to differ between images, this variation can result from biological differences between tissues or minor technical variability despite using consistent microscope settings. To account for this, we normalized the staining intensity using the somatic cell signal as an internal reference, ensuring that the quantification reflects genuine differences between WT and Dazl-/- samples rather than global intensity variation.

      (9) L229: Should be "proteasome."

      We have now corrected the spelling error.

      (10) L233: Quantify fragmentation of Gs28? EMA doesn't seem affected. Could you quantify both Gs28 and EMA? Images are too small.

      We thank the reviewer for this suggestion. While the current images are small, they can be examined in detail using zoom to visualize the structures clearly. As noted, we agree that EMA staining is not affected, as the cells are in an arrested state. This arrested state creates stress on the Golgi. The fragmentation of Gs28-labeled Golgi membranes is a classical indicator of Golgi stress, even though the fragmented membranes may remain functionally active. Our results show that Dazl deletion specifically affects Golgi in germ cells, while Golgi in neighboring somatic cells appears healthy. To quantify this effect, we have now included manual quantification of Golgi fragmentation in Figure 4F, assessing tissues for the presence of fragmented versus intact Golgi structures. This confirms that Golgi fragmentation is a germ cell–specific phenotype in Dazl– samples, while pre-formed EMA-positive fusomes remain unaffected but are probably in an arrested state.

      (11) L237: Figure 4F graph shows E3.5, not E13.5.

      We have now corrected the typo in the figure.

      (12) L257: Figure 5D: quantify as in 5A? overlap?

      Yes, it is an overlap, shown as two separate images with the ring canal for better clarity. We have now quantified the image and have produced a combined graph for fusome and Pard3 in the Figure 5A graph.

      (13) L261: Figure 5E-E': black arrowhead not mentioned in legend.

      We have now mentioned the black arrowhead in the legend.

      (14) L262: Figure 5C: arrowhead not mentioned in legend. Figure 5F: oocyte appears separated from nurse cells compared to 5C.

      Yes, that may happen as cysts undergo fragmentation; what matters is that all cells are lineage labelled and hence are members of a single cyst derived from one PGC.

      (15) L263: Figure 5G has no legend reference; nurse cells are not outlined as in 5C.

      We have now outlined the nurse cells and have added the reference to the graph in the legend.

      (16) L279: "The fusome and Visham and both..." should be replaced with "Both fusome and Visham...".

      We have now replaced the term Visham with fusome as suggested by reviewers and editor.  We updated the statement to correct the grammatical error.

      (17) L1127: Video S3B1: It is unclear what to focus on.

      We have now added a rectangular area and an arrow to highlight what to focus on.

      (18) L1128: Video "S3B1" should be "S3B2."

      We have now corrected the legend.

      (19) Finally: curiosity question: have the authors tried to use known markers of the Drosophila fusome in mice, such as Spectrin or other markers described in Lighthouse, Buszczak and Spradling, Dev Bio, 2008? And conversely, do EMA and WGA label the fusome in Drosophila?

      Yes, we and others used the most specific markers of the Drosophila fusome, such as alpha-spectrin, adducin-like Hts, tropomodulin, etc., to search for fusomes in vertebrate species. This was unsuccessful in clarifying the situation, because Hts and alpha-spectrin in Drosophila and other insects generate a protein skeleton that stabilizes the fusome and is easily stained, but this structure is simply not conserved in vertebrates. The polarity behavior of the fusome, its core developmental property, is conserved, however. The mammalian fusome still acquires and maintains cyst polarity, and goes even further, reflecting both initial cyst formation and cyst cleavage before marking oocyte vs nurse cell development in the smaller cysts. Expression of the inner microtubule-rich portion of the fusome, its Par proteins, and many ER-related and lysosomal fusome proteins is mostly conserved, but their ability to mark the fusome alone varies with time and context (only some of the examples are shown in Figure 3I'). Nearly all of the proteins identified in Lighthouse et al. 2008 are expressed. These proteins may be involved in rejuvenation as studied here. We modified the first section of the Discussion to explicitly compare mouse, Xenopus and Drosophila fusomes, which was not possible before this work.

      Reviewer #3 (Recommendations for the authors):

      The authors should either revise the conclusions or add additional evidence to support their claims. In addition, minor corrections are listed below.

      We have added additional evidence as noted in the responses above, and revised some claims that were stated inaccurately. In addition, we have attempted to clarify the evidence we do present, so that its full significance is more easily grasped by readers.

      (1) Lines 20-21 are unclear - the cyst doesn't get sent into meiosis, each oocyte does.

      Research is showing that it's more complicated than that. All cyst cells enter "pre-meiotic S phase", and most cell cycles are conventionally considered to start after the previous M phase, i.e. in G1 or S, not in the next prophase, an ancient view limited just to meiosis. Absent this old tradition from meiosis cytology, pre-meiotic S would just be called meiotic S, as some workers on meiosis do. In addition, in different species, nurse cells diverge from meiosis on different schedules, including many much later in the meiotic cycle. Two cyst cells in Drosophila fully enter meiosis by all criteria, the oocyte and one nurse cell that only exits in late zygotene. In Xenopus and mouse, scRNAseq shows that many cyst cells enter meiosis up to leptotene and zygotene, including nurse cells that specifically downregulate meiotic genes during this time, possibly to assist their nurse cell functions, while others remain in meiosis even longer (Davidian and Spradling, 2025; Niu and Spradling, 2022). Eventually, only the oocytes within each fragmented mouse cyst complete meiosis.

      (2) Many places in the manuscript abbreviations are never defined or not defined the first time they are used (but the second or third time): Line 23-ER, Line 29-UPR, Line 33-PGC (not defined until line 45), Line 79-EGAD.

      We have now defined all acronyms in full upon their first occurrence.

      (3) Line 5 should be the pachytene substage of meiosis I.

      We have now updated the statement to “In the pachytene stage of meiosis I…”

      (4) Line 59-61 - this statement needs a reference(s).

      These statements are a continuation from the references cited in the previous statements. However, for further clarity we have again cited the relevant reference here (Niu and Spradling, 2022).

      (5) Line 80 - should it be oocyte proteome quality control?

      We have now updated the statement to “Oocyte proteome quality control begins early”.

      (6) Line 87 - in this case, EMA does not stand for epithelial membrane antigen (AI will call it that, but it is not correct). I believe it originally was the abbrev for (Em)bryonic (a)ntigen, though some papers call it (e)mbryonic (m)ouse (a)ntigen. And the reference here is Hahnel and Eddy, 1986, but in the reference list is a different paper, 1987 (both refer to EMA-1).

      We have now corrected the expansion of the acronym EMA-1 and have corrected the citation.

      (7) Line 176 - RNA seq.

      We have now updated the statement to “We performed single cell RNA sequencing (scRNA seq) of mouse gonad”.

      (8) Line 181 - Figure 4E and 4I should be 3E and 3I.

      We have now corrected the figure reference in the text.

      (9) Line 183 - missing period.

      Added.

    1. eLife Assessment

      This paper develops a fundamental theory that explains how the brain can hold in working memory not only the identity but also the order of presented stimuli. Previous theories did not explain the ability of people to immediately recall the correct order of the stimulus presentation. The authors present compelling evidence that this can be achieved through synaptic augmentation, an experimentally observed phenomenon with a time scale of tens of seconds.

    2. Reviewer #1 (Public review):

      Summary:

      The issue of how the brain can maintain serial order of presented items in working memory is a major unsolved question in cognitive neuroscience. It has been proposed that this serial order maintenance could be achieved thanks to periodic reactivations of different presented items at different phases of an oscillation, but the mechanisms by which this could be achieved by brain networks, as well as the mechanisms of read-out, are still unclear. In an influential 2008 paper, the authors have proposed a mechanism by which a recurrent network of neurons could maintain multiple items in working memory, thanks to 'population spikes' of populations of neurons encoding for the different items, occurring at alternating times. These population spikes occur in a specific regime of the network and are a result of synaptic facilitation, an experimentally observed type of synaptic short-term dynamics with time scales of order hundreds of ms.

      In the present manuscript, the authors extend their model to include another type of experimentally observed short-term synaptic plasticity termed synaptic augmentation, that operates on longer time scales on the order of 10s. They show that while a network without augmentation loses information about serial order, augmentation provides a mechanism by which this order can be maintained in memory thanks to a temporal gradient of synaptic efficacies. The order can then be read out using a read-out network whose synapses are also endowed with synaptic augmentation. Interestingly, the read-out speed can be regulated using background inputs.

      Strengths:

      This is an elegant solution to the problem of serial order maintenance, that only relies on experimentally observed features of synapses. The model is consistent with a number of experimental observations in humans and monkeys. The paper will be of interest to the broad readership of eLife and I believe it will have a strong impact on the field.

      Comments on revisions:

      I am happy with how the authors have addressed my comments, and believe the paper can be published in its present form.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present a model to explain how working memory (WM) encodes both existence and timing simultaneously using transient synaptic augmentation. A simple yet intriguing idea.

      The model presented here has the potential to explain what previous theories like 'active maintenance via attractors' and 'liquid state machine' do not, and describe how novel sequences are immediately stored in WM. Altogether, the topic is of great interest to those studying higher cognitive processes, and the conclusions the authors draw are certainly thought-provoking from an experimental perspective.

      Comments on revisions:

      The authors have done an excellent job of addressing the questions that I raised, and the manuscript is greatly improved - both in content and clarity. It is an insightful advance and I recommend publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.

      We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics has been observed. We take this as an indication of ‘robustness’; the relevant network dynamics doesn’t critically depend on many implementation details and, importantly, this dynamics is qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).

      In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, as the rate model. A meaningful study, in our opinion, then would require extensively testing the (large) parameters’ space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? Behavioral? Physiological?). As said above, we believe that this is beyond the scope of the present work.

      This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We have clearly acknowledged this limitation here, by adding the following paragraph to the Discussion.

      “To illustrate our theory in a simple setting, we used a minimal model network that neglects many physiological details. This, however, constitutes a limitation of the present study. It would be reassuring to see that the mechanism we propose here is robust enough to reliably operate also in spiking networks, in the presence of heterogeneity in both single-cell and synaptic properties. While we are fairly confident that this is the case, a spiking implementation of our model is beyond the scope of the present study and will be addressed in the future. Also, because of the simplicity of the model network, a comparison between the model behavior and the electrophysiological observations cannot be completely direct. Nevertheless the model qualitatively accounts for a diverse set of experimental data”.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

      Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature, i.e., Panichello et al. (2024). We provide a short discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. We also provide a short discussion of the results of Liebe et al. (2025), which, again, are fully consistent with our model.

      We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2). We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates such as to prevent the saturation of the synaptic augmentation.

      We have added the following two paragraphs to the Discussion.

      “More direct support for this interpretation comes from recent electrophysiological studies [Panichello et al., 2024, Liebe et al., 2025]. By recording large neuronal populations (∼ 300) simultaneously in the prefrontal cortex of monkeys performing a WM task, [Panichello et al., 2024] found that, during the maintenance period, the decoding of the actively held item from neural activity was ‘intermittent’; that is, decoding was only possible during short epochs (∼ 100 ms) interleaved with epochs (also ∼ 100 ms) where decoding was at chance level. The inability to decode resulted from a loss of selectivity at the population level, with a return of the single-neuron firing rates to their spontaneous (pre-stimulus) activity levels. The transitions between these two activity states (decodable/not-decodable) were coordinated across large populations of neurons in PFC. By recording single-neuron activity in the medial temporal lobe of humans performing a sequential multi-item WM task, [Liebe et al., 2025] found that during maintenance, neurons coding for a given item tended to fire at a specific phase of the underlying theta rhythm, again suggesting that the corresponding neuronal populations reactivate briefly and sequentially. In summary, these experimental results suggest that active memory maintenance relies on brief reactivations of the neural representations of the items, which we identify with the population spikes in our model, and that these reactivations occur sequentially in time, as predicted by our theory”.

      “We note that the proposed mechanism would still work if the items were maintained by tonically-enhanced firing rates, instead of population spikes, provided that those firing rates were suitably low. However, obtaining low firing rates in model networks of persistent activity is quite difficult”.

      Reviewer #2 (Public review):

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In the model, synaptic dynamics displays both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation, alone, would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B.

      We provide a discussion of this important point, by adding the following paragraph to the Results section.

      “If augmentation was the only form of synaptic plasticity present in the network, the encoding of an item in WM would require long presentation times, or alternatively high firing rates upon presentation, precisely because K_A is small. Instead, rapid encoding is made possible by the presence of the short-term facilitation, which builds up significantly faster than augmentation, as U >> K_A . For the same reason, however, the level of facilitation rapidly reaches the steady state; therefore, short-term facilitation alone is unable to encode temporal order (see Fig. 1B). Thus, our model requires the existence of transitory synaptic enhancement on at least two time scales, such that longer decays are accompanied by slower build-ups. Intriguingly, this pattern is experimentally observed [Fisher et al., 1997]”.
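      The two-time-scale argument above can be illustrated with a minimal sketch. The symbols U, K_A and tau_A come from the text; the equation form dx/dt = -x/tau + gain * (1 - x) * rate, the zero baseline, and all numerical values are assumptions chosen for illustration only and are not taken from the manuscript.

```python
# Hypothetical sketch: a fast, strong process (facilitation) versus a slow,
# weak one (augmentation) driven by the same brief stimulus.
# Assumed dynamics: dx/dt = -x / tau + gain * (1 - x) * rate, with x(0) = 0.

def build_up(gain, tau, rate, duration, dt=0.001):
    """Euler-integrate the assumed dynamics and return x at the end."""
    x = 0.0
    for _ in range(int(round(duration / dt))):
        x += dt * (-x / tau + gain * (1.0 - x) * rate)
    return x

def steady_state(gain, tau, rate):
    """Fixed point of the assumed dynamics under a constant rate."""
    return gain * rate / (1.0 / tau + gain * rate)

RATE = 20.0                 # presynaptic rate in Hz (illustrative)
STIM = 0.5                  # stimulus duration in s (illustrative)
U, TAU_F = 0.3, 0.5         # facilitation gain and decay time (illustrative)
K_A, TAU_A = 0.005, 10.0    # augmentation gain and decay time, with U >> K_A

# Fraction of the steady-state level reached by the end of the stimulus.
fac_fraction = build_up(U, TAU_F, RATE, STIM) / steady_state(U, TAU_F, RATE)
aug_fraction = build_up(K_A, TAU_A, RATE, STIM) / steady_state(K_A, TAU_A, RATE)

print(f"facilitation: {fac_fraction:.2f} of steady state")
print(f"augmentation: {aug_fraction:.2f} of steady state")
```

      Under these assumed values, facilitation is essentially saturated by the end of the stimulus, so it cannot distinguish items by their history, whereas augmentation is still far from saturation and can keep accumulating across later reactivations.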

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. This important point is emphasized; see above.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

      This is because of the slow build-up and decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.

      We note that the slow time scales of the augmentation dynamics, consistently with experimental observations, are necessary for our mechanism to work; see above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 46 identify -> identity.

      (2) Line 207 scale -> scales.

      Fixed. Thank you.

      (3) Lines 222-224 what about behavioral time-scale plasticity? This type of plasticity can apparently be induced very quickly.

      We have removed the corresponding paragraph.

      (4) Line 231 identification of `gamma bursts' with population spikes: These two phenomena seem to be very different - one can be weakly synchronized and can be consistent with highly irregular activity, while it is not clear whether the other can (see major issue 2). Also, it seems that population spikes occur at frequencies that are an order of magnitude lower than gamma.

      We have rewritten the corresponding paragraph and we rely now on more direct electrophysiological evidence (i.e., on the simultaneous recording of large neuronal populations) to identify putative population spikes; see above.

      Reviewer #2 (Recommendations for the authors):

      (1) On page 7, the behavioral study of Rose et al. (2016) is quite important for readers to understand the 'low-activity regime', and to fully appreciate Figure 4, it would be beneficial to explain that study in greater detail.

      We have added a panel to Fig. 4, and accompanying text in the caption, to better illustrate the main task events in the experiment of Rose et al. (2016).

      (2) Line 17: "wrong order", but wrong timing matters too

      Definitely, depending on the task. Specifically, in our example, timing is immaterial.

      (3) Line 33-34: "special training", what is considered special? One could argue that the number of trials needed to learn, depending on the TI timing, is special, depending on the task.

      We have removed the sentence as apparently it was confusing. We simply meant that ‘naive’ human subjects can perform the task (e.g., serial recall); that is, they didn’t undergo any kind of practice that can be construed as ‘training’.

      (4) Line 40-41: but timing is also part of working memory processing. Perhaps it can be merged with the next sentence.

      We have merged the two sentences.

      (5) Line 53: Is the implication here that what happens in the synapses is what drives WM, and not just that the neurons stay persistently on?

      Yes. The idea is that information can be maintained in the synaptic facilitation level, without enhanced spiking activity. Reading-out and refreshing the memory contents, however, requires neuronal activity. We explain this in some detail in the next paragraph (i.e., lines 60-65 in the revised submission).

      (6) Line 102: could a lack of excitatory activity be explained by inhibitory signaling? It appears the inhibitory component is quite understated here.

      Here we are just defining A-bar; according to Eq. (6), if r_a is 0 (i.e., no synaptic activity, for whatever reason), then A_a will converge to A-bar after a time much longer than \tau_A (i.e., a long period). We have rephrased the sentence to improve clarity.

      (7) Line 158-172: please consider revising this paragraph for a more general audience.

      We have rewritten this paragraph to improve clarity. For the same purpose, we have also slightly modified Fig. 3.

      (8) Line 227: it would seem this is due to a singular inhibitory group making the model highly dependent on the excitatory groups.

      We are not sure that we understand this comment. Here, we are just saying that if the item-coding populations don’t reactivate during the maintenance period (i.e., activity-silent regime) then the augmentation gradient cannot build up. If, on the other hand, the item-coding populations are constantly active at high rates during the maintenance period (i.e., persistent-activity regime) then the augmentation levels will rapidly saturate and, again, there will be no augmentation gradient. This is independent of how ‘silence’ or ‘activity’ of the item-coding populations is determined by the interplay of excitation and inhibition.

      (9) Line 284: this would certainly be an interesting take, but it isn't clear that the model proved this type of decoupling of the temporal aspect of the recall.

      This is an ‘educated’ speculation, based on the model and on a specific interpretation of some experimental results, as discussed in the paper and, in particular, in the last paragraph of the Discussion. We believe that the phrasing of the paragraph makes clear that this is, indeed, a speculation.

    1. eLife Assessment

      This valuable study uses a computational language model, i.e., HM-LSTM, to quantify the neural encoding of hierarchical linguistic information in speech, and addresses how hearing impairment affects neural encoding of speech. Overall the evidence for the findings is solid, although the evidence for different speech processing stages could be strengthened by a more rigorous temporal response function (TRF) analysis. The study is of potential interest to audiologists and researchers who are interested in the neural encoding of speech.

    2. Reviewer #1 (Public review):

      The authors relate a language model developed to predict whether a given sentence correctly followed another given sentence to EEG recordings in a novel way, showing receptive fields related to widely used TRFs. In these responses (or "regression results"), differences between representational levels are found, as well as differences between attended and unattended speech stimuli, and whether there is hearing loss. These differences are found per EEG channel.

      In addition to these novel regression results, which are apparently captured from the EEG specifically around the sentence stimulus offsets, the authors also perform a more standard mTRF analysis using a software package (Eelbrain) and TRF regressors that will be more familiar to researchers adjacent to these topics, which was highly appreciated for its comparative value. Comparing these TRFs with the authors' original regression results, several similarities can be seen. Specifically, response contrasts for attended versus unattended speaker during mixed speech, for the phoneme, syllable, and sentence regressors, are greater for normal-hearing participants than hearing-impaired participants for both analyses, and the temporal and spatial extents of the significant differences are roughly comparable (left-front and 0 - 200 ms for phoneme and syllable, and left and 200 - 300 ms for sentence).

      The inclusion of the mTRF analysis is helpful also because some aspects of the authors' original regression results, between the EEG data and the HM-LSTM linguistic model, are less than clear. The authors state specifically that their regression analysis is only calculated in the -100 - 300 ms window around stimulus/sentence offsets. They clarify that this means that most of the EEG data acquired while the participants are listening to the sentences is not analyzed, because their HM-LSTM model implementation represents all acoustic and linguistic features in a condensed way, around the end of the sentence. Thus the regression between data and model only occurs where the model predictions exist, which is the end of the sentences. This is in contrast to the mTRF analysis, which seems to have been done in a typical way, regressing over the entire stimulus time, because those regressors (phoneme onset, word onset, etc.) exist over the entire sentence time. If my reading of their description of the HM-LSTM regression is correct, it is surprising that the regression weights are similar between the HM-LSTM model and the mTRF model.

      However, the code that the authors uploaded to OSF seems to clarify this issue. In the file ridge_lstm.py, the authors construct the main regressor matrices called X1 and X2 which are passed to sklearn to do the ridge regression. This ridge regression step is calculated on the continuous 10-minute bouts of EEG and stimuli, and it is calculated in a loop over lag times, from -100 ms to 300 ms lag. These regressor matrices are initialized as zeros, and are then filled in two steps: the HM_LSTM model unit weights are read from numpy files and written to the matrices at one timepoint per sentence (as the authors describe in the text), and the traditional phoneme, syllable, etc. annotations are ALSO read in (from csv files) and written to the matrices, putting 1s at every timepoint of those corresponding onsets/offsets. Thus the actual model regressor matrix for the authors' main EEG results includes BOTH the HM_LSTM model weights for each sentence AND the feature/annotation times, for whichever of the 5 features is being analyzed (phonemes, syllables, words, phrases, or sentences).

      So for instance, for the syllable HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to syllables (a static representation, placed once per sentence offset time), AND 2) the syllable onsets themselves, placed as a row of 1s at every syllable onset time. And as another example, for the word HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to words (a static representation, placed once per sentence offset time), AND 2) the word onsets themselves, placed as a row of 1s at every word onset time.
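      The regressor construction described above can be sketched as follows. This is a minimal illustration of the two filling steps and of the lag shift, assuming toy sizes and hypothetical function names; it is not the authors' ridge_lstm.py, and the ridge fit itself (done with sklearn according to the description above) is omitted.

```python
# Hypothetical sketch of a regressor matrix that combines model features
# placed once per sentence with a binary column marking unit onsets.

def build_regressors(n_samples, sentence_offsets, sentence_features, unit_onsets):
    """Return an n_samples x (n_features + 1) matrix as a list of rows."""
    n_features = len(sentence_features[0])
    X = [[0.0] * (n_features + 1) for _ in range(n_samples)]
    # Step 1: model-derived features, written at one timepoint per sentence.
    for t, feats in zip(sentence_offsets, sentence_features):
        X[t][:n_features] = list(feats)
    # Step 2: binary annotation, a 1 at every onset of the analyzed unit.
    for t in unit_onsets:
        X[t][n_features] = 1.0
    return X

def shift_rows(X, lag):
    """Delay the regressors by `lag` samples, padding with zero rows."""
    n, width = len(X), len(X[0])
    return [list(X[i - lag]) if 0 <= i - lag < n else [0.0] * width
            for i in range(n)]

# Toy example: 10 samples, two sentences ending at samples 4 and 9,
# two feature dimensions per sentence, unit onsets every other sample.
X = build_regressors(
    n_samples=10,
    sentence_offsets=[4, 9],
    sentence_features=[[0.5, -0.2], [0.1, 0.3]],
    unit_onsets=[0, 2, 4, 6, 8],
)
X_lagged = shift_rows(X, lag=1)  # one design matrix per lag in the lag loop
```

      In this sketch, any regression fitted on X draws on both the sentence-level features and the onset column, which is why the two contributions cannot be told apart without fitting them separately.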

      If my reading of the code is correct, there are two main points of clarification for interpreting these methods:

      First, the authors' window of analysis of the EEG is not "limited" to 400 ms as they say; rather the time dimension of both their ridge regression results and their traditional mTRF analysis is simply lags (400 ms-worth), and the responses/receptive fields are calculated over the entire 10-minute trials. This is the normal way of calculating receptive fields in a continuous paradigm. The authors seem to be focusing on the peri-sentence offset time points because that is where the HM_LSTM model weights are placed in the regressor matrix. Also because of this issue, it is not really correct when the authors say that some significant effect occurred at some latency "after sentence offset". The lag times of the regression results should have the traditional interpretation of lag/latency in receptive field analyses.

      Second, as both the traditional linguistic feature annotations and the HM_LSTM model weights are part of the regression for the main ridge regression results here, it is not known what the contribution specifically of the HM_LSTM portion of the regression was. Because the more traditional mTRF analysis showed many similar results to the main ridge regression results here, it seems probable that the simple feature annotations themselves, rather than the HM_LSTM model weights, are responsible for the main EEG results. A further analysis separating these two sets of regressors would shed light on this question.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The study tests only a single deep neural network model for extracting linguistic features, which limits the robustness of the conclusions. A lower model fit does not necessarily indicate that a given type of information is absent from the neural signal; it may simply reflect that the model's representation was not optimal for capturing it. That said, this limitation is a common concern for data-driven, correlation-based approaches, and should be viewed as an inherent caveat rather than a critical flaw of the present work.

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This valuable study combines a computational language model, i.e., HM-LSTM, and temporal response function (TRF) modeling to quantify the neural encoding of hierarchical linguistic information in speech, and addresses how hearing impairment affects neural encoding of speech. The analysis has been significantly improved during the revision but remains somewhat incomplete: the TRF analysis should be more clearly described and controlled. The study is of potential interest to audiologists and researchers who are interested in the neural encoding of speech.

      We thank the editors for the updated assessment. In the revised manuscript, we have added a more detailed description of the TRF analysis on p. of the revised manuscript. We have also updated Figure 1 to better visualize the analysis pipeline. Additionally, we have included a supplementary video to illustrate the architecture of the HM-LSTM model, the ridge regression methods using the model-derived features, and mTRF analysis using the acoustic envelope and the binary rate models.

      Public Reviews:

      Reviewer #1 (Public review):

      About R squared in the plots:

      The authors have used a z-scored R squared in the main ridge regression plots. While this may be interpretable, it seems non-standard and overly complicated. The authors could use a simple Pearson r to be most direct and informative (and in line with similar work, including Goldstein et al. 2022 which they mentioned). This way the sign of the relationships is preserved.

      We did not use Pearson’s r as in Goldstein et al. (2022) because our analysis did not involve a train-test split, which was a key aspect of their approach. Specifically, Goldstein et al. (2022) divided their data into training and testing sets, trained a ridge regression model on the training set, and then used the trained model to predict neural responses on the test set. They calculated Pearson’s r to assess the correlation between the predicted and observed neural responses, making the correlation coefficient (r) their primary measure of model performance. In contrast, our analysis focused on computing the model fitting performance (R²) of the ridge regression model for each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to identify sensors and time windows where R² values were significantly greater than baseline. We established the baseline by normalizing the R² values using Fisher z-transformation across sensors within each subject. We have added this explanation on p.13 of the revised manuscript.

      About the new TRF analysis:

      The new TRF analysis is a necessary addition and much appreciated. However, it is missing the results for the acoustic regressors, which should be there analogous to the HM-LSTM ridge analysis. The authors should also specify which software they have utilized to conduct the new TRF analysis. It also seems that the linguistic predictors/regressors have been newly constructed in a way more consistent with previous literature (instead of using the HM-LSTM features); these specifics should also be included in the manuscript (did it come from Montreal Forced Aligner, etc.?). Now that the original HM-LSTM can be compared to a more standard TRF analysis, it is apparent that the results are similar.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to R3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This made the 130-dimensional TRF estimates difficult to interpret. A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). Since our speech stimuli were computer-synthesized, the phoneme and syllable boundaries were automatically generated. The word boundaries were manually annotated by a native Mandarin speaker, as in Li et al. (2022). The phrase boundaries were automatically annotated by the Stanford parser and manually checked by a native Mandarin speaker. These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. Although the TRF results from the 1-dimensional rate predictors and the ridge regression results from the high-dimensional HM-LSTM-derived features are similar, they encode different things: the rate regressors only encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. Therefore, we do not consider the mTRF analyses to be analogous to the ridge regression analyses. Rather, these results complement each other, and both provide insight into the neural tracking of linguistic structures at different levels for the attended and unattended speech.

      Since the TRF result for the continuous acoustic features also concerns R², we have added an mTRF analysis where we fitted the one-dimensional speech envelope to the EEG. We extracted the envelope at 10 ms intervals for both attended and unattended speech and computed mTRFs independently for each subject and sensor using a basis of 50 ms Hamming windows spanning –100 ms to 300 ms relative to envelope onset. The results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for the envelope to Figure S3 and the “mTRF results for attended and unattended speech” on p.7 and the “mTRF analysis” in Material and Methods of the revised manuscript.

      The authors' wording about this suggests that these new regressors have a nonzero sample at each linguistic event's offset, not onset. This should also be clarified. As the authors know, the onset would be more standard, and using the offset has implications for understanding the timing of the TRFs, as a phoneme has a different duration than a word, which has a different duration from a sentence, etc.

      In our rate‐model mTRF analyses, we initially labelled linguistic boundaries as “offsets” because our ridge‐regression with HM-LSTM features was aligned to sentence offsets rather than onsets. However, since each offset coincides with the next unit’s onset—and our regressors simply mark these transition points as 1—the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.
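      The equivalence claimed above can be checked with a small sketch. It assumes that units follow one another without gaps and uses hypothetical durations in samples; under that assumption the onset and offset markers share every interior boundary and differ only at the first onset and the last offset.

```python
# Hypothetical sketch: onset and offset markers for contiguous units.

def boundaries(durations):
    """Return (onsets, offsets) in samples for back-to-back units."""
    onsets, offsets = [], []
    t = 0
    for d in durations:
        onsets.append(t)   # unit starts where the previous one ended
        t += d
        offsets.append(t)  # unit ends where the next one starts
    return onsets, offsets

# Three units lasting 3, 2 and 4 samples (illustrative values).
onsets, offsets = boundaries([3, 2, 4])

shared = sorted(set(onsets) & set(offsets))       # interior boundaries
only_onset = sorted(set(onsets) - set(offsets))   # first onset only
only_offset = sorted(set(offsets) - set(onsets))  # last offset only

print(onsets, offsets, shared, only_onset, only_offset)
```

      If the stimuli contained pauses between units, the two marker series would no longer coincide at those points, so the sketch applies only to the contiguous case.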

      As discussed in our prior responses, this design was based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed by humans. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design.

      We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning. In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features—each intended to represent the full linguistic level across a complete sentence—we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch. We have added this clarification on p.12 of the revised manuscript.
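
      The offset-locked procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`ridge_r2`, `lagged_r2`), the closed-form ridge solver, the regularization strength `alpha`, and the array layout are all hypothetical. It fits one ridge regression per sensor at each of the nine lags (-100 to +300 ms in 50 ms steps) relative to sentence offset and reports the in-sample R², with no train-test split.

```python
import numpy as np

def ridge_r2(X, y, alpha=1.0):
    """In-sample R^2 of a closed-form ridge fit (no train-test split)."""
    Xc = X - X.mean(axis=0)          # center features
    yc = y - y.mean()                # center the EEG samples
    # Ridge solution: w = (X'X + alpha * I)^-1 X'y
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)
    resid = yc - Xc @ w
    return 1.0 - (resid @ resid) / (yc @ yc)

def lagged_r2(eeg, offsets, features, sfreq, alpha=1.0):
    """eeg: (n_sensors, n_samples); offsets: sample index of each sentence
    offset; features: (n_sentences, n_dims), e.g. PCA-reduced model features.
    Returns the lags in ms and an (n_sensors, 9) array of R^2 values."""
    lags_ms = np.arange(-100, 301, 50)   # nine lags: -100, -50, ..., 300 ms
    lag_samples = np.round(lags_ms * sfreq / 1000).astype(int)
    r2 = np.empty((eeg.shape[0], lags_ms.size))
    for j, lag in enumerate(lag_samples):
        y_all = eeg[:, offsets + lag]    # (n_sensors, n_sentences)
        for s in range(eeg.shape[0]):
            r2[s, j] = ridge_r2(features, y_all[s], alpha)
    return lags_ms, r2
```

      Each call to `ridge_r2` uses only one EEG sample per sentence, which is why the analysis covers a 400 ms window around each offset rather than the full recording.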

      About offsets:

      TRFs can still be interpretable using the offset timings though; however, the main original analysis seems to be utilizing the offset times in a different, more confusing way. The authors still seem to be saying that only the peri-offset time of the EEG was analyzed at all, meaning the vast majority of the EEG trial durations do not factor into the main HM-LSTM response results whatsoever. The way the authors describe this does not seem to be present in any other literature, including the papers that they cite. Therefore, much more clarification on this issue is needed. If the authors mean that the regressors are simply time-locked to the EEG by aligning their offsets (rather than their onsets, because they have varying onsets or some such experimental design complexity), then this would be fine. But it does not seem to be what the authors want to say. This may be a miscommunication about the methods, or the authors may have actually only analyzed a small portion of the data. Either way, this should be clarified to be able to be interpretable.

      We hope that our response in RE4, along with the supplementary video, has helped clarify this issue. We acknowledge that prior studies have not used EEG data surrounding sentence offsets to examine neural responses at the phoneme or syllable levels. However, this is largely due to a lack of models that represent all linguistic levels across an entire sentence. There is abundant work comparing model predictors with neural data time-locked to offsets, because offsets mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our model–brain alignment study, our goal is to identify neural correlates for each model-derived feature. If we correlated model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      Reviewer #2 (Public review):

      This study presents a valuable finding on the neural encoding of speech in listeners with normal hearing and hearing impairment, uncovering marked differences in how attention to different levels of speech information is allocated, especially when having to selectively attend to one speaker while ignoring an irrelevant speaker. The results overall support the claims of the authors, although a more explicit behavioural task to demonstrate successful attention allocation would have strengthened the study. Importantly, the use of more "temporally continuous" analysis frameworks could have provided a better methodology to assess the entire time course of neural activity during speech listening. Despite these limitations, this interesting work will be useful to the hearing impairment and speech processing research community. The study compares speech-in-quiet vs. multi-talker scenarios, allowing to assess within-participant the impact that the addition of a competing talker has on the neural tracking of speech. Moreover, the inclusion of a population with hearing loss is useful to disentangle the effects of attention orienting and hearing ability. The diagnosis of high-frequency hearing loss was done as part of the experimental procedure by professional audiologists, leading to a high control of the main contrast of interest for the experiment. Sample size was big, allowing to draw meaningful comparisons between the two populations.

      We thank you very much for your appreciation of our research, and we have now added a more detailed description of the mTRF analyses on p.13-14 of the revised manuscript.

      An HM-LSTM model was employed to jointly extract speech features spanning from the stimulus acoustics to word-level and phrase-level information, represented by embeddings extracted at successive layers of the model. The model was specifically expanded to include lower level acoustic and phonetic information, reaching a good representation of all intermediate levels of speech. Despite conveniently extracting all features jointly, the HM-LSTM model processes linguistic input sentence-by-sentence, and therefore only allows to assess the corresponding EEG data at sentence offset. If I understood correctly, while the sentence information extracted with the HM-LSTM reflects the entire sentence - in terms of its acoustic, phonetic and more abstract linguistic features - it only gives a condensed final representation of the sentence. As such, feature extraction with the HM-LSTM is not compatible with a continuous temporal mapping on the EEG signal, and this is the main reason behind the authors' decision to fit a regression at nine separate time points surrounding sentence offsets.

      Yes, you are correct. As explained in RE4, the model generates five hidden-layer activity vectors, each intended to represent all the phonemes, syllables, words, phrases within the entire sentence (“a condensed final representation”). This is the primary reason we extract EEG data surrounding the sentence offsets—this time point reflects when the full sentence has been processed by the human brain. We assume that even at this stage, residual neural responses corresponding to each linguistic level are still present and can be meaningfully analyzed.

      While valid and previously used in the literature, this methodology, in the particular context of this experiment, might be obscuring important attentional effects impacted by hearing-loss. By fitting a regression only around sentence-final speech representations, the method might be overlooking the more "online" speech processing dynamics, and only assessing the permanence of information at different speech levels at sentence offset. In other words, the acoustic attentional bias between Attended and Unattended speech might exist even in hearing-impaired participants but, due to a lower encoding or permanence of acoustic information in this population, it might only emerge when using methodologies with a higher temporal resolution, such as Temporal Response Functions (TRFs). If a univariate TRF fit simply on the continuous speech envelope did not show any attentional bias (different trial lengths should not be a problem for fitting TRFs), I would be entirely convinced of the result. For now, I am unsure on how to interpret this finding.

      We agree, and in the prior revision we added the mTRF results using the rate models for the 5 linguistic levels. The rate model aligns with the boundaries of each linguistic unit at each level. As explained in RE3, the rate regressors encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. The mTRF results showed similar patterns to those observed using features from our HM-LSTM model with ridge regression (see Figure S2). The two sets of results complement each other, and both are informative about the neural tracking of linguistic structures at different levels for the attended and unattended speech.

      We have also added TRF results obtained by fitting the envelope of attended and unattended speech, sampled every 10 ms, to the whole 10-minute EEG data. Our results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for envelope to Figure S3 and the “mTRF results for attended and unattended speech” on p.7 and the “mTRF analysis” in Material and Methods of the revised manuscript.

      Despite my doubts on the appropriateness of condensed speech representations and single-point regression for acoustic features in particular, the current methodology allows the authors to explore their research questions, and the results support their conclusions. This work presents an interesting finding on the limits of attentional bias in a cocktail-party scenario, suggesting that fundamentally different neural attentional filters are employed by listeners with high-frequency hearing loss, even in terms of the tracking of speech acoustics. Moreover, the rich dataset collected by the authors is a great contribution to open science and will offer opportunities for re-analysis.

      We sincerely thank you again for your encouraging comments regarding the impact of our study.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments. The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain. The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. It is also not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. More quantitative metrics on acoustic/linguistic-related downstream tasks, such as speaker identification and phoneme/syllable/word recognition based on these intermediate layers, can better characterize the capacity of the DNN model.

      We agree that, before aligning model representations with neural data, it is essential to confirm that the model encodes linguistic information at multiple hierarchical levels. This is the purpose of our validation analysis: we evaluated the model’s representations across five layers using a test set of 20 four-syllable sentences in which every syllable shares the same vowel, e.g., “mā ma mà mǎ” (mother scolds horse), “shū shu shǔ shù” (uncle counts numbers; see Table S1). We hypothesized that the activity in the phoneme and syllable layers would be more similar than in other layers for same-vowel sentences. The results confirmed our hypothesis: hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels than at the word, phrase and sentence levels. Figure 3C displays the scatter plot of the model activity at the five linguistic levels for each of the 20 four-syllable sentences, after dimensionality reduction using multidimensional scaling (MDS). We used color-coding to represent the activity of the five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The plot reveals that model representations at the phoneme and syllable levels are more dispersed for each sentence, while representations at the higher linguistic levels (word, phrase, and sentence) are more centralized. Additionally, similar phonemes tend to cluster together across the phoneme and syllable layers, indicating that the model captures a greater amount of information at these levels when the phonemes within the sentences are similar.

      Apart from the DNN model, we also included the rate models, which simply mark 1 at each unit boundary across the 5 levels. We performed mTRF analyses with these rate models and found similar patterns to our ridge‐regression results with the DNN (see Figure S2). This provides further evidence that the model reliably captures information across all five hierarchical levels.

      Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time.

      We agree that lower-level linguistic features may be distributed throughout the whole sentence. However, using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. Finally, our model activity represents a “condensed final representation” at the five linguistic levels for the whole sentence, rather than a representation built incrementally during the sentence. We think the -100 to 300 ms time window relative to each sentence offset targets the exact moment when the full sentence is comprehended and a “condensed final representation” of the whole sentence across the five linguistic levels has been formed in the brain. We have added this clarification on p.13 of the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some specifics and clarifications of my public review:

      Initially I was interpreting the R squared as a continuous measure of predicted EEG relative to actual EEG, based on an encoding model, but this does not appear to be correct. Thank you for pointing out that the y axis is z-scored R squared in your main ridge regression plots. However, I am not sure why/how you chose to represent this that way. It seems to me that a simple Pearson r would be most informative here (and in line with similar work, including Goldstein et al. 2022 that you mentioned). That way you preserve the sign of the relationships between the regressors and the EEG. With R squared, we have a different interpretation, which is maybe also ok, but I also don't see the point of z-scoring R squared. Another possibility is that when you say "z-transformed" you are referring to the Fisher transformation; is that the case? In the plots you say "normalized", so that sounds like a z-score, but this needs to be clarified; as I say, a simple Pearson r would probably be best.

      We did not use Pearson’s r, as in Goldstein et al. (2022), because our analysis did not involve a train-test split, which was central to their approach. In their study, the data were divided into training and testing sets, and a ridge regression model was trained on the training set. They then used the trained model to predict neural responses on the held-out test set, and calculated Pearson’s r to assess the correlation between the predicted and observed neural responses. As a result, their final metric of model performance was the correlation coefficient (r). In contrast, our analysis is more aligned with standard temporal response function (TRF) approaches. We did not perform a train-test split; instead, we computed the model fitting performance (R²) of the ridge regression model at each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to determine which sensors and time windows showed significantly greater R² values than baseline. To establish a baseline, we z-scored the R² values across sensors and time points, effectively centering the distribution around zero. This normalization allowed us to interpret deviations from the mean R² as meaningful increases in model performance and provided a suitable baseline for the statistical tests. We have added this clarification on p.13 of the revised manuscript.
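
      The normalization and group-level test described above can be illustrated with a short sketch. It rests on assumptions that are not stated in the response: that R² is z-scored separately for each subject, and that the array layout is (subjects, sensors, time points). The function names are hypothetical, and the spatiotemporal cluster-based correction is deliberately omitted; only the uncorrected one-sample t-statistic against zero is shown.

```python
import numpy as np

def zscore_r2(r2):
    """r2: (n_subjects, n_sensors, n_times). Z-score each subject's R^2 map
    across sensors and time points (assumed to be done per subject), so that
    zero corresponds to that subject's mean R^2."""
    mean = r2.mean(axis=(1, 2), keepdims=True)
    std = r2.std(axis=(1, 2), keepdims=True)
    return (r2 - mean) / std

def one_sample_t(z):
    """One-sample t-statistic against 0 across subjects, computed separately
    for every sensor and time point (no cluster correction applied here)."""
    n = z.shape[0]
    return z.mean(axis=0) / (z.std(axis=0, ddof=1) / np.sqrt(n))
```

      Under this scheme a positive t-value indicates a sensor and time point where R² is reliably above the subject-wise average, rather than above an independently estimated chance level.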

      Thank you for doing the TRF analysis, but where are the acoustic TRFs, analogous to the acoustic results for your HM-LSTM ridge analyses? And what tools did you use to do the TRF analysis? If it is something like the mTRF MATLAB toolbox, then it is also using ridge regression, as you have already done in your original analysis, correct? If so, then it is pretty much the same as your original analysis, just with more dense timepoints, correct? This is what I meant by referring to TRFs originally, because what you have basically done originally was to make a 9-point TRF (and then the plots and analyses are contrasts of pairs of those), with lags between -100 and 300 ms relative to the temporal alignment between the regressors and the EEG, I think (more on this below).

      Also with the new TRF analysis, you say that the regressors/predictors had "a value of 1 at each unit boundary offset". So this means you re-made these predictors to be discrete as I and reviewer 3 were mentioning before (rather than using the HM-LSTM model layer(s)), and also, that you put each phoneme/word/etc. marker at its offset, rather than its onset? I'm also confused as to why you would do this rather than the onset, but I suppose it doesn't change the interpretation very much, just that the TRFs are slid over by a small amount.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to Reviewer 3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This renders the 130 TRF weights for the acoustic features uninterpretable. However, we have now added TRF results for the 1-dimensional envelope of the attended and unattended speech, sampled every 10 ms.

      A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors, further preventing their use in mTRF analyses. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. It is important to note that these rate predictors differ from the HM-LSTM-derived features: they encode only the timing of linguistic unit boundaries, not the content or representational structure of the linguistic input. Therefore, we do not consider the mTRF analyses to be equivalent to the ridge regression analyses based on HM-LSTM features.
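
      A binary rate predictor of the kind described above could be constructed as in the following sketch. The function name `rate_predictor`, its parameters, and the 10 ms sampling step are illustrative assumptions (the step is borrowed from the acoustic features mentioned earlier); the response does not specify how the predictors were built.

```python
import numpy as np

def rate_predictor(boundary_times_s, duration_s, step_s=0.01):
    """Return a binary time series that is 1 at each linguistic unit boundary
    and 0 elsewhere. boundary_times_s: boundary times in seconds;
    duration_s: stimulus length in seconds; step_s: sampling step (assumed)."""
    n = int(round(duration_s / step_s))
    x = np.zeros(n)
    idx = np.round(np.asarray(boundary_times_s) / step_s).astype(int)
    idx = idx[(idx >= 0) & (idx < n)]   # drop boundaries outside the stimulus
    x[idx] = 1.0
    return x

# One predictor per linguistic level, e.g. hypothetical word boundaries:
word_rate = rate_predictor([0.12, 0.35, 0.80], duration_s=1.0)
```

      Because such a predictor carries timing only, the same series results whether a transition is labelled as one unit's offset or the next unit's onset, provided the units are contiguous.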

      For onset vs. offset, as explained in RE4, we labelled them “offsets” because our ridge‐regression with HM-LSTM features was aligned to sentence offsets rather than onsets (see RE4 and RE15 below for the rationale for using sentence offsets). However, since each unit offset coincides with the next unit’s onset, and the rate model simply marks these transition points as 1, the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.

      I'm still confused about offsets generally. Does this maybe mean that the EEG, and each predictor, are all aligned by aligning their endpoints, which are usually/always the ends of sentences? So e.g. all the phoneme activity in the phoneme regressor actually corresponds to those phonemes of the stimuli in the EEG time, but those regressors and EEG do not have a common starting time (one trial to the next maybe?), so they have to be aligned with their ends instead?

      We chose to use sentence offsets rather than onsets based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design. If we aligned model activity with EEG data time-locked to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation.

      We understand that it may be confusing that the regressor for each level is not aligned to that level's own unit offsets in the data. The hidden-layer activations of the HM-LSTM model corresponding to the five linguistic levels (phoneme, syllable, word, phrase, sentence) are consistently 150-dimensional vectors after PCA reduction. As a result, for each input sentence pair, the model produces five distinct hidden-layer activations, each capturing the representational content associated with one linguistic level for the whole sentence. We believe our -100 to 300 ms time window relative to sentence offset reflects a meaningful period during which the brain integrates and comprehends information across multiple linguistic levels.

      Being "time-locked to the offset of each sentence at nine latencies" is not something I can really find in any of the references that you mentioned, regarding the offset aspect of this method. Can you point me more specifically to what you are trying to reference with that, or further explain? You said that "predicting EEG signals around the offset of each sentence" is "a method commonly employed in the literature", but the example you gave of Goldstein 2022 is using onsets of words, which is indeed much more in line with what I would expect (not offsets of sentences).

      You are correct that Goldstein (2022) aligned model predictions to onsets rather than offsets; however, many studies in the literature also align model predictions with unit offsets, typically because offsets mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our study, we aim to identify neural correlates for each model-derived feature. If we correlated model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      This new sentence does not make sense to me: "The regressors are aligned to sentence offsets because all our regressors are taken from the hidden layer of our HM-LSTM model, which generates vector representations corresponding to the five linguistic levels of the entire sentence".

      Thank you for the suggestion. We hope our responses in RE4, 15 and 16, along with our supplementary video, have now clarified the issue. We have deleted the sentence and provided a more detailed explanation on p.12 of the revised manuscript: The regressors are aligned to sentence offsets because our goal is to identify neural correlates for each model-derived feature of a whole sentence. If we aligned model activity with EEG data time-locked to sentence onsets, we would be finding neural responses to linguistic levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet processed the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 2 sections × 400 ms windows), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning. 
In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features, each intended to represent the full linguistic level across a complete sentence, we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch.

      More on the issue of sentence offsets: In response to reviewer 3's question about -100 - 300 ms around sentence offset, you said "Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentence." This does not make sense to me, so can you elaborate? It sounds like you are actually saying that you only analyzed 400 ms of each trial, but that cannot be what you mean.

      Yes, we analyzed only the 400 ms window surrounding each sentence offset. Although this represents just a subset of our data (143 sentences × 400 ms × 4 conditions), it precisely captures when full-sentence representations emerge against background speech. Because our model produces a single, condensed representation for each linguistic level over the entire sentence—rather than incrementally—we think it is more appropriate to align to the period surrounding sentence offsets. Additionally, extending the window (e.g. to 2 seconds) would risk overlapping adjacent sentences, since sentence lengths vary. Our focus is on the exact period when integrated, level-specific information for each sentence has formed in the brain, and our results already demonstrate different response patterns to different linguistic levels for the two listener groups within this interval. We have added this clarification on p.13 of the revised manuscript.

      In your mTRF analysis, you are now saying that the discrete predictors have "a value of 1" at each of the "boundary offsets", and those TRFs look very similar to your original plots. It sounds to me like you should not be referring to time zero in your original ridge analysis as "sentence offset". If what you mean is that sentence offset time is merely how you aligned the regressors and EEG in time, then your time zero still has a standard, typical TRF interpretation. It is just the point in time, or lag, at which the regressor(s) and EEG are aligned. So activity before zero is "predictive" and activity after zero is "reactive", to think of it crudely. So also in the text, when you say things like "50-150 ms after the sentence offsets", I think this is not really what you mean. I think you are referring to the lags of 50 - 150 ms, relative to the alignment of the regressor and the EEG.

      Thank you very much for the explanation. We agree that, in our ridge-regression time course, pre-zero lags index “predictive” processing and post-zero lags index “reactive” processing. Unlike TRF analysis, we applied ridge regression to our high-dimensional model features at nine discrete lags around the sentence offset. At each lag, we tested whether the regression score exceeded a baseline defined as the mean regression score across all lags. For example, finding a significantly higher regression score between 50 and 150 ms suggests that our regressor reliably predicted EEG activity in that time window. So here time zero refers to the precise moment of the sentence offset, not the alignment of the regressor and the EEG.
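
      The lag grid and baseline comparison described in this response can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the nine lags run from -100 ms to +300 ms in 50 ms steps (as stated for the epochs), it replaces the statistical test with a plain comparison against the mean score, and the names `LAGS_MS`, `lags_above_baseline` and the example scores are hypothetical.

```python
# Hypothetical sketch: nine lags around sentence offset and a mean-across-lags baseline.
# Assumption: lags span -100 ms to +300 ms in 50 ms steps.
LAGS_MS = list(range(-100, 301, 50))  # [-100, -50, 0, 50, 100, 150, 200, 250, 300]


def lags_above_baseline(scores):
    """Return the lags (in ms) whose score exceeds the mean score across all lags.

    This is a descriptive comparison only; it does not perform the
    significance test that the response refers to.
    """
    if len(scores) != len(LAGS_MS):
        raise ValueError("expected one regression score per lag")
    baseline = sum(scores) / len(scores)
    return [lag for lag, score in zip(LAGS_MS, scores) if score > baseline]


# Made-up regression scores for illustration only (not data from the study).
example_scores = [0.01, 0.01, 0.02, 0.05, 0.06, 0.05, 0.02, 0.01, 0.01]
print(lags_above_baseline(example_scores))
```

      Under these made-up scores, only the lags at 50, 100 and 150 ms exceed the baseline, which mirrors the kind of statement the response makes about the 50 to 150 ms window.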

      I look forward to discussing how much of my interpretation here makes sense or doesn't, both with the authors and reviewers.

      Thank you very much for this very constructive feedback. We hope that we have addressed all your questions.

    1. eLife Assessment

      This study investigates low-affinity Ca2+ binding by WT calreticulin and mutant calreticulin associated with type I myeloproliferative neoplasms, as well as the impact on Ca2+ fluxes in suspension cultures of megakaryocyte-like cells in vitro in response to ER Ca2+ ATPase inhibitors that deplete endoplasmic reticulum (ER) Ca2+ store and open plasma membrane Ca2+ channels through STIM1-Orai interactions. The results are important in that they show that Ca2+ binding by calreticulin and store-operated Ca2+ entry are not fundamentally impacted by the type I deletion mutation in calreticulin, which rules out a direct effect of the calreticulin mutation on its own low-affinity Ca2+ binding and any broad impact on ER Ca2+ regulation. The strength of the data and methods used ranges from solid to convincing, although the use of suspension-based flow cytometric assays to investigate ER Ca2+ levels and Ca2+ entry can be challenged. High-affinity Ca2+ binding sites could be further considered, and possible confounding effects of Abl kinase activity in the megakaryocyte-like cell lines could be offset.

    2. Reviewer #1 (Public review):

      The authors attempted to compare the calcium-binding properties of wild-type calreticulin with those of a calreticulin deletion mutant (CRTDel52) associated with myeloproliferative neoplasms.

      The researchers conducted their study using advanced techniques. They found almost no difference in calcium binding between the two proteins and observed no impact on calcium signaling, specifically store-operated calcium entry (SOCE). The study also noted an increase in ER luminal calcium-binding chaperone proteins. Surprisingly, the authors selected flow cytometry as a technique for measurements of ER luminal calcium. Considering the limitations of this approach, it would be better to use alternative approaches. This is particularly important as previous reports, using cells from MPN patients, indicate reduced ER luminal calcium and effects on SOCE (Blood, 2020). How do the authors explain the difference between their results and previous findings about lower ER luminal calcium and changed SOCE in MPN patient cells expressing CRTDel52? Other studies have found that unfolded protein responses are activated in MPN cells with CRTDel52 calreticulin (see Blood, 2021), and increased UPR could account for higher levels of some ER-resident calcium-binding proteins observed here. Overall, it remains unclear how this work improves our understanding of MPN or clarifies calreticulin's role in MPN pathophysiology.

    3. Reviewer #2 (Public review):

      Summary:

      Tagoe and colleagues present a thorough analysis of the calcium (Ca2+) binding capacity of calreticulin (CRT), an endoplasmic reticulum (ER) Ca2+-buffer protein, using a mutant version (CRT del52) found in myeloproliferative neoplasms (MPNs). The authors use purified human CRT protein variants, CRT-KO cell lines, and an MPN cell line to elucidate the differing Ca2+ dynamics, both on the level of the protein and on cell-wide Ca2+-governed processes. In sum, the authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      First, the authors purify CRT protein and perform isothermal titration calorimetry to quantify the Ca2+ binding capacity of CRT. They use full-length human CRT, CRT del52, and two truncations of CRT (1-339 and 1-351, the former of which should lead to the entire loss of low-affinity Ca2+ binding). While CRT del52 has previously been shown to lead to a decrease in Ca2+ binding affinity in other models, the ITC data show that this is retained in CRT del52.

      Next, the authors utilize a CRT-KO cell line with subsequent addition of CRT protein variants to validate these findings with flow cytometric analysis. Cells were transfected with a ratiometric ER Ca2+ probe, and fluorescence indicates that CRT del52 is unable to restore basal ER Ca2+ levels to the same extent as CRT wild-type. To translate these findings to MPNs, the authors perform CRT-KO in a megakaryocytic cell line, where reconstitution with either CRT variant did not cause a difference in cytosolic calcium levels. The authors further test store-operated calcium entry (SOCE), an important process for maintaining ER Ca2+ levels, in these cells, and find that CRT-KO cells have lower SOCE activity, and that this can be slightly recovered with CRT addition.

      Finally, the authors ask whether other effects of CRT-KO/reconstitution can affect the cellular Ca2+ signaling pathway and levels. RNASeq analysis revealed that CRT-KO leads to an increase in various chaperone protein expressions, and that reconstitution with CRT del52 is unable to reduce expression to the same extent as reconstitution with CRT wildtype.

      Strengths:

      The authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      Weaknesses:

      (1) The authors should consider discussing the high-affinity Ca2+ binding site more in the introduction. Can they show a proof-of-concept experiment that validates that incubation of recombinant CRT reduces the function of that high-affinity Ca2+ binding site?

      (2) For Figure 2B, do you have an explanation for why the purified proteins run higher than predicted (48-52kDa) - are these proteins still tagged with pGB1?

      (3) The MEG-01 cell line has the BCR::ABL1 translocation, while CRT mutations are strictly found in BCR::ABL1 negative MPNs. Could these experiments be repeated in these cells treated with imatinib to decrease these effects, or see if basal MEG-01 Ca2+ levels/activity are changed with or without imatinib?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The researchers conducted their study using advanced techniques. They found almost no difference in calcium binding between the two proteins and observed no impact on calcium signaling, specifically store-operated calcium entry (SOCE). The study also noted an increase in ER luminal calcium-binding chaperone proteins. Surprisingly, the authors selected flow cytometry as a technique for measurements of ER luminal calcium. Considering the limitations of this approach, it would be better to use alternative approaches.

      The flow cytometric assay shows good responsiveness to conditions expected to alter ER calcium levels (Figure 4C), is high throughput compared to microscopy, and allows for averaging of signals across a large number of cells. This was thus our original method of choice.

      This is particularly important as previous reports, using cells from MPN patients, indicate reduced ER luminal calcium and effects on SOCE (Blood, 2020). How do the authors explain the difference between their results and previous findings about lower ER luminal calcium and changed SOCE in MPN patient cells expressing CRTDel52?

      We thank the reviewer for asking for these clarifications. The referenced study (Di Buduo et al. Blood, 135(2):133-143, 2020) first showed that thrombopoietin induces spontaneous cytosolic calcium spikes in cultured megakaryocytes, which is dependent on store operated calcium entry (SOCE). In parallel, STIM1-ORAI interactions were induced by thrombopoietin. On the other hand, the addition of thrombopoietin caused the dissociation of STIM1-calreticulin interactions, based on proximity ligation assays. The implication is that signaling via the thrombopoietin receptor (TPOR/MPL) activation induces the dissociation of calreticulin-STIM1 complexes, and the formation of STIM1-ORAI complexes, which contribute to the measured spontaneous cytosolic calcium spikes. Different MPN mutations induced spontaneous calcium spikes in a thrombopoietin-independent manner, including the JAK2V617F mutations and the CALR type I and type II mutations. The study found that the number of megakaryocytes exhibiting spontaneous calcium spikes was enhanced in the context of both type I and type II CALR mutations compared to the JAK2V617F mutant. Correspondingly, the calreticulin-STIM1 interactions/cell were more significantly reduced for type I and type II CALR mutations compared to the JAK2V617F mutant. It was suggested that defective interactions between mutant calreticulin, ERp57, and STIM1 activated SOCE and generated spontaneous cytosolic calcium spikes. However, based on the findings with thrombopoietin, the spontaneous calcium spikes could simply result from thrombopoietin-independent MPL activation by the mutant calreticulin and JAK2V617F and downstream signaling. Importantly, the referenced studies did not directly measure ER luminal calcium. A number of undefined factors could account for the measured differences between the megakaryocytes from patients with calreticulin mutations vs. JAK2V617F. 
These include the relative mutant allele burdens, the extent of MPL activation, as well as genetic differences unrelated to calreticulin. In contrast to these experiments, our studies use purified proteins to show that the Del52 mutant has calcium-binding characteristics resembling those of the wild-type protein. Additionally, through genetic manipulations in cell lines, our studies directly address the effects of calreticulin KO and its Del52 mutation upon ER luminal and cytosolic calcium levels, and cellular SOCE signals. We did not measure significant differences in any of these parameters between the KO cells and those reconstituted with wild-type calreticulin or the Del52 mutant. As noted by the editors, these results show that Ca2+ binding by calreticulin and store-operated Ca2+ entry in a cell are not fundamentally impacted by the type I deletion mutation. On the other hand, in primary megakaryocytes, when co-expressed with MPL, the Del52 mutant, through its known ability to bind and activate TPOR/MPL, is expected to induce SOCE and calcium fluxes similar to those induced by thrombopoietin. These points will be clarified in the revised discussion.

      Other studies have found that unfolded protein responses are activated in MPN cells with CRTDel52 calreticulin (see Blood, 2021), and increased UPR could account for higher levels of some ER-resident calcium-binding proteins observed here.

      Multiple studies have suggested the induction of the unfolded protein response (UPR) in cells expressing MPN mutants of calreticulin. We don’t know the specific signals that cause the upregulation of various calcium-binding proteins in calreticulin-KO cells and cells expressing the Del52 mutant. Indeed, these could result from increased protein misfolding in cells with wild-type calreticulin deficiency. Alternatively, the sensing of cellular calcium perturbations could induce their expression. Regardless of the precise mechanisms underlying the expression changes in calcium-binding proteins, the upregulated factors are predicted to compensate for calreticulin deficiency and contribute to the maintenance of overall cellular calcium homeostasis. These points will be clarified in the revised discussion.

      Overall, it remains unclear how this work improves our understanding of MPN or clarifies calreticulin's role in MPN pathophysiology.

      The points discussed above as well as their implications for the understanding of calreticulin’s role in MPN pathophysiology will be clarified in the revised manuscript.

      Reviewer #2 (Public review):

      Tagoe and colleagues present a thorough analysis of the calcium (Ca2+) binding capacity of calreticulin (CRT), an endoplasmic reticulum (ER) Ca2+-buffer protein, using a mutant version (CRT del52) found in myeloproliferative neoplasms (MPNs). The authors use purified human CRT protein variants, CRT-KO cell lines, and an MPN cell line to elucidate the differing Ca2+ dynamics, both on the level of the protein and on cell-wide Ca2+-governed processes. In sum, the authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      First, the authors purify CRT protein and perform isothermal titration calorimetry to quantify the Ca2+ binding capacity of CRT. They use full-length human CRT, CRT del52, and two truncations of CRT (1-339 and 1-351, the former of which should lead to the entire loss of low-affinity Ca2+ binding). While CRT del52 has previously been shown to lead to a decrease in Ca2+ binding affinity in other models, the ITC data show that this is retained in CRT del52.

      Next, the authors utilize a CRT-KO cell line with subsequent addition of CRT protein variants to validate these findings with flow cytometric analysis. Cells were transfected with a ratiometric ER Ca2+ probe, and fluorescence indicates that CRT del52 is unable to restore basal ER Ca2+ levels to the same extent as CRT wild-type. To translate these findings to MPNs, the authors perform CRT-KO in a megakaryocytic cell line, where reconstitution with either CRT variant did not cause a difference in cytosolic calcium levels. The authors further test store-operated calcium entry (SOCE), an important process for maintaining ER Ca2+ levels, in these cells, and find that CRT-KO cells have lower SOCE activity, and that this can be slightly recovered with CRT addition.

      Finally, the authors ask whether other effects of CRT-KO/reconstitution can affect the cellular Ca2+ signaling pathway and levels. RNASeq analysis revealed that CRT-KO leads to an increase in various chaperone protein expressions, and that reconstitution with CRT del52 is unable to reduce expression to the same extent as reconstitution with CRT wildtype.

      Strengths:

      The authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      We thank the reviewer for the recognition that this study is important for our understanding of both normal and malignant cell biology.

      Weaknesses:

      (1) The authors should consider discussing the high-affinity Ca2+ binding site more in the introduction. Can they show a proof-of-concept experiment that validates that incubation of recombinant CRT reduces the function of that high-affinity Ca2+ binding site?

      In a previous study (Wijeyesakere et al. 2011 J. Biol Chem, 286 8771-8785), we showed that at a starting calcium concentration of 0 mM and with 3.3 mM injections of CaCl<sub>2</sub>, the measured K<sub>D</sub> value was 16.6 mM for calcium binding to wild type murine calreticulin (which has ~95% sequence identity with human calreticulin), corresponding to the high affinity site. On the other hand, at a starting calcium concentration of 50 mM and with 33 mM CaCl<sub>2</sub> injections, the measured K<sub>D</sub> value for calcium binding to wild type murine calreticulin was 590 mM (corresponding to the low affinity sites). Thus, we did not measure the high affinity sites when the starting calcium concentration was 50 mM. This point will be clarified in the revised manuscript.

      (2) For Figure 2B, do you have an explanation for why the purified proteins run higher than predicted (48-52kDa) - are these proteins still tagged with pGB1?

      Yes, the purified proteins shown in Figure 2B retained a GB1 tag. This point will be clarified in the revised manuscript.

      (3) The MEG-01 cell line has the BCR::ABL1 translocation, while CRT mutations are strictly found in BCR::ABL1 negative MPNs. Could these experiments be repeated in these cells treated with imatinib to decrease these effects, or see if basal MEG-01 Ca2+ levels/activity are changed with or without imatinib?

      Thank you for the important point. We will assess cytosolic calcium levels in MEG-01 cells with or without imatinib.

    1. eLife Assessment

      This important study combines a two-person joint hand-reaching paradigm with game-theoretical modeling to examine whether, and how, one's reflexive visuomotor responses are modulated by a partner's control policy and cost structure. The study provides a solid and novel set of behavioral findings suggesting that involuntary visuomotor feedback is indeed modulated in the context of interpersonal coordination. The work will be of interest to cognitive scientists studying the motoric and social aspects of action control.

    2. Reviewer #1 (Public review):

      Summary:

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants' fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner's task.

      Strengths:

      The topic of the manuscript is very interesting, and the authors are using well-established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our and our partner's tasks.

      Weaknesses:

      However, in the current version of the manuscript, I believe the results could also be interpreted differently, which suggests that the authors should provide further support for their hypothesis and conclusions.

      Major Comments:

      (1) Results of the relevant conditions:

      In addition to the authors' explanation regarding the results, it is also possible that the results represent a simple modulation of the reflexive response to a scaled version of cursor movement. That is, when the cursor is partially controlled by a partner, which also contributes to reducing movement error, it can also be interpreted by the sensorimotor system as a scaling of hand-to-cursor movement. In this case, the reflexes are modulated according to a scaling factor (how much do I need to move to bring the cursor to the target). I believe that a single-agent simulation of an OFC model with a scaling factor in the lateral direction can generate the same predictions as those presented by the authors in this study. In other words, maybe the controller has learned about the nature of the perturbation in each specific context, that in some conditions I need to control strongly, whereas in others I do not (without having any model of the partner). I suggest that the authors demonstrate how they can distinguish their interpretation of the results from other explanations.

      (2) The effect of the partner target:

      The authors presented both self and partner targets together. While the effect of each target type, presented separately, is known, it is unclear how presenting both simultaneously affects individual response. That is, does a small target with a background of the wide target affect the reflexive response in the case of a single participant moving? The results of Experiment 2, comparing the case of partner- and self-relevant targets versus partner-irrelevant and self-relevant targets, may suggest that the system acted based on the relevant target, regardless of the presence and instructions regarding the self-target.

      (3) Experiment instructions:

      It is unclear what the general instructions were for the participants and whether the instructions provided set the proposed weighted cost, which could be altered with different instructions.

      (4) Some work has shown that the gain of visuomotor feedback responses reflects the time to target and that this is updated online after a perturbation (Cesonis & Franklin, 2020, eNeuro; Cesonis and Franklin, 2021, NBDT; also related to Crevecoeur et al., 2013, J Neurophysiol). These models would predict different feedback gains depending on the distance remaining to the target for the participant and the time to correct for the jump, which is directly affected by the small or large targets. Could this time to target explain the results instead? I don't believe that this is the case, but the authors should try to rule out other interpretations. This is maybe a minor point, but perhaps more important is the location (and time remaining) for each participant at the time of the jump. It appears from the figures that this might be affected by the condition (given the change in movement lengths - see Figure 3 B & C). If this is the case, then could some of the feedback gain be related to these parameters and not the model of the partner, as suggested? Some evidence to rule this out would be a good addition to the paper - perhaps the distance of each partner at the time of the perturbation, for example. In addition, please analyze the synchrony of the two partners' movements.

    3. Reviewer #2 (Public review):

      Summary:

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner's control policy and cost. The behavioral results are well-matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control, in the context of interpersonal coordination.

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner's state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor's position was a combination of each participant's hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner's information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner's target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner's feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner's control policy and cost. When their partner's target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner's cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner's target was wider, suggesting that the feedback correction takes the partner's correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant's accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and the data analysis are sound. While the main claim is well-supported by the results, the only current weakness is the lack of discussion of limitations and alternative explanations. Adding these points will further strengthen the paper.

    1. eLife Assessment

      This important study addresses a classic debate in visual processing, using a strong method applied to a rare clinical population to evaluate hierarchical models of visual object perception. The paper finds only partial support for the hierarchical model: as expected, neural responses in ventral visual cortex show increased representational selectivity for faces along the posterior-anterior axes, but the onsets of the signals do not show a temporal hierarchy, indicating more parallel processing. The iEEG dataset is impressive, but the evidence for lack of temporal hierarchy is incomplete: essential quality checks need to be performed, and statistical analyses adapted to ensure that the data and analyses would be able to reveal temporal hierarchy if it were present in the data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aims to test the idea that visual recognition (of faces) is hierarchically organized in the human ventral occipital-temporal cortex (VOTC). The paper proposes that if VOTC has a hierarchical organization, this should be seen in two independent features of the VOTC signal. First, hierarchy assumes that signals along the hierarchy increase in representational complexity. Second, hierarchy assumes a progressive increase in the onset time of the earliest neural response at each level of the hierarchy. To test these predictions, the authors extract high-frequency broadband signals from iEEG electrodes in a very large sample of patients (N=140). They find that face selectivity in these signals is distributed across the VOTC with increasing posterior-anterior face selectivity, hence providing evidence for the first prediction. However, they also find broadband activity to occur concurrently, therefore challenging the view of a serial hierarchy.

      Strengths:

      (1) The hypothesis (that VOTC is hierarchically organized) and predictions (that hierarchy predicts increases in representational complexity and increases in onset time) were clearly described.

      (2) The number of subjects sampled (140) is extremely large for iEEG studies that typically involve <10 subjects. Also, 444 face selective recording contacts provide a very nice sampling of the areas of interest.

      Weaknesses:

      (1) A control analysis where areas have known differences in response onset should be performed to increase confidence that the proposed analyses would reveal expected results when a difference in response onset was present across areas. From Figure 3, it can be seen that many electrodes are placed in earlier visual areas (V1-V3) that have previously been shown to have earlier broadband responses to visual images compared to VOTC (e.g. Martin et al., 2019, JNeurosci https://doi.org/10.1523/JNEUROSCI.1889-18.2018). The same analyses as in Figures 4 and 5 should be used comparing VOTC to early visual areas to confirm that the analyses would detect that V1-V3 have earlier onsets compared to VOTC.

      (2) It is unclear why correlating mean timeseries helps understand how much variance is shared between regions (Figure 4). Any variance between images is lost when averaging time series across all images, and this metric thus overestimates the variance shared between areas. Moreover, the finding that correlating time domain signals across VOTC areas does not differ from correlating signals within an area could be driven by this averaging. For example, if the same analysis was done on electrodes in left and right V1 when half of the images had contrast in the left hemifield and the other half had contrast in the right hemifield, the average signals may correlate extremely well, while this correlation falls apart on a trial-by-trial basis. These analyses therefore need to be evaluated on a trial-by-trial basis.

      (3) Previous studies on visual processing in VOTC have shown that evoked potentials are more predictive of the onset of visual stimuli than broadband activity (e.g. Miller et al., 2016, PLOS CB, https://doi.org/10.1371/journal.pcbi.1004660). Testing the prediction from a hierarchical representation that signals along the VOTC increase in onset time should therefore include an evaluation of evoked potential onsets in addition to broadband signals.

      (4) Testing the second prediction, that the onset time of processing increases along the VOTC posterior to anterior path, is difficult using the iEEG broadband signal, because from a signal processing perspective, broadband signals are inherently temporally inaccurate, given that they are filtered. Any filtering in the signal introduces a certain level of temporal smoothing. The manuscript should clearly describe the level of temporal smoothing for the filter settings used.

      (5) The onsets of neural activity in VOTC are surprisingly early: around 80-100 ms. This is earlier than what has previously been reported. For example, the cited Quian Quiroga et al. (2023) found single-neuron responses to have their earliest onsets around 125 ms (their Figure 3). Similarly, the cited Jacques et al., 2016b and Kadipasaoglu et al., 2017 papers also observe broadband onsets in VOTC after 100 ms. Understanding the temporal smoothing in the broadband signal, as well as showing that typical evoked potentials have latencies comparable to other work, would increase confidence that latencies are not underestimated due to factors in the analysis pipeline.
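
      Points (4) and (5) both turn on how much temporal smoothing moves an estimated onset. The sketch below is a toy case, not the manuscript's pipeline: it assumes a noiseless step response with a true onset at 100 ms, one sample per millisecond, a centered (zero-phase) boxcar as a stand-in for whatever filtering the broadband extraction uses, and an arbitrary threshold of 10% of the plateau. Under those assumptions, wider smoothing windows pull the threshold crossing earlier than the true onset.

```python
def centered_moving_average(signal, width):
    """Zero-phase boxcar smoothing (odd width); edges use a shortened window."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def first_crossing(signal, threshold):
    """Index of the first sample at or above threshold, or None if never reached."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i
    return None


true_onset = 100                          # in samples; 1 sample = 1 ms here
raw = [0.0] * true_onset + [1.0] * 200    # idealized step response

onsets = {}
for width in (1, 21, 51, 101):            # smoothing window lengths in ms
    smoothed = centered_moving_average(raw, width)
    onsets[width] = first_crossing(smoothed, 0.1)
    shift = onsets[width] - true_onset
    print(f"window {width:>3} ms -> estimated onset {onsets[width]} ms (shift {shift} ms)")
```

      Reporting the effective window (or impulse response) of the actual filter would let readers judge how much of an 80-100 ms onset could reflect smoothing of this kind.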

      (6) Understanding the extent to which neural processing in the VOTC is hierarchical is essential for building models of vision that capture processing in the human brain, and the data provides novel insight into these processes.

      For additional context, a schematic figure of the hierarchical view and a more parallel system described in the paragraph on models of visual recognition (line 553) would help the reader interpret and understand the implications of the paper.

    3. Reviewer #2 (Public review):

      Summary:

      This very ambitious project addresses one of the core questions in visual processing related to the underlying anatomical and functional architecture. Using a large sample of rare and high-quality EEG recordings in humans, the authors assess whether face-selectivity is organised along a posterior-anterior gradient, with selectivity and timing increasing from posterior to anterior regions. The evidence suggests that it is the case for selectivity, but the data are more mixed about the temporal organisation, which the authors use to conclude that the classic temporal hierarchy described in textbooks might be questioned, at least when it comes to face processing.

      Strengths:

      A huge amount of work went into collecting this highly valuable dataset of rare intracranial EEG recordings in humans. The data alone are valuable, assuming they are shared in an easily accessible and documented format. Currently, the OSF repository linked in the article is empty, so no assessment of the data can be made. The topic is important, and a key question in the field is addressed. The EEG methodology is strong, relying on a well-established and high SNR SSVEP method. The method is particularly well-suited to clinical populations, leading to interpretable data in a few minutes of recordings. The authors have attempted to quantify the data in many different ways and provided various estimates of selectivity and timing, with matching measures of uncertainty. Non-parametric confidence intervals and comparisons are provided. Collectively, the various analyses and rich illustrations provide superficially convincing evidence in favour of the conclusions.

      Weaknesses:

      (1) The work was not pre-registered, and there is no sample size justification, whether for participants or trials/sequences. So a statistical reviewer should assess the sensitivity of the analyses to different approaches.

      (2) Frequentist NHST is used to claim lack of effects, which is inappropriate, see for instance:

      Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. https://doi.org/10.1007/s10654-016-0149-3

      Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is There a Free Lunch in Inference? Topics in Cognitive Science, 8(3), 520-547. https://doi.org/10.1111/tops.12214

      (3) In the frequentist realm, demonstrating similar effects between groups requires equivalence testing, with bounds (minimum effect sizes of interest) that should be pre-registered:

      Campbell, H., & Gustafson, P. (2024). The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered-Almost exactly-From one another: Reply to Linde et al. (2021). Psychological Methods, 29(3), 613-623. https://doi.org/10.1037/met0000507

      Riesthuis, P. (2024). Simulation-Based Power Analyses for the Smallest Effect Size of Interest: A Confidence-Interval Approach for Minimum-Effect and Equivalence Testing. Advances in Methods and Practices in Psychological Science, 7(2), 25152459241240722. https://doi.org/10.1177/25152459241240722
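
      The confidence-interval form of an equivalence test mentioned in point (3) can be sketched as follows. This is a minimal illustration under loud assumptions: the onset differences are hypothetical numbers, not data from the study; the equivalence bounds are placeholders that would need to be justified and pre-registered; and the interval uses a normal approximation, which is too narrow for small samples (a t quantile would be preferable). Equivalence at level alpha is declared when the (1 - 2*alpha) confidence interval lies entirely inside the bounds.

```python
import math
from statistics import NormalDist, mean, stdev


def equivalence_by_ci(diffs, lower, upper, alpha=0.05):
    """Two one-sided tests via CI inclusion, using a normal approximation.

    Returns the (1 - 2*alpha) confidence interval for the mean difference
    and whether that interval falls strictly inside (lower, upper).
    """
    n = len(diffs)
    centre = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha)
    ci = (centre - z * se, centre + z * se)
    return ci, (lower < ci[0] and ci[1] < upper)


# Hypothetical paired onset differences between two areas, in ms.
diffs = [3, -5, 8, -2, 1, -7, 4, 0, -3, 6]

# Placeholder bounds: smallest onset difference considered meaningful.
ci_wide, equivalent_wide = equivalence_by_ci(diffs, -10, 10)
ci_narrow, equivalent_narrow = equivalence_by_ci(diffs, -1, 1)

print(f"90% CI: [{ci_wide[0]:.2f}, {ci_wide[1]:.2f}] ms")
print(f"equivalent within +/-10 ms: {equivalent_wide}")
print(f"equivalent within +/-1 ms:  {equivalent_narrow}")
```

      The same data support equivalence under one bound and not the other, which is why the bounds have to be fixed before the data are examined.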

      (4) The lack of consideration for sample sizes, the lack of pre-registration, and the lack of a method to support the null (a cornerstone of this project to demonstrate equivalent onsets between areas), suggest that the work is exploratory. This is a strength: we need rich datasets to explore, test tools and generate new hypotheses. I strongly recommend embracing the exploration philosophy, and removing all inferential statistics: instead, provide even more detailed graphical representations (include onset distributions) and share the data immediately with all the pre-processing and analysis code.

      (5) Even if the work was pre-registered, it would be very difficult to calculate p-values conditional on all the uncertainty around the number of participants, the number of contacts and the number of trials, as they are random variables, and sampling distributions of key inferences should be integrated over these unknown sources of variability. The difficulty of calculating/interpreting p-values that are conditional on so many pre-processing stages and sources of uncertainty is traditionally swept under the rug, but nevertheless well documented:

      Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. J Exp Psychol Gen, 142, 573-603. https://pubmed.ncbi.nlm.nih.gov/22774788/

      Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779-804. https://doi.org/10.3758/BF03194105

      (6) Currently, there is no convincing evidence in the article to clearly support the main claims.

      Bootstrap confidence intervals were used to provide measures of uncertainty. However, the bootstrapping did not take the structure of the data into account, collapsing across important dependencies in that nested structure: participants > hemispheres > contacts > conditions > trials.

      Ignoring data dependencies and the uncertainty from trials could lead to a distorted CI. Sampling contacts with replacement is inappropriate because it breaks the structure of the data, mixing degrees of freedom across different levels of analysis. The key rule of the bootstrap is to follow the data acquisition process, and therefore, sampling participants with replacement should come first. In a hierarchical bootstrap, the process can be repeated at nested levels, so that for each resampled participant, then contacts are resampled (if treated as a random variable), then trials/sequences are resampled, keeping paired measurements together (hemispheres, and typically contacts in a standard EEG experiment with fixed montage). The same hierarchical resampling should be applied to all measurements and inferences to capture all sources of variability. Selectivity and timing should be quantified at each contact after resampling of trials/sequences before integrating across hemispheres and participants using appropriate and justified summary measures.
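
      The nested resampling described above can be sketched as follows. Everything here is an assumption made for illustration: the data layout (participant -> contact -> list of per-trial values), the simulated values, and the use of a plain mean as the per-contact summary in place of a real onset or selectivity estimate. Keeping paired measurements such as hemispheres together is not handled in this toy version.

```python
import random


def hierarchical_bootstrap(data, contact_stat, n_boot=1000, seed=0):
    """Resample participants, then contacts, then trials, all with replacement.

    data maps participant -> contact -> list of per-trial values.
    contact_stat summarizes one contact's resampled trials.
    """
    rng = random.Random(seed)
    participants = list(data)
    estimates = []
    for _ in range(n_boot):
        per_participant = []
        for p in rng.choices(participants, k=len(participants)):
            contacts = list(data[p])
            per_contact = []
            for c in rng.choices(contacts, k=len(contacts)):
                trials = data[p][c]
                resampled = rng.choices(trials, k=len(trials))
                per_contact.append(contact_stat(resampled))
            per_participant.append(sum(per_contact) / len(per_contact))
        estimates.append(sum(per_participant) / len(per_participant))
    return estimates


def percentile_interval(values, alpha=0.05):
    """Percentile interval taken from the sorted bootstrap estimates."""
    ordered = sorted(values)
    k = round(len(ordered) * alpha / 2)
    return ordered[k], ordered[len(ordered) - k - 1]


def simulate_dataset(seed=42):
    """Hypothetical nested data: 8 participants, 1-4 contacts each, 30 trials."""
    rng = random.Random(seed)
    data = {}
    for p in range(8):
        participant_offset = rng.gauss(0, 10)
        data[f"P{p}"] = {}
        for c in range(rng.randint(1, 4)):
            contact_offset = rng.gauss(0, 5)
            data[f"P{p}"][f"c{c}"] = [
                100 + participant_offset + contact_offset + rng.gauss(0, 20)
                for _ in range(30)
            ]
    return data


def trial_mean(trials):
    return sum(trials) / len(trials)


data = simulate_dataset()
estimates = hierarchical_bootstrap(data, trial_mean)
lo, hi = percentile_interval(estimates)
print(f"hierarchical bootstrap 95% interval: [{lo:.1f}, {hi:.1f}]")
```

      Because participants are resampled first, between-participant variability propagates into the interval instead of being diluted by treating every contact as an independent observation.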

      The authors already recognise part of the problem, as they provide within-participant analyses. This is a very good step, inasmuch as it addresses the issue of mixing up degrees of freedom across levels, but unfortunately these analyses are plagued by small sample sizes, making claims about the lack of differences even more problematic: the classic "lack of evidence == evidence of absence" fallacy. In addition, there seem to be discrepancies between the mean and CI in some cases: 15 [-20, 20]; 8 [-24, 24].

      (7) Three other issues related to onsets:

      (a) FDR correction typically doesn't allow localisation claims, similarly to cluster inferences:

      Winkler, A. M., Taylor, P. A., Nichols, T. E., & Rorden, C. (2024). False Discovery Rate and Localizing Power (No. arXiv:2401.03554). arXiv. https://doi.org/10.48550/arXiv.2401.03554

      Rousselet, G. A. (2025). Using cluster-based permutation tests to estimate MEG/EEG onsets: How bad is it? European Journal of Neuroscience, 61(1), e16618. https://doi.org/10.1111/ejn.16618

      (b) Percentile bootstrap confidence intervals are inaccurate when applied to means. Alternatively, use a bootstrap-t method, or use the pb in conjunction with a robust measure of central tendency, such as a trimmed mean.

      Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The Percentile Bootstrap: A Primer With Step-by-Step Instructions in R. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920911881. https://doi.org/10.1177/2515245920911881
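
      As a rough illustration of the second option in (b), the sketch below pairs a percentile bootstrap with a 20% trimmed mean. The onset values are hypothetical and include one deliberately extreme value; the trimming proportion and number of bootstrap samples are common defaults rather than anything prescribed by the manuscript.

```python
import random


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    g = int(proportion * len(ordered))   # number trimmed per tail
    kept = ordered[g: len(ordered) - g]
    return sum(kept) / len(kept)


def percentile_bootstrap_ci(values, stat=trimmed_mean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic."""
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(values, k=len(values)))
                  for _ in range(n_boot))
    k = round(n_boot * alpha / 2)
    return boot[k], boot[n_boot - k - 1]


# Hypothetical onsets in ms, with one extreme value.
onsets_ms = [82, 88, 90, 91, 93, 95, 97, 99, 102, 180]

plain_mean = sum(onsets_ms) / len(onsets_ms)
robust_centre = trimmed_mean(onsets_ms)
ci_low, ci_high = percentile_bootstrap_ci(onsets_ms)

print(f"mean: {plain_mean:.1f} ms, 20% trimmed mean: {robust_centre:.1f} ms")
print(f"percentile bootstrap 95% CI (trimmed mean): [{ci_low:.1f}, {ci_high:.1f}] ms")
```

      The trimmed mean is far less affected by the single extreme value than the plain mean, which is part of why the percentile bootstrap behaves better with it.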

      (c) Defining onsets based on an arbitrary "at least 30 ms" rule is not recommended:

      Piai, V., Dahlslätt, K., & Maris, E. (2015). Statistically comparing EEG/MEG waveforms through successive significant univariate tests: How bad can it be? Psychophysiology, 52(3), 440-443. https://doi.org/10.1111/psyp.12335

      (8) Figure 5 and matching analyses: There are much better tools than correlations to estimate connectivity and directionality. See for instance:

      Ince, R. A. A., Giordano, B. L., Kayser, C., Rousselet, G. A., Gross, J., & Schyns, P. G. (2017). A statistical framework for neuroimaging data analysis based on mutual information estimated via a Gaussian copula. Human Brain Mapping, 38(3), 1541-1573. https://doi.org/10.1002/hbm.23471

      (9) Pearson correlation is sensitive to features of the data other than association, and is maximally sensitive to linear associations. Interpretation is difficult without seeing matching scatterplots and getting confirmation from alternative robust methods.

    1. eLife Assessment

      This valuable study provides a practical computational framework for inferring latent neural states directly from calcium fluorescence recordings, bypassing the traditional need for a separate spike deconvolution step. The evidence supporting the method is solid, featuring rigorous validation across multiple latent variable model families (including HMM, GPFA, and LFADS) using both simulated and experimental data. However, the assessment of the method's generality would be further strengthened by application to a broader range of experimental datasets, such as recordings from different brain regions or using different calcium indicators.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors elegantly combined latent variable models (i.e., HMM, GPFA and dynamical system models) with a calcium imaging observation model (i.e., latent Poisson spiking and autoregressive calcium dynamics (AR)).

      Strengths:

      Integrating a calcium observation model into existing latent variable models significantly improves the inference of latent neural states compared to existing approaches such as spike deconvolution or Gaussian assumptions. The authors also provide open-source access to their method for direct application to calcium imaging data analysis.

      Weaknesses:

      As acknowledged by the authors, their method is dependent on the quality of calcium trace extraction from fluorescence videos. It should be noted that this limitation also applies to alternative strategies.

      While the contribution of this study should prove useful for researchers using calcium imaging, the novelty is limited, as it consists of an integration of the calcium imaging model from Ganmor et al. 2016 with existing LVM frameworks.

    3. Reviewer #2 (Public review):

      Summary:

      This compelling study proposes a framework to implement latent variable models using population-level calcium imaging data. The study incorporates autoregressive dynamics and latent Poisson spiking to improve inference of latent states across different model classes including HMMs, Gaussian Process Factor Analysis and nonlinear dynamical systems models. This approach allows methods typically used with spiking data to be applied more seamlessly to calcium imaging data. The authors test the model on piriform cortex recordings as well as a biophysical simulator to validate their methods. This approach promises to have wide usability for neuroscientists using large population-level calcium imaging.

      Strengths:

      The strengths of this study are the flexibility in the choice of models and relatively easy adaptation to user-specific use cases.

      Weaknesses:

      The weakness of the study lies in its limited validation on biological calcium imaging data. Calcium dynamics in a task-specific context in a sensory brain region might be very different from slower dynamics in a region of integration. The biophysical properties of the data would also be dependent on the SNR of the imaging platform and the generation of calcium indicator being used.

    4. Reviewer #3 (Public review):

      Summary:

      S. Keeley & collaborators propose a computational approach to infer time-varying latent variables directly from calcium traces (for instance, obtained with 2p imaging) without the need for deconvolving the traces into spike trains in a preliminary, independent step. Their approach rests on 1 of 3 families of latent models: GPFA, HMM and dynamical systems - which they augment with an observation model that maps latent variables to fluorescence traces. They validate their approach on simulated and real data, showing that the approach improves latent variable inference and model fitting, compared to more traditional approaches (although not directly compared with the 2-step one; see below). They provide a GitHub repository with code to fit their models (which I have not tested).

      Strengths:

      The approach is sound and well-motivated. The authors are specialists in latent variable models. The manuscript is succinct, well-written, and the figures are clear. I particularly liked the diversity of latent models considered, in particular latent models with continuous (GPFA) vs. discrete (HMM) dynamics, which are useful for characterizing different types of neural computations. The validation on both simulated and real data is convincing.

      Weaknesses:

      The main weakness that I see is that the approach is tested only on a single real dataset (odor response dataset). The other model fits are obtained from simulated data. While the results are convincing, it would be useful to see the approach tested on other datasets, for instance, datasets with different brain areas, different behavioral conditions, or different calcium indicators. This would help assess the generality of the approach and its robustness to different experimental conditions.

      The other points below mostly pertain to clarifications and possible extensions of the approach, and to simple model recovery experiments that would help quantify the advantage of the proposed approach over more traditional ones.

      I have a question related to interpretability and diagnosis of model fits. One advantage of the two-step approach, (1) deconvolution => (2) latent variable inference, is that one can inspect the quality of the deconvolution step independently from the latent variable inference step. In the proposed approach, it seems more difficult to diagnose potential problems with model fitting. For instance, if the inferred latent variables are not interpretable, how can one determine whether this is due to a poor choice of latent model (e.g., HMM with too few states), or a poor fit of the observation model (e.g., wrong parameters for the calcium dynamics)? Are there any diagnostic tools that could help identify potential problems with model fitting?

      Could the authors comment on whether their approach allows, for instance, different forms of latent models (e.g., HMM vs. GPFA) to be compared in terms of model evidence, cross-validated log-likelihood or other model comparison metrics? This would be useful to quantitatively determine which type of latent dynamics is more appropriate for a given dataset.

      The HMM part reveals a pretty large number of states, with one state being interpretable (evoked response). Shouldn't we expect a simpler scenario, with 2 states? I know this is a difficult question that is more general and common with HMM approaches, but it would be useful to discuss this point. For instance, would a hierarchical HMM (with a smaller number of "super-states") be more appropriate here?

      While it certainly makes sense that models accounting for the full transformation of latent => spikes => fluorescence data should outperform the two-step (1) deconvolution => (2) latent variable inference approach, the amount of improvement is not clear. A direct comparison (e.g., w/ parameter & model recovery metrics) between the two approaches on simulated data would be useful to quantify the advantage of the proposed approach over more traditional ones.

      It would be useful to discuss the possible extension of the approach to other types of data that are related to neural activity but have different observation models, e.g., voltage imaging, or neuromodulator sensors (e.g., GRAB-NE, dLight, etc). Do the authors see any specific challenges that would arise in these cases and that would need to be addressed in the future (other than changing the Poisson spiking part)?

    1. eLife Assessment

      Insects can act as vectors of plant diseases, hence the study of insect-pathogen interactions is relevant for agriculture. This important study identifies in Diaphorina citri a dopamine receptor responsive to 'Candidatus Liberibacter asiaticus' infection, demonstrates direct regulation of this receptor by a microRNA, and integrates dopamine signaling into an established insect reproductive hormone framework. Multiple complementary experimental approaches convincingly support the findings, but key conclusions rely on correlative data and the mechanistic evidence for the proposed linear signaling cascade is incomplete. This work will be of interest to researchers in insect physiology and vector-pathogen biology, and more broadly in citrus agriculture.

    2. Reviewer #1 (Public review):

      I read this paper with great interest based on my experience in insect sciences. I have some minor comments (and recommendations) that I believe the authors should address.

      (1) The paper has an original biological question that is overly broad and mechanistically ambitious. The central biological question, namely how CLas infection enhances fecundity of Diaphorina citri via dopamine signaling, is clearly stated and well motivated by previous literature. However, my advice to the authors is that, while the general question is clear, the manuscript attempts to answer multiple mechanistic layers simultaneously. As a result, I feel that the biological narrative becomes diffuse, especially in later sections where DA, miRNA regulation, AKH signaling, and JH signaling are all proposed as parts of a single linear cascade. In summary, my key concern is that the paper often moves from correlation to causal hierarchy without fully disentangling whether these pathways act sequentially, in parallel, or redundantly. A more explicitly framed primary hypothesis (e.g., "DA-DcDop2 is necessary and sufficient for CLas-induced fecundity") may improve conceptual clarity.

      (2) On the novelty of the data, I feel they are moderately novel, with substantial confirmatory components. If I am correct, the novel contributions include the identification of DcDop2 as the DA receptor responsive to CLas infection in D. citri, the discovery that miR-31a directly targets DcDop2, which is supported by luciferase assays and RIP, and thirdly, the integration of dopamine signaling into the already-described CLas-AKH-JH-fecundity framework. My advice to the authors is to focus more on the manuscript's novelty, which lies more in pathway integration than in discovering fundamentally new biological phenomena. This is appropriate for a mechanistic paper, but should be framed as an extension of existing models rather than a paradigm shift.

      (3) On the conclusions, I recommend that the authors temper their statements a little. I feel that there are some overstated or insufficiently supported claims. First, the assertion that CLas "hijacks" the DA-DcDop2-miR-31a-AKH-JH cascade implies direct pathogen manipulation, but no CLas-derived effector or mechanism is identified. Second, the model suggests a linear signaling hierarchy, but the data largely show correlation and partial dependency rather than strict epistasis. Third, the term "mutualistic interaction" may be too strong, as host fitness costs outside fecundity (e.g., longevity, immunity) are not evaluated. In conclusion, I confirm that the data support a functional association, but mechanistic causality and evolutionary interpretation are somewhat overstated.

    3. Reviewer #2 (Public review):

      Summary:

      Nian and colleagues comprehensively apply metabolomics, molecular, and genetic approaches to demonstrate that CLas hijacks the DA/DcDop2-miR-31a-AKH-JH signaling cascade to enhance lipid metabolism and fecundity in D. citri, while concurrently promoting its own replication.

      Strengths:

      These findings provide solid evidence of a mutualistic interaction between CLas proliferation and ovarian development in the insect host. This insight significantly advances our understanding of the molecular interplay between plant pathogens and vector insects, and offers novel targets and strategies for HLB field management.

      Weaknesses:

      While the article investigates the involvement of dopamine signaling and specific microRNAs in enhancing fecundity and pathogen proliferation, it still needs to provide a detailed mechanistic understanding of these interactions. The precise molecular pathways and feedback mechanisms by which CLas manipulates dopamine signaling in Diaphorina citri remain unclear.

    1. eLife Assessment

      The authors address a hard question and propose a pipeline for using Large Language Models to reconstruct signalling networks as well as to benchmark future models. The findings are valuable for a defined subfield, as the proposed framework allows for assessing such approaches systematically. The overall support is solid, although the present evaluation remains limited in scope and would benefit from a wider range of networks and performance metrics.

    2. Reviewer #1 (Public review):

      Summary:

      Large language models (LLMs) have been developed rapidly in recent years and are already contributing to progress across scientific fields. The manuscript tries to address a specific question: whether LLMs can accurately infer signaling networks from gene lists. However, the evaluation is inadequate due to four major weaknesses described below. Despite these limitations, the authors conclude that current general-purpose LLMs lack adequate accuracy, which is already widely recognized. Its key contribution should instead be to provide concrete recommendations for the development of specialized LLMs for this task, which is completely absent. Developing such specific LLMs would be highly valuable, as they could substantially reduce the time required by researchers to analyze signaling networks.

      Strengths:

      The manuscript raises a good question: whether current LLMs can accurately generate signaling networks from gene lists.

      Weaknesses:

      (1) The authors evaluate LLM performance using only three signaling networks: "hypertrophy", "fibroblast", and "mechanosignaling". Given the large number of well-established signaling pathways available, this is not a comprehensive assessment. Moreover, the analysis need not be restricted to signaling networks. Other network types, including metabolic and transcriptional regulatory networks, are already accessible in well-known databases such as KEGG, Reactome, BioCyc, WikiPathways, and Pathway Commons. Including these additional networks would substantially strengthen the evaluation.

      (2) In LLM evaluation, the authors use the gene lists that exactly match those in their "ground truth" networks, thereby fixing the set of nodes and evaluating only the predicted edges. However, in practical research, the relevant genes or nodes are not fully known. A more realistic assessment would therefore include gene lists with both genes present in the ground-truth network and additional genes absent from it, to evaluate the ability of the LLM to exclude irrelevant genes.

      (3) The authors report only the recall/sensitivity of the LLM, without assessing specificity. In practical applications, if an LLM generates a large number of incorrect interactions that greatly exceed the correct ones, researchers may be misled or may lose confidence in the LLM output. Therefore, a comprehensive evaluation must include both sensitivity and specificity. Furthermore, it would be informative to check whether some of the "false positives" might in fact represent biologically plausible interactions that are absent from the manually curated "ground truth". Manually generated "ground truth" can overlook genuine interactions, and the ability of LLMs to recover such missing edges could be particularly valuable. This may even represent one of the most important potential contributions of LLMs.
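
      To make the distinction in point (3) concrete, the sketch below computes sensitivity, specificity and precision for a predicted edge list against a reference network when the node set is fixed. The node names and edges are hypothetical placeholders, edges are treated as directed pairs, and self-loops are excluded from the set of possible edges; none of this reflects the manuscript's actual networks.

```python
from itertools import permutations


def edge_metrics(nodes, reference_edges, predicted_edges):
    """Sensitivity, specificity and precision for directed edge prediction.

    The universe of possible edges is every ordered pair of distinct nodes.
    """
    possible = set(permutations(nodes, 2))
    reference = set(reference_edges) & possible
    predicted = set(predicted_edges) & possible

    tp = len(reference & predicted)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(possible) - tp - fp - fn

    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "counts": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }


# Hypothetical four-node network (12 possible directed edges).
nodes = ["A", "B", "C", "D"]
reference_edges = [("A", "B"), ("B", "C"), ("C", "D")]
predicted_edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "A")]

metrics = edge_metrics(nodes, reference_edges, predicted_edges)
for name in ("sensitivity", "specificity", "precision"):
    print(f"{name}: {metrics[name]:.2f}")
```

      In sparse networks true negatives dominate, so specificity tends to look high even when many predicted edges are wrong; precision is often the more informative companion to sensitivity in that setting.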

      (4) It is widely known that applying differential equation models to highly complex biological networks, such as the three networks in the manuscript, is meaningless, because these systems involve a large number of parameters whose values can drastically alter the results. As John von Neumann is reported to have said: "with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Thus, the evaluation of LLMs on "logic-based differential equation models" does not make much sense.

    3. Reviewer #2 (Public review):

      Summary:

      The authors evaluate whether commonly used LLMs (ChatGPT, Claude and Gemini) can reconstruct signalling networks and predict effects of network perturbations, and propose a pipeline for benchmarking future models. Across three phenotypes (hypertrophy, fibroblast signalling, and mechanosignalling), LLMs capture upstream ligand-receptor interactions and conserved crosstalk but fail to recover downstream transcriptional programmes. Logic-based simulations show that LLM-derived networks underperform compared to manually curated models. The authors also propose that their pipeline can be used for benchmarking future models aimed at reconstructing signalling networks.

      Strength:

      The authors compare the outcomes from three LLMs with three manually curated and validated models. Additionally, they have investigated gene network reconstruction in the context of three distinct phenotypes. Using logic-based modelling, the authors assessed how LLM-derived networks predict perturbation effects, providing functional validation beyond network overlap.

      Weaknesses:

      The authors have used legacy models for all three LLMs, and the study would benefit from testing the current versions of the LLMs (ChatGPT 5.2, Claude 4.5 and Gemini 2.5). Additional metrics such as node coverage, node invention, direction accuracy and sign accuracy would be useful to make robust comparisons across models.
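
      The review names these four metrics without defining them, so the sketch below uses one plausible set of definitions, all of which are assumptions: node coverage is the share of reference nodes that appear in the predicted network; node invention is the share of predicted nodes absent from the reference; direction accuracy is computed over predicted edges whose node pair is connected in the reference; and sign accuracy is computed over predicted edges that match a reference edge in direction. Edges are hypothetical (source, target, sign) triples with sign +1 for activation and -1 for inhibition.

```python
def network_metrics(reference_edges, predicted_edges):
    """Node- and edge-level agreement between two signed, directed networks."""
    ref_sign = {(s, t): sign for s, t, sign in reference_edges}
    ref_nodes = {n for s, t, _ in reference_edges for n in (s, t)}
    pred_nodes = {n for s, t, _ in predicted_edges for n in (s, t)}

    # Predicted edges whose node pair is linked in the reference, in either direction.
    pair_matched = [(s, t, sign) for s, t, sign in predicted_edges
                    if (s, t) in ref_sign or (t, s) in ref_sign]
    # Subset of those that also point the same way as the reference edge.
    direction_matched = [(s, t, sign) for s, t, sign in pair_matched
                         if (s, t) in ref_sign]
    sign_matched = [edge for edge in direction_matched
                    if ref_sign[(edge[0], edge[1])] == edge[2]]

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "node_coverage": ratio(len(ref_nodes & pred_nodes), len(ref_nodes)),
        "node_invention": ratio(len(pred_nodes - ref_nodes), len(pred_nodes)),
        "direction_accuracy": ratio(len(direction_matched), len(pair_matched)),
        "sign_accuracy": ratio(len(sign_matched), len(direction_matched)),
    }


# Hypothetical reference and predicted networks.
reference_edges = [("A", "B", +1), ("B", "C", -1), ("C", "D", +1)]
predicted_edges = [("A", "B", +1), ("C", "B", -1), ("C", "D", -1), ("A", "X", +1)]

scores = network_metrics(reference_edges, predicted_edges)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

      Separating these quantities shows whether a model fails by omitting known components, inventing new ones, or wiring the right components together incorrectly.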

    1. eLife Assessment

      Important findings from this study include clear evidence of the impact of methylphenidate on cognitive control over Pavlovian biasing of actions and decision-making in humans, in a manner dependent on baseline working memory capacity. The design used drug dosing after learning, allowing a compelling test of the influence of catecholamines on decision processes independent of learning. The task is very well designed, using a combination of aversive and appetitive Pavlovian to Instrumental Transfer, and the in-depth behavioural analysis extends the authors' previous work, which will be of interest to those working in cognitive psychopharmacology. The results challenge the view that catecholamines operate by modulating behavioural invigoration alone.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian-to-instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main pharmacological effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociates this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      Comments on revisions:

      I have no further recommendations or concerns.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses showed no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner, depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision-making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows the authors to precisely investigate the differential modulation of value-based decision making, depending on the context and environmental stimuli. Importantly, MPH was only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-participant comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      Weaknesses:

      Previous weaknesses regarding the neurobiological circuits underlying such effects and the possible role of dopamine vs noradrenaline have been clearly discussed in the new version of the manuscript.

      Comments on revisions:

      The authors answered my previous points. The changes to the manuscript clearly improve the clarity of the results and the strength of the study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings ask for a revision of theories on how catecholamines are involved in instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) through a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) Analytic clarity: what's c^2?

      C^2 seems to be a PDF conversion error: all chi-squares (Χ2) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret this as an effect on cognitive control of Pavlovian biasing of actions and decision making rather than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows the authors to precisely investigate the differential modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized placebo controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding neural implementation of Pavlovian bias-generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. Thus, we agree with the reviewer that neuroimaging (in combination with for example brain stimulation) would be a valuable next step to identify the neural correlates to these pharmacological intervention effects, to dissociate between frontal and striatal basis of the effects. In the revision, as per our reply to reviewer 1, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, we brought forward parts of the Discussion that clarify the originality of the current experiment to the introduction (page 4/5) and result section (page 8).

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We have now clarified that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision now also discusses noradrenaline in light of our frontal control hypothesis and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine (Discussion, page 12).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well-established cognitive task that allows measurement of the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inferences about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within-subject design to study the effect of the drug on 100 participants, which also allows detection of continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account cannot explain the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and have incorporated these in our revision.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.
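
      As a rough illustration of why a weak correlation between two covariates is unlikely to distort coefficient estimates, the variance inflation factor (VIF) can be computed from the reported correlation. This is a minimal sketch, not the authors' analysis: it assumes a simplified model with only these two predictors, in which case VIF = 1 / (1 - r^2). The function name `vif_two_predictors` is hypothetical.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for a model with exactly two predictors.

    Assumption: with only two predictors, the R^2 from regressing one
    predictor on the other equals r^2, so VIF = 1 / (1 - r^2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Correlation between impulsivity and working memory span as reported above.
r_impulsivity_wm = -0.16

vif = vif_two_predictors(r_impulsivity_wm)
print(f"VIF = {vif:.3f}")
```

      A VIF close to 1 indicates that the variance of a coefficient estimate is barely inflated by the other covariate. The full model also contains interaction terms, so this two-predictor value is only a lower-bound intuition.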

      Most importantly, results are robust to excluding impulsivity scores as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: Χ² = 9.5, p=0.002). We will report these findings in the revised manuscript. We now added the text to the Supplemental Results: Control analyses, page 28.

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [¹⁸F]-FDOPA PET imaging, lends support to the proposed hypothesis incorporating a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [¹⁸F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

      See Supplemental methods 2: Working memory and impulsivity assessment, page 26.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Theoretical clarity. Some aspects of the paper are ideally clear: Figure 1 clearly explains the paradigm. The general take-home message is clearly described in the last line of the abstract, the last line of the introduction, the first line of the discussion, and throughout other places in the discussion. Yet the authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      The discussion includes many possible theoretical interpretations of the findings, which is laudable, but many readers may get lost in this multitude (particularly anyone who isn't an RL/DA aficionado). The group's prior work (i.e. striatal hypothesis) is first described, followed by a rather complex breakdown of valence-action tendencies, then the seemingly preferred explanation for the current study (i.e. cognitive control hypothesis) is advanced as "an alternative account ...". This is followed by a third, more complex idea (i.e. cortico-striatal balance hypothesis), then the paper ends. A reader may be forgiven for skimming through this discussion and not having a clear idea of how to frame these effects. I think some subheaders would help, as well as clearer labeling of the theoretical interpretations in line with a more authoritative description of the author's preferred interpretation of the empirical effects.

      Our findings ask for a revision of theories on how catecholamines are involved in instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) through a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) All statistical effects are presented as c^2 with no df. The methods only describe LMER and make no mention of what the c^2 measure represents.

      C^2 seems to be a PDF conversion error: all chi-squares (Χ2) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Recommendations for the authors):

      A few minor points:

      Figure 2A is not cited in the text I think

      Checked and changed.

      Figure 2C: "C" is not present in the figure. Also I could not see the data corresponding at MPH-Approach context in Neutral Pavlovian condition but I think it is probably masked by another curve.

      Checked and changed. Indeed, the one curve is masked by the other curve.

      As I stated in the public review, a clarification or more detailed analysis of working memory performance depending on if it was measured under MPH or placebo could be a plus.

      Changed this (see public review reply).

      I did not see any statement about the availability of data but I may have missed it.

      Yes, the statement can be found:

      Methods, page 13: Data and code for the study are freely available at https://data.ru.nl/collections/di/dccn/DSC_3017031.02_734.

      Reviewer #3 (Recommendations for the authors):

      The authors should check that inclusion of impulsivity in the logistic mixed model is justified and if it is justified make sure that multicollinearity is not problematic.

      See answer to public review for convenience reiterated below:

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.

      Most importantly, results are robust to excluding impulsivity scores as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: Χ² = 9.5, p=0.002). We will report these findings in the revised manuscript. We now added the text to the Supplemental Results: Control analyses, page 28.

      I would recommend that the authors make clear that the effects of methylphenidate are dependent on working memory capacity in the first sentence of the penultimate paragraph of the introduction on page 4.

      Changed this accordingly, see Introduction, page 5.

      I would make sure that the text in the figures is readable without needing to enlarge the figures. I would also highlight the significant effects in the figures.

      We changed the font size accordingly and added significance statements to the caption, because depicting the significance of a four-way interaction including one continuous variable is not straightforward.

      The distributions of p(Go) by conditions such as in figure 1D or 2A are very intuitive. Figure 2B is very informative as it shows the continuous effects of working memory capacity on the PIT effect. I would add (in figure 2 or in the supplement) a plot of the p(Go) with a tertile split based on working memory. Considering that the correspondent analysis is being reported, having the plot would strengthen and simplify the understanding of the results.

      The continuous effects of working memory are based on WM values on the listening span ranging from 2.5-7, in steps of 0.5, resulting in 10 different values. A tertile split would result in binning these into two bins of three values, and one bin of four values. Given that all of the datapoints for this tertile split are already presented in the current figures, we strongly prefer not to include this additional figure.
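
      The bin arithmetic in this reply can be checked with a short sketch. It enumerates the listening-span values stated above (2.5 to 7 in steps of 0.5) and splits them into three contiguous bins that are as equal as possible. The helper `split_into_bins` is hypothetical, and which bin receives the extra value is an arbitrary choice made here for illustration.

```python
def split_into_bins(values, n_bins):
    """Split sorted values into n_bins contiguous bins of near-equal size.

    Assumption for illustration: any leftover values are assigned to the
    lowest bins first; the reply does not say which bin holds four values.
    """
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Listening-span scores from 2.5 to 7.0 in steps of 0.5 (exact in floating point).
wm_values = [2.5 + 0.5 * i for i in range(10)]

tertiles = split_into_bins(wm_values, 3)
for label, members in zip(("low", "middle", "high"), tertiles):
    print(label, members)
```

      With only ten distinct span values, each tertile contains three or four values, which is the basis for the authors' point that a tertile plot would add little beyond the continuous figure.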

      I would add some sentences in the results section (and maybe in the discussion if needed) addressing the results that the effect of Valence by drug by WM span is only significant in the withdrawal context but not in the approach context.

      We now added an emphasis on the specifically significant drug effects in withdrawal in the Results section, page 8.

    1. eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The simulations are well performed and clearly presented in the context of existing experimental datasets. This compelling study will be of interest to those studying gene expression in the context of chromatin.

    2. Reviewer #1 (Public review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription.

      Comments on revisions: It's a good paper.

    3. Reviewer #2 (Public review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript "Cluster size determines morphology of transcription factories in human cells".

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well prepared, and the text is mostly well written, save a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the experiments suggested in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal such as eLife.

      Comments on revisions:

      The authors have addressed my comments with exemplary diligence, which has clarified all my major concerns. In all cases, either the relevant work was added, or it was explained in the form of a convincing argument why the suggested modifications were not implemented or not possible to implement.

      As a discretionary suggestion, the authors might consider using a title that even more directly highlights the, in my opinion, main take-away of this work. This is not because anything is incorrect about the current title, simply an even more to-the-point title might attract more readers. I would suggest something along the lines of

      "Cluster size-dependent demixing drives specialization of transcription factories"

      Overall, I congratulate the authors on their excellent work and appreciate the opportunity to engage with this manuscript during a very insightful review process.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90kb). These colored beads are considered transcription units (TUs). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) "gene expression" patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.
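
      The toy polymer layout summarised above can be sketched in a few lines. This is an illustrative construction only, not the authors' simulation code: the polymer length, the colour names, the random seed, and the choice to place the first TU at bead 0 are all assumptions. Only the 30-bead spacing and the implied bead size (90 kb / 30 beads = 3 kb) come from the summary.

```python
import random

TU_SPACING_BEADS = 30   # one TU every 30 beads, as stated in the summary
BEAD_SIZE_KB = 3        # implied by 30 beads corresponding to 90 kb


def build_toy_polymer(n_beads, colors, seed=0):
    """Return a list of bead labels: a colour for TU beads, None otherwise.

    Hypothetical helper: each TU bead receives a colour drawn uniformly
    at random, mirroring the "colored beads of a random type" description.
    """
    rng = random.Random(seed)
    polymer = [None] * n_beads
    for i in range(0, n_beads, TU_SPACING_BEADS):
        polymer[i] = rng.choice(colors)
    return polymer


# Assumed values for illustration: 3000 beads and three TU colours.
polymer = build_toy_polymer(3000, ("red", "green", "yellow"))
tu_positions = [i for i, color in enumerate(polymer) if color is not None]

print("number of TUs:", len(tu_positions))
print("TU spacing in kb:", TU_SPACING_BEADS * BEAD_SIZE_KB)
```

      In a full simulation, diffusing TF beads of each colour would bind TU beads of the same colour; that dynamics is outside the scope of this sketch.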

      Strengths:

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

      We thank the editors for their overall positive assessment. In response to the Referees’ comments, we have addressed all technical points, including a more detailed explanation of the methodology used to extract gene transcription from our simulations and its analogy with real gene transcription. Regarding the potential comparison with experimental data and our mixing–demixing transition, we have added new sections discussing the current state of the art in relevant experiments. We also clarify the present limitations that prevent direct comparisons, which we hope can be overcome with future experiments using the emerging techniques.

      Reviewer #1 (Public Review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

      We thank the Referee for carefully reading our manuscript and recognizing its scientific value. As suggested, we tried to better highlight our key findings and make the text more accessible while addressing also the comments from the other Referees.

      Reviewer #2 (Public Review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript “Cluster size determines morphology of transcription factories in human cells”.

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well prepared, and the text is mostly well written, save a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the experiments suggested in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal such as eLife.

      We thank the Reviewer for their positive assessment of the soundness of our work and its contribution to the field. We have added a paragraph to the Conclusions highlighting the current state of experimental techniques and outlining near-term experiments that could be extended to test our predictions. We also emphasise that our analysis builds on state-of-the-art polymer models of chromatin and on quantitative experimental datasets, which we used both to construct the model and to validate its outcomes (gene activity). We hope this strengthened link to experiment will catalyse further studies in the field.

      Major points:

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend instead talking only about their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      We agree with the Referee that the title could be misleading. In our study we characterized cluster size, which is a morphological descriptor, and cluster composition, which is not morphology per se, although the term is used in the community in a broader sense. Nevertheless, to strengthen the message we have changed the title to: “Cluster size determines internal structure of transcription factories in human cells”

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a “small” transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      This is a good point. However, in our case we can look at either clustering transcription factors or transcription units. In an experimental situation, transcription units could be “coloured”, or assigned different types, by looking at different cell types, so that they can be classified as housekeeping (cell-type independent) or cell-type specific. This is similar to how DHS can be clustered. In this way the mixing or demixing state can be identified by looking at the type of transcription unit, removing any ambiguity due to the fact that the same protein may participate in different TF complexes.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point (iii) are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units’ activation etc.
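      The "simple analysis" suggested in this comment (sorting associations by size k and asking how mixed they are) can be sketched as follows. This is only an illustration under stated assumptions: each multi-way contact is represented as a plain list of hypothetical TF or gene-class labels, one per genomic element, and no real SPRITE file format or published pipeline is implied.

```python
from collections import defaultdict


def mixed_fraction_by_size(clusters):
    """Return {k: fraction of size-k clusters that contain more than one label}.

    `clusters` is a hypothetical input: an iterable of label lists, one list
    per multi-way contact, one label per participating genomic element.
    """
    totals = defaultdict(int)
    mixed = defaultdict(int)
    for labels in clusters:
        k = len(labels)
        if k == 0:
            continue  # skip empty records
        totals[k] += 1
        if len(set(labels)) > 1:
            mixed[k] += 1
    return {k: mixed[k] / totals[k] for k in sorted(totals)}


if __name__ == "__main__":
    example = [["A", "A"], ["A", "B"], ["A", "A", "A"], ["A", "B", "C"], ["B", "B", "B"]]
    print(mixed_fraction_by_size(example))
```

      A rise of the mixed fraction with k would be the signature of the transition under discussion, provided that enough contacts are available at each k.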

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo’s group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these contacts, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and data set are not even acknowledged. I also strongly suggest analyzing these data or, if they are already sufficiently analyzed, discussing them in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors’ predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors’ key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to their full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      - RNA promotes the formation of spatial compartments in the nucleus https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      - Complex multi-enhancer contacts captured by genome architecture mapping https://www.nature.com/articles/nature21411

      - Cell-type specialization is encoded by specific chromatin topologies https://www.nature.com/articles/s41586-021-04081-2

      - Super-enhancer interactomes from single cells link clustering and transcription https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, the groups of Ben Sabari and Mustafa Mir in particular have presented astonishing results, which seem highly relevant to the Discussion of this manuscript.

      We appreciate the Reviewer’s insightful suggestion that a comparison between our simulation results and experimental data would strengthen the robustness of our model. In response, we have thoroughly reviewed the literature on multi-way chromatin contacts, with particular attention to SPRITE and GAM techniques. However, we found that the currently available experimental datasets lack sufficient statistical power to provide a definitive test of our simulation predictions, as detailed below.

      As noted by the Reviewer, SPRITE experiments offer valuable information on the composition of higher-order chromatin clusters (k-mers) that involve multiple genomic loci. A closer examination of the SPRITE data (e.g., Supplementary Material from Ref. [1]) reveals that the majority of reported statistics correspond to 3-mers (three-way contacts), while data on larger clusters (e.g., 8-mers, 9-mers, or greater) are sparse. This limitation hinders our ability to test the demixing-mixing transition predicted in our simulations, which occurs for cluster sizes exceeding 10.

      Moreover, the composition of the k-mers identified by SPRITE predominantly involves genomic regions encoding functional RNAs—such as ITS1 and ITS2 (involved in rRNA synthesis) and U3 (encoding small nucleolar RNA)—which largely correspond to housekeeping genes. Conversely, there is little to no data available for protein-coding genes. This restricts direct comparison to our simulations, where the demixing-mixing transition depends critically on the interplay between housekeeping and tissue-specific genes.

      Similarly, while GAM experiments are capable of detecting multi-way chromatin contacts, the currently available datasets primarily report three-way interactions [2,3].

      In summary, due to the limited statistical data on higher-order chromatin clusters [4], a quantitative comparison between our simulation results and experimental observations is not currently feasible. Nevertheless, we have now briefly discussed the experimental techniques for detecting multi-way interactions in the revised manuscript to reflect the current state of the field, mentioning most of the references that the Reviewer suggested.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and ”stressed cells”. This distinction was initially established by Ibrahim Cisse’s lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      - Mediator and RNA polymerase II clusters associate in transcription-dependent condensates https://www.science.org/doi/10.1126/science.aar4199

      - Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering https://www.science.org/doi/10.1126/sciadv.aay6515

      - RNA polymerase II clusters form in line with surface condensation on regulatory chromatin https://www.embopress.org/doi/full/10.15252/msb.202110272

      - If ”morphology” should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper: Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin https://www.science.org/doi/10.1126/science.ade5308

      We thank the Reviewer for pointing out the discussion about small and large clusters observed in stressed cells. Our study aims to provide a broader mechanistic explanation of the formation of mixed and demixed TF clusters depending on their size. However, to avoid confusion between our terminology and the classification that is already used for transcription factories in stem and stressed cells, we have now added some comments and references in the revised text.

      (5) The statement scripts are available upon request is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

      We fully agree with the Reviewer. We have now included in the main text a link to an external repository containing all the codes required to reproduce and analyze the simulations.

      Recommendations for the authors:

      Minor and technical points

      (6) Red, green, and yellow (mix of green and red) is a particularly bad choice of color code, seeing that red-green blindness is the most common color blindness. I recommend changing the color code.

      We appreciate the Reviewer’s thoughtful comment regarding color accessibility. We fully agree that red–green combinations can pose challenges for color-blind readers. In our figures, however, we chose the red–green–yellow color scheme deliberately because it provides strong contrast and intuitive representation for different TF/TU types. To ensure accessibility, we optimized brightness and saturation within red-green schemes and we carefully verified that the chosen hues are distinguishable under the most common forms of color vision deficiency, i.e. trichromatic color blindness, using color-blindness simulation tools (e.g., Coblis).

      How is the dispersing effect of transcriptional activation and ongoing transcription accounted for or expected to affect the model outcome? This affects both transcriptional clusters (they tend to disintegrate upon transcriptional activation) as well as the large scale organization, where dispersal by transcription is also known.

      We thank the Reviewer for this very insightful question. The current versions of both our toy model and the more complex HiP-HoP model do not incorporate the effects of RNA Polymerase elongation. Our primary goal was to develop a minimalistic framework that focuses on investigating TF cluster formation and composition. Nevertheless, we find that this straightforward approach provides good agreement between simulations and Hi-C and GRO-seq experiments, lending confidence to the reliability of our results concerning TF cluster composition.

      We fully agree, however, that the effects of transcription elongation are an interesting topic for further exploration. For example, modeling RNA Polymerases as active motors that continually drive the system out of equilibrium could influence the chromatin polymer conformation and the structure of TF clusters. Additionally, investigating how interactions between RNA molecules and nuclear proteins, such as SAF-A, might lead to significant changes in 3D chromatin organization and, consequently, transcription [5], is also an intriguing prospect. Although we do not believe that the main findings of our study, particularly regarding cluster composition and the mixed-demixed transition, would be impacted by transcription elongation effects, we recognize the importance of this aspect. As such, we have now included some comments in the Conclusions section of the revised manuscript.

      “and make the reasonable assumption that a TU bead is transcribed if it lies within 2.25 diameters (2.25σ) of a complex of the same colour; then, the transcriptional activity of each TU is given by the fraction of time that the TU and a TF:pol lie close together.” How is that justified? I do not see how this is reasonable or not, if you make that statement you must back it up.

      As pointed out by the Referee, we consider a TU to be active if at least one TF is within a distance 2.25σ from that TU. This threshold is slightly larger than the TU-TF interaction cutoff distance, r<sub>c</sub> = 1.8σ. The rationale for this choice is to ensure that, in the presence of a TU cluster surrounded by TFs, TUs that are not directly in contact with a TF are still considered active. Nonetheless, we find that using slightly different thresholds, such as 1.8σ or 1.1σ, leads to comparable results, as shown in Fig. S11, demonstrating the robustness of our analysis.
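      The proximity criterion described here can be illustrated with a short sketch. It is not the authors' analysis code: the array layout (frames × beads × 3 coordinates in units of σ), the function name and the omission of periodic boundaries are all assumptions made for the example, and the same-colour requirement follows the manuscript sentence quoted by the Reviewer.

```python
import numpy as np


def tu_activity(tu_pos, tf_pos, tu_colour, tf_colour, threshold=2.25):
    """Fraction of frames in which each TU has a same-colour TF within `threshold`.

    Assumed (hypothetical) layout:
      tu_pos: (frames, n_tu, 3), tf_pos: (frames, n_tf, 3), positions in units of sigma
      tu_colour: (n_tu,), tf_colour: (n_tf,), integer or string colour labels
    Periodic boundary conditions are ignored in this sketch.
    """
    tu_pos = np.asarray(tu_pos, dtype=float)
    tf_pos = np.asarray(tf_pos, dtype=float)
    # Pairwise TU-TF distances for every frame: shape (frames, n_tu, n_tf)
    diff = tu_pos[:, :, None, :] - tf_pos[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Boolean mask of colour-matched TU-TF pairs: shape (n_tu, n_tf)
    same = np.asarray(tu_colour)[:, None] == np.asarray(tf_colour)[None, :]
    # A TU is active in a frame if any colour-matched TF is close enough
    active = ((dist < threshold) & same[None, :, :]).any(axis=2)
    return active.mean(axis=0)
```

      Re-running such a function with `threshold=1.8` or `threshold=1.1` is the kind of robustness check that Fig. S11 is said to report.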

      Clearly, close proximity in 1D genomic space favours formation of similarly-coloured clusters. This is not surprising, it is what you built the model to do. Should not be presented as a new insight, but rather as a check that the model does what is expected.

      We believed that this sentence already conveyed that the formation of single-color clusters driven by 1D genomic proximity is not a surprising outcome. However, we have now slightly rephrased it to better emphasize that this is not a novel insight.

      That said, we would like to highlight that while 1D genomic proximity facilitates the formation of clusters of the same color, the unmixed-to-mixed transition in cluster composition is not easily predictable solely from the TU color pattern. Furthermore, in simulations of real chromosomes, where TU patterns are dictated by epigenetic marks, the complexity of these patterns makes it challenging—if not impossible—to predict cluster composition based solely on the input data of our model.

      “…how closely transcriptional activities of different TUs correlate…” Please briefly state over what variable the correlation is carried out, is it cross correlation of transcription activity time courses over time? Would be nice to state here directly in the main text to make it easier for the reader.

      We have now included a brief description in the revised manuscript explaining how the transcriptional correlations were evaluated and how the correlation matrix was constructed.
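      For readers of this response, one common way to build such a matrix is sketched below. The estimator is an assumption: the response does not state which correlation coefficient is used, so the example takes the Pearson correlation between binary activity time series, with TUs that never change state assigned zero correlation.

```python
import numpy as np


def activity_correlation(active):
    """Pearson correlation matrix between TU activity time series.

    `active` is an assumed (frames, n_tu) array of 0/1 activity states.
    TUs with zero variance (always on or always off) get correlation 0.
    """
    a = np.asarray(active, dtype=float)
    a = a - a.mean(axis=0)            # centre each TU's time series
    std = a.std(axis=0)
    cov = a.T @ a / a.shape[0]        # covariance matrix, shape (n_tu, n_tu)
    denom = np.outer(std, std)
    corr = np.zeros_like(cov)
    np.divide(cov, denom, out=corr, where=denom > 0)
    return corr
```

      Positive entries then mark pairs of TUs that tend to be transcribed at the same time, and negative entries mark pairs that tend to alternate.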

      “The second concerns how expression quantitative trait loci (eQTLs) work. Current models see them doing so post-transcriptionally in highly-convoluted ways [11, 55], but we have argued that any TU can act as an eQTL directly at the transcriptional level [11].” This text does not actually explain what eQTLs do. I think it should, in concise words.

      We agree with the Referee’s suggestion. We have revised the sentence accordingly and now provide a clear explanation of eQTLs upon their first mention. The revised paragraph now reads as follows:

      “The second concerns how expression quantitative trait loci (eQTLs), genomic regions that are statistically associated with variation in gene expression levels, function. While current models often attribute their effects to post-transcriptional regulation through complex mechanisms [6,7], we have previously argued that any transcriptional unit (TU) can act as an eQTL by directly influencing gene expression at the transcriptional level [7]. Here, we observe individual TUs up-regulating or down-regulating the activity of other TUs, hallmark behaviors of eQTLs that can give rise to genetic effects such as “transgressive segregation” [8]. This phenomenon refers to cases in which alleles exhibit significantly higher or lower expression of a target gene, and can be caused, for instance, by the creation of a non-parental allele with a specific combination of QTLs with opposing effects on the target gene.”

      “In the string with 4 mutations, a yellow cluster is never seen; instead, different red clusters appear and disappear (Fig. 2Eii)…” How should it be seen? You mutated away most of the yellow beads. I think the kymograph is more informative about the general model dynamics, not the effects of mutations. Might be more appropriate to place a kymograph in Figure 1.

      We agree with the Referee that the kymograph is the most appropriate graphical representation for capturing the effects of mutations. Panel 2E already refers to the standard case shown in Figure 1. We have now clarified this both in the caption and in the main text. In addition, we have rephrased the sentence—which was indeed misleading—as follows:

      “From the activity profiles in Fig. 2C, we can observe that as the number of mutations increases, the yellow cluster is replaced by a red cluster, with the remaining yellow TUs in the region being expelled (Fig. 2B(ii)). This behavior is reflected in the dynamics, as seen by comparing panels E(i) and E(ii): in the string with four mutations, transcription of the yellow TUs is inhibited in the affected region, while prominent red stripes—corresponding to active, transcribing clusters—emerge (Fig. 2E(ii)).” We hope that the comparison is now immediately clear to the reader.

      “…but this block fragments in the string with 4 mutations…” I don’t know or cannot see what is meant by ”fragmentation” in the correlation matrix.

      With the sentence “this block fragments in the string with 4 mutations” we mean that the majority of the solid red pixels within the black box become light-red or white once the mutations are applied. We have now added a clarification of this point in the revised manuscript.

      “Fig. 3D shows the difference in correlation between the case with reduced yellow TFs and the case displayed in Fig. 1E.” Can you just place two halves of the different matrices to be compared into the same panel? Similar to Fig. S5. Will be much easier to compare.

      We thank the Referee for this suggestion. We tried to implement this modification, and report the modified figure below (Author response image 1). As we can see, in the new figure it is difficult to spot the details we refer to in the main text, therefore we prefer to keep the original version of the figure.

      Author response image 1.

      Heatmap comparing activity correlations of TUs in the random string under normal conditions (top half) and with reduced yellow-TF concentration (bottom half).

      What is the omnigenic model? It is not introduced.

      We thank the Reviewer for highlighting this important point. The omnigenic model, first introduced by Boyle et al. in Ref. [6], was proposed to explain how complex traits, including disease risk, are influenced by a vast number of genes. According to this model, the genetic basis of a trait is not limited to a small set of core genes whose expression is directly related to the trait, but also includes peripheral genes. The latter, although not directly involved in controlling the trait, can influence the expression of core genes through gene regulatory networks, thereby contributing to the overall genetic influence on the trait. We have now added a few lines in the revised manuscript to explain this point.

      “Additionally, blue off-diagonal blocks indicate repeating negative correlations that reflect the period of the 6-pattern.” How does that look in a kymograph? Does this mean the 6 clusters of same color steal the TFs from the other clusters when they form?

      The intuition of the Referee is indeed correct. The finite number of TFs leads to competition among TUs of the same colour, resulting in anticorrelation: when a group of six nearby TUs of a given colour is active, other, more distant TUs of the same colour are not transcribing due to the lack of available TFs. As the Referee suggested, this phenomenon is visible in the kymograph showing TU activity. In Author response image 2, it can be observed that typically there is a single TU cluster for each of the three colours (yellow, green, and red). These clusters can be long-lived (e.g., the yellow cluster at the center of the kymograph) or may dissolve during the simulation (e.g., the red cluster at the top of the kymograph, which dissolves at t ∼ 600 × 10<sup>5</sup> τ<sub>B</sub>). In the latter case, TFs of the corresponding colour are released into the system and can bind at a different location, forming a new cluster (as seen with the red cluster forming at the bottom of the kymograph for t > 600 × 10<sup>5</sup> τ<sub>B</sub>). This is further discussed at point 2.30 of this Reply, where additional graphical material is provided.

      Author response image 2.

      Kymograph showing the TU activity during a typical run in the 6-pattern case. Each row reports the transcriptional state of a TU during one simulation. Black pixels correspond to inactive TUs, red (yellow, green) pixels correspond to active red (yellow, green) TUs.

      “Conversely, negative correlations connect distant TUs, as found in the single-color model…” But at the most distal range, the negative correlation is lost again! Why leave this out? Your correlation curves show the same equilibration towards no correlation at very long ranges.

      As highlighted in Figure 5Ai, long-range negative correlations (grey segments) predominantly connect distant TUs of the same colour. This is quantified in Figure 5Bi: restricting to same-colour TUs shows that at large genomic separations the correlation is almost entirely negative, with small fluctuations at distances just below 3000 kbp where sampling is sparse; we therefore avoid further interpretation of this regime.

      “These results illustrate how the sequence of TUs on a string can strikingly affect formation of mixed clusters; they also provide an explanation of why activities of human TUs within genomic regions of hundreds of kbp are positively correlated [60].” This is a very nice insight.

      We thank the Reviewer for the very supportive comment.

      “To quantify the extent to which TFs of different colours share clusters, we introduce a demixing coefficient, θ<sub>dem</sub> (defined in Fig. 1).” This is not defined in Fig. 1 or anywhere else here in the main text.

      We thank the Referee for pointing this out. For a given cluster, the demixing coefficient is defined as

      where n is the number of colors, i indexes each color present in the model, and x<sub>i,max</sub> the largest fraction of TFs of the same i-th color in a single TF cluster.

      The demixing coefficient is defined in the Methods section; therefore, we have replaced defined in Fig. 1 with see Methods for definition.
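      The displayed formula is missing from this copy of the response. As a working illustration only, the sketch below assumes the simplest form consistent with the limiting values quoted elsewhere in this exchange (θ<sub>dem</sub> = 1 for a single-colour cluster, θ<sub>dem</sub> = 0 for an equal mixture of all n colours), namely θ<sub>dem</sub> = (n·x<sub>max</sub> − 1)/(n − 1), where x<sub>max</sub> is the largest single-colour fraction in the cluster. The Methods section of the manuscript remains the authoritative definition; the function and argument names are hypothetical.

```python
from collections import Counter


def demixing_coefficient(tf_colours, n_colours):
    """Assumed form: (n * x_max - 1) / (n - 1) for one cluster.

    tf_colours: colour labels of the TFs in a single cluster (hypothetical input).
    n_colours:  number of TF colours present in the model (must be >= 2).
    """
    if n_colours < 2:
        raise ValueError("demixing is only meaningful for two or more colours")
    if len(tf_colours) == 0:
        raise ValueError("cluster must contain at least one TF")
    counts = Counter(tf_colours)
    # Largest fraction of same-coloured TFs in the cluster
    x_max = max(counts.values()) / len(tf_colours)
    return (n_colours * x_max - 1) / (n_colours - 1)
```

      With three colours, a cluster that is half red, a quarter green and a quarter yellow gives an intermediate value under this assumed form, between the fully mixed and fully demixed limits.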

      “Mixing is facilitated by the presence of weakly-binding beads, as replacing them with non-interacting ones increases demixing and reduces long-range negative correlations (Figure S3). Therefore, the sequence of strong and weak binding sites along strings determines the degree of mixing, and the types of small-world network that emerge. If eQTLs also act transcriptionally in the way we suggest [11], we predict that down-regulating eQTLs will lie further away from their targets than up-regulating ones.” Going into these side topics and minor points here is super distracting and waters down the message. Maybe first deal with the main conclusions on mixed vs demixed clusters in dependence on the strong and specific binding site patterns, before dealing with other additional points like the role of weak binding sites.

      Thank you for the suggestion. We have now changed the paragraph to highlight the main results. The new paragraph reads as follows: “These results on activity correlation and TF cluster composition suggest that, if eQTLs act transcriptionally as expected [7], down-regulating eQTLs are likely to be located further from their target genes than up-regulating ones. In addition, it is important to note that mixing is promoted by the presence of weakly binding beads; replacing these with non-interacting ones leads to increased demixing and a reduction in long-range negative correlations (Figure S3). More generally, our findings indicate that the presence of multiple TF colors offers an effective mechanism to enrich and fine-tune transcriptional regulation.”

      “…provides a powerful pathway to enrich and modulate transcriptional regulation.” Before going into the possible meaning and implications of the results, please discuss the results themselves first.

      See previous point.

      Figure 5B. Does activation typically coincide with spatial compaction of the binding sites into a small space or within the confines of a condensate? My guess would be that colocalization of the other color in a small space is what leads to the mixing effect?

      As the Reviewer correctly noted, the activity of a given TU is indeed influenced by the presence of nearby TUs of the same color, since their proximity facilitates the recruitment of additional TFs and enhances the overall transcriptional activity. In this context, the mixing effect is certainly affected by the 1D arrangement of TUs along the chromatin fiber. As emphasized in the revised manuscript, when domains of same-color TUs are present (as in the 6-pattern string), the degree of demixing is greater compared to the case where TUs of different colors alternate and large domains are absent (as in the 1-pattern string). This difference in the demixing parameter as a function of the 1D TU arrangement is clearly visible in Fig. S2B.

      “…euchromatic regions blue, and heterochromatic ones grey.” Please also explain what these color monomers mean in terms of non specific interactions with the TFs.

      Generally, in our simulation approach we assume euchromatin regions to be more open and accessible to transcription factors, whereas heterochromatin corresponds to more compacted chromatin segments [9]. To reflect this, we introduce weak, non-specific interactions between euchromatin and TFs, while heterochromatin interacts with TFs only through steric effects. To clarify this point, we have now slightly revised the caption of Fig. 6.

      “More quantitatively, Spearman’s rank correlation coefficient is 3.66 × 10<sup>−1</sup>, which compares with 3.24 × 10<sup>−1</sup> obtained previously using a single-colour model [11].” This comparison does not tell me whether the improvement in model performance justifies an additional model component. There are other, likelihood-based approaches to assess whether a model fits better to a relevant extent by adding a free model parameter. Can these be used for a more conclusive comparison? Besides, a correlation of 0.36 does not seem so good?

      We understand the Reviewer’s concern that the observed increase in the activity correlation may not appear to provide strong evidence for the improvement of the newly introduced model. However, within the context of polymer models developed to study realistic gene transcription and chromatin organization, this type of correlation analysis is a widely accepted approach for model validation. Experimental data commonly used for such validation include Hi-C maps, FISH experiments, and GRO-seq data [10,11]. The first two are typically employed to assess how accurately the model reproduces the 3D folding of chromatin; a comparison between experimental and simulated Hi-C maps is provided in the Supplementary Information (Fig. S5), showing a Pearson correlation of 0.7. GRO-seq or RNA-seq data, on the other hand, are used to evaluate the model’s ability to predict gene transcription levels. To date, the highest correlation for transcriptional activity data has been achieved by the HiP-HoP model at a resolution of 1 kbp [10], reporting a Spearman correlation of 0.6. Therefore, the correlation obtained with our 2-color model represents a good level of agreement when compared with the more complex HiP-HoP model. In this context, the observed increase in correlation—from 0.324 to 0.366—can be regarded as a modest yet meaningful improvement.
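      For reference, the rank correlation used in this kind of validation can be computed as in the sketch below. The inputs are hypothetical per-TU vectors (simulated activity and an experimental read-out such as GRO-seq signal); the implementation simply ranks both vectors, averaging ranks over ties, and takes the Pearson correlation of the ranks. It is not the authors' pipeline.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])
```

      Because only ranks enter, the coefficient is insensitive to the different scales and units of simulated and measured activities.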

      “…consequently, use of an additional color provides a statistically significant improvement (p-value < 10<sup>−6</sup>, 2-sided t-test).” I do not follow this argument. Given enough simulation repeats, any improvement, no matter how small, will lead to statistically significant improvements.

      We agree that this sentence could be misleading. We have now rephrased it in a clearer manner specifying that each of the two correlation values is statistically significant alone, while before we were wrongly referring to the significance of the improvement.

      “Additionally, simulated contact maps show a fair agreement with Hi-C data (Figure S5), with a Pearson correlation r ∼ 0.7 (p-value < 10<sup>−6</sup>, 2-sided t-test).” Nice!

      We thank the Reviewer for the positive comment.

      “Because we do not include heterochromatin-binding proteins, we should not however expect a very accurate reproduction of Hi-C maps: we stress that here instead we are interested in active chromatin, transcription and structure only as far as it is linked to transcription.” Then why do you not limit your correlation assessment to only these regions to show that these are very well captured by your model?

      We thank the Reviewer for this insightful comment. Indeed, we could have restricted our investigation to active chromatin regions, as done in our previous works [11,12]. However, our intention in this section of the manuscript was to clarify that the current model is relatively simple and therefore not expected to achieve a very high level of agreement between experimental and simulated Hi-C maps. Another important limitation of the two-color model described in the section is the absence of active loop extrusion mediated by SMC proteins, which is known to play a central role in establishing TAD boundaries. Consequently, even if our analysis were limited to active chromatin regions, the agreement with experimental Hi-C maps would still remain lower than that obtained with more comprehensive models, such as HiP-HoP, which we use in the last section of the paper. We have now added a comment in the revised manuscript explicitly noting the lack of active loop extrusion in our 2-color model.

      “We also measure the average value of the demixing coefficient, θ<sub>dem</sub> (Materials and Methods). If θ<sub>dem</sub> = 1, this means that a cluster contains only TFs of one colour and so is fully demixed; if θ<sub>dem</sub> = 0, the cluster contains a mixture of TFs of all colors in equal number, and so is maximally mixed.” Repetitive.

      We have now rephrased the sentence in a more concise way.

      “…notably, this is similar to the average number of productively transcribing pols seen experimentally in a transcription factory [6].” That seems a bit fast and loose. The number of Polymerases can differ depending on state, type of factory, gene etc. and vary between anything from to a few hundreds of Polymerase complexes depending on definition of factory, and what is counted as active. Also, one would think that polymerases only make up a small part of the overall protein pool that constitutes a condensate, so it is unclear whether this is a pertinent estimate.

      Here we refer to the average size of what is normally referred to as a PolII factory, not a generic nuclear condensate. These are the clusters which arise in our simulations. These structures emerge through microphase separation and have been well characterised; see, for instance, [13] for a recent review. For these structures, while there is a distribution, the average is well defined and corresponds to a size of about 100 nm, which is very much in line with the size of the clusters we observe, both in terms of 3D diameter and number of participating proteins. Because of the size, the number of active complexes which can contribute cannot be significantly more than ∼ 10. These estimates are, we note, very much in line with super-resolution measurements of SAF-A clusters [14], which are associated with active transcription, and hence it is reasonable to assume they colocalise with RNA and polymerase clusters.

      “Conversely, activities of similar TUs lying far from each other on the genetic map are often weakly negatively correlated, as the formation of one cluster sequesters some TFs to reduce the number available to bind elsewhere.” This point is interesting, and I strongly suspect that this indeed happening. But I don’t think it was shown in the analysis of the simulation results in sufficient clarity. We need direct assessment of this sequestration, currently it’s only indirectly inferred.

      Indeed, this is the mechanism underlying the emergence of negative long-range correlations among TU activity values. As the Reviewer correctly pointed out, the competition for a finite number of TFs was only indirectly inferred in the original manuscript. To address this, we have now included a new figure explicitly illustrating this effect. In Fig. S12, we show the kymograph of active TUs (left panel), as in Fig. 2E(i) of the main text, alongside a new kymograph depicting the number of green TFs within a sphere of radius 10σ centered on each green TU (right panel). For simplicity, we focus here only on green TUs and TFs. It can be observed that, during the initial part of the simulation, green TFs are localized near genomic position ∼ 2000 (right panel), where green TUs are transcriptionally active (left panel). Toward the end of the simulation, TUs near genomic position ∼ 500 become active, coinciding with the relocation of TFs to this region and the depletion of the previous one.
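
      The quantity plotted in the new kymograph, the number of TFs within a sphere of radius 10σ around each TU at each time step, could be computed along the following lines. This is a minimal sketch under stated assumptions: positions are plain (x, y, z) tuples in units of σ, periodic boundaries are ignored, and the function names `count_tfs_near_tus` and `tf_kymograph` are hypothetical rather than taken from the authors' analysis code.

```python
def count_tfs_near_tus(tu_positions, tf_positions, radius=10.0):
    """For each TU, count the TFs lying within `radius` (in units of sigma).

    Assumes open boundaries; a periodic simulation box would need a
    minimum-image distance instead.
    """
    r2 = radius * radius
    counts = []
    for tu in tu_positions:
        n_close = 0
        for tf in tf_positions:
            # Squared Euclidean distance between this TU and this TF.
            d2 = sum((a - b) ** 2 for a, b in zip(tu, tf))
            if d2 <= r2:
                n_close += 1
        counts.append(n_close)
    return counts


def tf_kymograph(frames, radius=10.0):
    """Build a time-by-TU table of local TF counts.

    `frames` is a list of (tu_positions, tf_positions) pairs, one per saved
    time step; row t of the result holds the counts for frame t.
    """
    return [count_tfs_near_tus(tus, tfs, radius) for tus, tfs in frames]


if __name__ == "__main__":
    tus = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
    early = [(1.0, 1.0, 1.0), (9.0, 0.0, 0.0), (45.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    late = [(49.0, 0.0, 0.0), (51.0, 0.0, 0.0), (55.0, 5.0, 0.0), (0.0, 0.0, 0.0)]
    for row in tf_kymograph([(tus, early), (tus, late)]):
        print(row)
```

      In this toy input the TFs move from the first TU to the second between the two frames, which is the kind of relocation and depletion the kymograph is meant to make visible.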

      In the definition for the demixing coefficient (equation 1), what does the index i stand for?

      Here i is an index denoting each of the colors present in the model. We have now specified the meaning of i after Eq. 1.

      Reviewer 3 (Public Review):

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90kb). These colored beads are considered transcription units (TUs). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) “gene expression” patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths.

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses.

      Weakness of the work: The model has many assumptions. Some of the assumptions are a bit too simplistic. Concerns about the work are detailed below:

      We thank the Referee for this overall positive evaluation.

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      We do not need to assume promoter-enhancer contact: it emerges naturally through bridging-induced phase separation, which is a key strength of our model. Even though looping is not assumed as key to transcriptional initiation, in practice the vast majority of events in which a TF is near a TU are associated with the presence of a cluster where regulatory elements are looped. Transcription in our case is therefore associated with bridging-induced phase separation, and looping is not missing: it is an emergent property of the model rather than an assumption. Accordingly, both contact maps and transcriptional activity are well predicted by our model, both in the version described here and in the more sophisticated single-colour HiP-HoP model [10] (an important ingredient of which is the bridging-induced phase separation).

      Authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      We thank the Reviewer for pointing out the absence of protein-protein interactions in our simulations. While we acknowledge this limitation, we would like to emphasize that experimental studies have not observed nuclear proteins forming condensates at physiological concentrations in the absence of DNA or chromatin. For example, studies such as Ryu et al. [15] and Shakya et al. [16] show that protein-protein interactions alone are insufficient to drive condensate formation in vivo. Instead, the presence of a substrate, such as DNA or chromatin, is essential to favor and stabilize the formation of protein clusters.

      In our simulations, we propose that protein liquid-liquid phase separation (LLPS) is driven by the presence of both strong and weak attractions between multivalent protein complexes and the chromatin filament. As stated in our manuscript, the mechanism leading to protein cluster formation is the bridging-induced attraction. This mechanism involves a positive feedback loop, where protein binding to chromatin induces a local increase in chromatin density, which then attracts more proteins, further promoting cluster formation.

      While we acknowledge that protein-protein interactions could be incorporated into our simulations, we believe this would need to be a weak interaction to remain consistent with experimental data. Additionally, incorporating such interactions would not alter the conclusions of our study.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn’t they? What is specific about transcription factor or transcription here in this model? What is the logic of considering 3kb chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, DNA paint experimental measurement of 5kb chromatin is greater than 100 nm (see work by Boettiger et al.).

      We thank the Reviewer for this important observation, which we now address. To begin, we consider the toy model introduced in the first part of the manuscript, where TUs are randomly positioned rather than derived from epigenetic data. As the Reviewer points out, in this simplified context, our results reflect a generic phenomenon: the composition of clusters depends primarily on their size, independent of the specific types of proteins involved. However, the main goal of our work is to gain insights into apparently contradictory experimental findings, which show that some transcription factories consist of a single type of transcription factor, while others contain multiple types. This led us to focus on TF clusters and their role in transcriptional regulation and co-regulation of distant genes. Therefore, in the second part of the manuscript, we use DNase I hypersensitive site (DHS) data to position TUs based on predicted TF binding sites, providing a more biological framework. In both the toy model and the more realistic HiP-HoP model, we observe a size-dependent transition in cluster composition. However, we refrain from generalizing these results to clusters composed of other protein complexes, such as HP1 and PRC, as their binding is governed by distinct epigenetic marks (e.g. H3K9me3 and H3K27me3), which exhibit different genomic distributions compared to DHS marks.

      Finally, the mapping of 3kb to 30nm is an estimate which does not significantly impact our conclusions. The relationship between genomic distance (in kbp) and spatial distance (in nm) is highly dependent on the degree of chromatin compaction, which can vary across cell types and genomic context. As such, providing an exact conversion is challenging [17]. For example, in a previous work based on the HiP-HoP model [12] we compared simulated and experimental FISH measurements and found that 1kbp typically corresponds to 15 − 20nm, implying that 3kbp could span 60nm. Nevertheless, we emphasize that varying this conversion factor does not affect the core results or conclusions of our study. We have now included a clarification in the revised SI to highlight this point.

      Recommendations for the authors:

      Other points.

      Figure 1(D) caption says 2.25σ = 1.6 nanometer. Is this a typo? Sigma is 30nm.

      Yes, it was. As 1σ ∼ 30nm, we have 2.25σ = 2.25 · 30 nm = 67.5 nm ∼ 6.8 × 10<sup>−8</sup>m. We have now corrected the caption.

      Page 6, column 2nd, 3rd para, it is written that θ<sub>dem</sub> (”defined in Fig.1”). There is no θ<sub>dem</sub> defined in Fig.1, is there? I can see it defined in Methods but not in Fig. 1.

      Correct, we replaced (defined in Fig.1) with (see Methods for definition).

      Page 6, column 2, 4th para: what does “correlations overlap and correlations diverge mean”?

      With reference to the plots in Fig. 5B, “overlap” and “diverge” simply refer to whether the same-colour (red curves) and different-colour (blue curves) correlation trends lie on top of each other or not. We have now clarified this point.

      What is the precise definition of correlation in Fig 5B (Y-axis)?

      In Fig.5B, correlation means Pearson correlation. We have now specified this point in the revised text and in the caption of Fig.5.
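
      For readers unfamiliar with the measure, the Pearson correlation between the activity traces of two TUs can be computed as sketched below. This is an illustration only: the function name `pearson_r` and the binary activity traces (1 = transcribing, 0 = silent at each sampled time) are assumptions made for the example, not the authors' data or code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = math.fsum(x) / len(x)
    mean_y = math.fsum(y) / len(y)
    # Sums of products of deviations from the means.
    sxy = math.fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = math.fsum((a - mean_x) ** 2 for a in x)
    syy = math.fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical activity traces for three TUs over five sampled times.
    tu_a = [0, 1, 1, 0, 1]
    tu_b = [0, 1, 1, 0, 1]   # active whenever tu_a is active
    tu_c = [1, 0, 0, 1, 0]   # active only when tu_a is silent
    print(pearson_r(tu_a, tu_b))
    print(pearson_r(tu_a, tu_c))
```

      Values near +1 indicate TUs that tend to be active together, and values near −1 indicate TUs whose activities exclude one another, as expected when distant TUs compete for a finite pool of TFs.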

      References

      (1) S. A. Quinodoz, J. W. Jachowicz, P. Bhat, N. Ollikainen, A. K. Banerjee, I. N. Goronzy, M. R. Blanco, P. Chovanec, A. Chow, Y. Markaki et al., “Rna promotes the formation of spatial compartments in the nucleus,” Cell, vol. 184, no. 23, pp. 5775–5790, 2021.

      (2) R. A. Beagrie, A. Scialdone, M. Schueler, D. C. Kraemer, M. Chotalia, S. Q. Xie, M. Barbieri, I. de Santiago, L.-M. Lavitas, M. R. Branco et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, vol. 543, no. 7646, pp. 519–524, 2017.

      (3) R. A. Beagrie, C. J. Thieme, C. Annunziatella, C. Baugher, Y. Zhang, M. Schueler, A. Kukalev, R. Kempfer, A. M. Chiariello, S. Bianco et al., “Multiplex-gam: genome-wide identification of chromatin contacts yields insights overlooked by hi-c,” Nature Methods, vol. 20, no. 7, pp. 1037–1047, 2023.

      (4) L. Liu, B. Zhang, and C. Hyeon, “Extracting multi-way chromatin contacts from hi-c data,” PLOS Computational Biology, vol. 17, no. 12, p. e1009669, 2021.

      (5) R.-S. Nozawa, L. Boteva, D. C. Soares, C. Naughton, A. R. Dun, A. Buckle, B. Ramsahoye, P. C. Bruton, R. S. Saleeb, M. Arnedo et al., “Saf-a regulates interphase chromosome structure through oligomerization with chromatin-associated rnas,” Cell, vol. 169, no. 7, pp. 1214–1227, 2017.

      (6) E. A. Boyle, Y. I. Li, and J. K. Pritchard, “An expanded view of complex traits: from polygenic to omnigenic,” Cell, vol. 169, no. 7, pp. 1177–1186, 2017.

      (7) C. Brackley, N. Gilbert, D. Michieletto, A. Papantonis, M. Pereira, P. Cook, and D. Marenduzzo, “Complex small-world regulatory networks emerge from the 3d organisation of the human genome,” Nat. Commun., vol. 12, no. 1, pp. 1–14, 2021.

      (8) R. B. Brem and L. Kruglyak, “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences, vol. 102, no. 5, pp. 1572– 1577, 2005.

      (9) M. Chiang, C. A. Brackley, D. Marenduzzo, and N. Gilbert, “Predicting genome organisation and function with mechanistic modelling,” Trends in Genetics, vol. 38, no. 4, pp. 364–378, 2022.

      (10) M. Chiang, C. A. Brackley, C. Naughton, R.-S. Nozawa, C. Battaglia, D. Marenduzzo, and N. Gilbert, “Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure,” Cell Genomics, vol. 4, no. 12, 2024.

      (11) A. Buckle, C. A. Brackley, S. Boyle, D. Marenduzzo, and N. Gilbert, “Polymer simulations of heteromorphic chromatin predict the 3d folding of complex genomic loci,” Mol. Cell, vol. 72, no. 4, pp. 786–797, 2018.

      (12) G. Forte, A. Buckle, S. Boyle, D. Marenduzzo, N. Gilbert, and C. A. Brackley, “Transcription modulates chromatin dynamics and locus configuration sampling,” Nature Structural & Molecular Biology, vol. 30, no. 9, pp. 1275–1285, 2023.

      (13) P. R. Cook and D. Marenduzzo, “Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations,” Nucleic acids research, vol. 46, no. 19, pp. 9895–9906, 2018.

      (14) M. Marenda, D. Michieletto, R. Czapiewski, J. Stocks, S. M. Winterbourne, J. Miles, O. C. Flemming, E. Lazarova, M. Chiang, S. Aitken et al., “Nuclear rna forms an interconnected network of transcription-dependent and tunable microgels,” BioRxiv, pp. 2024–06, 2024.

      (15) J.-K. Ryu, C. Bouchoux, H. W. Liu, E. Kim, M. Minamino, R. de Groot, A. J. Katan, A. Bonato, D. Marenduzzo, D. Michieletto et al., “Bridging-induced phase separation induced by cohesin smc protein complexes,” Science advances, vol. 7, no. 7, p. eabe5905, 2021.

      (16) A. Shakya, S. Park, N. Rana, and J. T. King, “Liquid-liquid phase separation of histone proteins in cells: role in chromatin organization,” Biophysical journal, vol. 118, no. 3, pp. 753–764, 2020.

      (17) A.-M. Florescu, P. Therizols, and A. Rosa, “Large scale chromosome folding is stable against local changes in chromatin structure,” PLoS computational biology, vol. 12, no. 6, p. e1004987, 2016.

    1. eLife Assessment

      This important study identifies a metal transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter mediates iron and zinc uptake and regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.

    2. Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knock-down mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage.

      The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated by exogenous addition of iron or zinc. Finally, the authors used heterologous expression of ZFT in Xenopus oocytes and yeast mutants, highlighting the dual substrate specificity of the transporter. The ability of ZFT to transport both iron and zinc is thus supported by two experimental approaches in heterologous systems: first, expression of Toxoplasma ZFT compensates for a lack of zinc transport in yeast, demonstrating that ZFT can transport zinc; second, the ability of ZFT to transport iron was shown in the Xenopus oocyte model. Furthermore, phenotypic analyses suggest defects in iron availability upon ZFT depletion, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function.

      Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. The converging evidence, including changes in metal concentrations upon ZFT depletion, data on metal transport obtained in heterologous systems, and phenotypic changes linked to iron deficiency, presents a convincing case. Given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful.

      Comments on revisions:

      The revised manuscript has successfully addressed all of the key points raised in the initial review. Notably, the metal transport experiments in Xenopus oocytes now provide compelling evidence supporting the role of ZFT function. I congratulate the authors on their efforts and have no further concerns to raise.

    3. Reviewer #2 (Public review):

      Summary:

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite.

      Strengths:

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. The heterologous expression of ZFT in a Xenopus oocyte system, where ZFT was shown to transport iron and zinc, is an important addition to the study. The authors also build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion.

      Weaknesses:

      The inclusion of the data showing iron and zinc transport when ZFT is expressed in a Xenopus oocyte system alleviated one of the main weaknesses of the original paper: the lack of direct biochemical evidence that ZFT acted as an iron transporter.

    4. Reviewer #3 (Public review):

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. In the manuscript's revision, the authors performed additional transport assays in Xenopus oocytes, providing further evidence for the transporter trafficking iron. Overall, the data by Aghabi et al. convincingly support that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes.

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration) as well as performing a yeast mutant complementation and transport assays in Xenopus oocytes expressing the T. gondii protein. This work is very thorough and clearly presented, leaving little doubt about this protein's function.

      Weaknesses:

      None. The authors have addressed all my previous queries/ concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc. 

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful. 

      We thank the reviewer for their assessment and would like to highlight that we now add direct biochemical characterisation in the new Figure 8, supporting our hypothesis and confirming iron transport by this protein.

      Reviewer #2 (Public review): 

      Summary: 

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite. 

      Strengths: 

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion. 

      Weaknesses: 

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      We thank the reviewer for this comment. We agree that TPEN is a fairly unspecific cation chelator, so to determine whether its effects are due to removal of zinc or of other cations, we co-treated with TPEN and either zinc or iron. Co-incubation of TPEN and zinc prevented ZFT depletion, while TPEN+FAC had no effect compared to TPEN alone (new Figure 6h and i), strongly suggesting the effects on ZFT abundance are linked to zinc and not just iron.

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding. 

      We show here that ZFT expression is highly dynamic, depending on both the iron status of the host cell and the number of parasites per vacuole. However, validating this finding by western blot would be complex due to the highly unsynchronised nature of parasite replication and the large number of parasites (5x10<sup>6</sup> - 1x10<sup>7</sup> cells) required to visualise ZFT. Further, we show that ZFT is apparently internalised prior to degradation. For this reason, we have not attempted to validate this finding by western blotting at this time.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

      The reviewer brings up a good point: the concentration of iron chelator that we used here does not enable parasite replication, making an assessment of changes in localisation challenging. To address this, we now include new data using a much lower concentration of chelator (20 mM), which is still expected to impact the parasites (Hanna et al, 2025), but allows for replication. In this low iron environment, ZFT localisation remained significantly more peripheral (Fig. S1d,e), supporting our hypothesis that ZFT localisation is iron dependent, independent of vacuolar stage.

      Reviewer #3 (Public review): 

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes. 

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function. 

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIP domain. Particularly, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying if and how these transporters coordinate metal ion transport would have been worthwhile. 

      We agree that the work presented here validates the previous datasets, and if that was all we had done, we agree that the biological insights would be limited. However, we have gone significantly beyond the predictions, demonstrating dynamic localisation changes, iron-mediated regulation, the lack of substrate-based complementation and validating transport activity of both zinc and iron. Although in silico predictions and screens can be informative, it remains important to validate biological functions experimentally. While we agree that characterisation of TGME49_225530 (as well as the other two annotated ZIP proteins) would be interesting, and will certainly form part of our future plans, it is significantly beyond the scope of the presented manuscript.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

      To address this point, we agree that, given the importance of iron and zinc to the parasite, we cannot differentiate parasite death due to apicoplast defects from death due to other causes, and we have modified the discussion to reflect this, as below.

      “However, given the delayed phenotype typically seen upon apicoplast disruption, we cannot determine if this is a direct effect of ZFT, or a downstream consequence of metal depletion”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments: 

      (1) The background on the typical sequence features that would identify Toxoplasma ZIP homologues should be expanded and clarified. While these proteins are likely quite divergent and may lack many conserved features, the manuscript currently does not provide enough detail to assess how similar (or different) TgZIPs are from well-characterized family members. Additionally, the justification for focusing on TGGT1_261720 (ZFT) over TGGT1_225530, as stated in the first paragraph of the results section, seems unclear. There is no predictive data supporting a potential plasma membrane localization for TGGT1_225530 (yet this cannot be excluded), and TGGT1_225530 appears to have more canonical metal-binding motifs. I believe that the fact that only TGGT1_261720 is iron-regulated should be sufficient justification for its selection, and this point could be emphasized more clearly. Furthermore, the discussion mentions a leucine residue that may be associated with broad substrate specificity, but this is not addressed in the initial comparative sequence analysis. These residues and the HK motif are not actually addressed in the Gyimesi et al. reference currently mentioned; thus this could be clarified and updated with references (such as PMID: 31914589) that provide more recent insights into key residues involved in metal selectivity in ZIP transporters.

      We thank you for this comment. To address these points:

      We agree that the iron-mediated regulation is sufficient for our focus on ZFT and have clarified the text to reflect this, as described above.

      We have also updated the references as suggested; our apologies for this oversight.

      We have further expanded the discussion, especially with reference to our new results using heterologous expression in oocytes (please see above).

      (2) Figure 1D, Figure 2A, C, H, Figure 3D, Figure 6F, H, corresponding text and paragraph 2 of the Discussion: It seems that most of the "non-specific bands" annotated in Figure 1D, which are lower molecular weight products, are not present in the parental cell line, suggesting they may not be non-specific after all. These bands also vary depending on the cell line (e.g., promoter used, see Figures 2H and 3D) or experimental conditions (e.g., iron excess or depletion). Given the dynamic localization of ZFT during intracellular development, it may be worth exploring whether these lower molecular weight bands represent degraded forms of TgZFT, possibly corresponding to the basally-clustered signal observed by immunofluorescence, with only the full-length protein associating with the plasma membrane. This possibility should be investigated or at least discussed further.

      While the lower bands are not present in the parental line, we do see them in other HA-tagged lines, especially when the expression of the tagged protein is low, as seen below (Author response image 1). We don’t currently have an explanation for these, but we can confirm that they do not change in abundance in parallel with the full-length protein, supporting our hypothesis that these bands are an artefact of the anti-HA antibody in our system. Although ZFT is clearly degraded (e.g. Fig. 1g), we currently do not believe these bands are ZFT C-terminal degradation products.

      Author response image 1.

      Western blot of ZFT-3HA<sub>zft</sub> and another HA-tagged unrelated cytosolic protein, demonstrating that the lower bands are most likely nonspecific.

      (3) It is unfortunate that ZFT could not complement a yeast iron transporter mutant cell line, as this would have provided a strong argument for ZFT's role in iron transport. The manuscript does not provide much detail about the Δfet2/3 yeast mutant line. Fet3 is the ferroxidase subunit, while Ftr1 is the permease subunit of the high-affinity iron transport complex in yeast. Fet2, however, appears to be Saccharomyces cerevisiae's VPS41 homolog. Therefore, is Δfet2/3 the most appropriate mutant to use, or would another mutant line (e.g., ΔFtr1) be a better choice? Additionally, while Figure 7 suggests a decrease in metal uptake upon ZFT depletion, it would be useful to test whether overexpression of ZFT leads to enhanced metal incorporation, perhaps using a FerroOrange assay. 

      We thank the reviewer for their comments, which we have answered below:

      The Δfet2/3 yeast mutant was a typo and has been corrected; our apologies. We did use the Δfet3/4 mutant line, based on previous successful experiments involving plant metal transporters (e.g. DiDonato et al., 2004).

      Unfortunately, we were unable to perform the FerroOrange assay in the overexpression line as this line is endogenously fluorescent in the same channel as FerroOrange.

      However, as detailed above, we have now added significant new data confirming our hypothesis that ZFT is an iron/zinc transporter, through heterologous expression in Xenopus oocytes (new Figure 8). This provides direct evidence of iron transport, and evidence that zinc can inhibit this transport, consistent with our hypothesis.

      (4) The annotation of the blot in Figure 2H suggests that overexpressed ZFT-TY can only be detected in the absence of heat denaturation. However, this is not addressed in the text. Does heat denaturation also affect the detection of ZFT-3HA or the lower molecular weight products? This should be clarified in the manuscript. 

      Interestingly, ZFT is detectable after boiling at 95 °C for 5 minutes when expressed at endogenous (or near-endogenous) levels in the ZFT-3HA<sub>sag1</sub> and ZFT-3HA<sub>zft</sub> tagged parasite lines. However, overexpression of ZFT leads to a loss of detection by western blot when boiled, although the protein is detectable without heat denaturation.

      A possible explanation for this is that overexpression may cause ZFT to misfold, making the protein more prone to aggregation following boiling, rendering it insoluble and unable to enter the gel. Moreover, heat aggregation can sometimes mask the epitope tags required for antibody recognition, possibly explaining why ZFT is undetectable when overexpressed and exposed to boiling conditions, as has previously been observed for other transmembrane proteins (e.g. Tsuji, 2020).

      We have clarified this in the results section. Although we do not have a full explanation for this, we consider it important to share for others who may be looking at expression of these proteins.

      (5) Figure 3G: It might be helpful to include an uncropped gel profile to allow readers to visualize that the main product does indeed correspond to a potential dimeric form in the native PAGE. 

      This has now been added in Figure S3e, thank you for this suggestion.

      (6) The investigation of the impact of ZFT depletion on the apicoplast could be improved. The authors suggest that ZFT knockdown inhibits apicoplast replication based on a modest increase in elongated organelles, but the term "delayed death" is not appropriate in that case, as it is typically linked to a loss of the organelle. This is not observed here and is also illustrated by the unchanged CPN60 processing profile. So, clearly, there seems to be no strong morphological effect on the apicoplast early on after ZFT depletion. On the other hand, the authors dismiss any impact on TgPDH-E2 lipoylation (which is iron-dependent) based on the fact that the lipoylated form of the protein is still detected by Western blot. However, closer inspection of the blot in Figure 4B suggests that the intensity of the annotated TgPDH-E2 signal is reduced compared to the -ATc condition (although there might be differences in protein loading, as indicated by the control) or even with the mitochondrial 2-oxoglutarate dehydrogenase-E2, whose lipoylation is presumably iron-independent (see PMID: 16778769). This experiment should be repeated, and the results quantified properly in case something was missed, and the duration of depletion conditions perhaps extended further. Of note, it would also be worthwhile to revisit size estimations, as the displayed profiles seem inconsistent with the typical sizes of lipoylated proteins detected with the anti-lipoyl antibody (e.g., ~100 kDa for PDH-E2, ~60 kDa for branched-chain 2-oxo acid dehydrogenase, and ~40 kDa 2-oxoglutarate dehydrogenase).

      We thank the reviewer for this comment. We agree that there is no strong defect on the apicoplast in the first lytic cycle and we have modified the language to remove reference to delayed death, as given the magnitude of changes associated with loss of iron and zinc, we cannot be certain about the role of the apicoplast.

      Based on this suggestion, we have now quantified the levels of lipoylation of PDH-E2, BDCK-E2 and OGDH-E2 and now include this in Figure S4b, c, d. Supporting our other results, we do not see a significant change in PDH-E2 lipoylation upon ZFT knockdown. However, although OGDH-E2 lipoylation is unchanged (Figure S4c), interestingly we do see a significant increase in BDCK-E2 lipoylation (Figure S4d). This process is not expected to be directly iron-related, as mitochondrial lipoylation occurs through scavenging rather than synthesis; however, it speaks to the larger mitochondrial disruption that we see. We now consider this further in the discussion.

      For the sizes, we thank the reviewer for bringing this up. Our apologies, this was due to an error in the annotation, and we have now corrected this in the figure.

      (7) In the third paragraph of the discussion, the authors mention the inability to complement ZFT loss by adding exogenous metals. One argument is the potential lack of metal access to the parasitophorous vacuole (PV). Although largely unexplored, this point could be expanded further in the discussion, as the issue of metal transport to the parasite involves not only the parasite plasma membrane but also the PV membrane. Additionally, the authors mention the absence of functional redundancy in transporters, but it would be helpful to discuss potential stage-specific or differential expression of other ZIP candidates. Transcriptomic data available on Toxodb.org could provide useful insights into this, and experimental approaches, such as RT-PCR, could be used to assess the expression of these candidates in the absence of ZFT. 

      On the issue of metals crossing the PV membrane, we agree that while we do not currently know the mechanisms of metal transport within the infected host cell, we do have experimental confirmation that the concentration and form of the metals that we are using can impact the parasites. We show that metal treatment inhibits parasite growth (e.g. Figure 3k-n, Figure 6a-d) and we can detect the increased metals through our experiments using FerroOrange and FluoZin (Figure 7a, c). In these experiments, parasites were treated intracellularly and so we can confirm that, regardless of the mechanism, iron and zinc can reach the parasite. While entry of metals across the PV is an intriguing question, it is beyond the scope of the present work, which focuses on the role of the selected transporter.

      We agree that a more detailed discussion of the other ZIP transporters is warranted. We have extended this section of the discussion although for now, we cannot determine the role of the other ZIP transporters in Toxoplasma.

      (8) In the discussion, the authors mention that « Inhibition of respiration has previously been linked to bradyzoite conversion ». To strengthen their point, the authors could mention that mitochondrial Fe-S mutants, as well as mutants affecting mitochondrial translation or the mitochondrial electron transport chain, also initiate bradyzoite conversion (PMID: 34793583). This would reinforce the connection between mitochondrial dysfunction and stage conversion. 

      This is an excellent point and we have added this to the discussion as follows:

      “Inhibition of mitochondrial Fe-S biogenesis or mitochondrial respiration have both previously been linked to bradyzoite conversion (Pamukcu et al., 2021; Tomavo and Boothroyd, 1995), however we do not yet know the signalling factors linking iron, zinc or mitochondrial function to bradyzoite differentiation”.

      (9) As a general comment on manuscript formatting, providing page and line numbers would significantly improve the manuscript's readability and allow reviewers to more easily reference specific sections. This would help address the minor issues of typos (e.g., multiple occurrences of "promotor"). I suggest a careful read-through to correct these issues. 

      We thank the reviewer for this comment and in the resubmitted version we have corrected these issues. 

      Reviewer #2 (Recommendations for the authors): 

      (1) In the alignment (Figure 1a), the BPZIP sequence is from which organism (genus, species)? It would be helpful to include this information in the figure legend.

      Apologies for this oversight, this figure and section have been reworked and the species name (Bordetella bronchiseptica) added.

      (2) In reference to Figure 1a, the authors state, "Interestingly, all parasite ZIP-domain proteins examined have a HK motif at the M2 metal binding". I was wondering if by "all" the authors mean Toxoplasma and Plasmodium falciparum (shown in Figure 1a) or did the authors also look at other apicomplexan parasites such as Cryptosporidium or Neospora? Is this a general feature of apicomplexan parasites? 

      We looked at this, and the HK motif in the M2 binding site is conserved in Neospora, Cryptosporidium, and even the digenic gregarine Porospora cf. gigantea. However, in the more distantly related Chromera we find an HH motif at the same position. This suggests that the HK motif is present in the Apicomplexa, but not conserved in the free-living Alveolata. Although we cannot speculate on the role of this motif currently, its role in metal import in Apicomplexa does deserve future scrutiny. To reflect this finding we have modified Figure 1a and the text.

      (3) In Figure 1e, to better visualize the ZFT-3HA staining at the basal pole, it would be better to omit the DAPI staining from the merged image. It is difficult to see the ZFT staining in the image of the large vacuole.

      We have removed the DAPI from this image to improve clarity.

      (4) Based on the "delayed-death" phenotype of the apicoplast, it is not surprising that no defects were observed in CPN60 processing or protein lipoylation. Have the authors considered measuring these phenotypes after a further round of growth (as was done for visualizing apicoplast morphology)? 

      We agree that changes in apicoplast function are often only seen in the second round of replication. However, here we wanted to check if ZFT depletion led to immediate changes in function of the organelle, which was not the case. It is highly likely that after the second round, we would see significant defects in the apicoplast function, however given the immediate importance of iron and zinc to many processes within the parasite, we believe that these experiments would be complicated to interpret.

      (5) Depleting ZFT led to a reduction in expression levels for the mitochondrial Fe-S protein SDHB but not for a cytosolic Fe-S protein. Is it expected that less intracellular iron (via depleted ZFT) would differentially affect mitochondrial versus cytosolic Fe-S proteins? 

      Previous studies (e.g., Maclean et al., 2024; Renaud et al., 2025) have shown that upon direct inhibition of the cytosolic Fe-S pathway, ABCE1 is fairly stable and levels can persist for 2-3 days post treatment. However, our recent work has shown that rapid and acute depletion of iron directly (through treatment with a chelator) can lead to ABCE1 levels decreasing within 24h (Hanna et al., 2025). In the case of ZFT knockdown, due to the more gradual reduction in iron levels seen (e.g. Figure 7j), we believe the parasites are prioritising key Fe-S pathways (e.g. essential proteostasis through ABCE1), probably while remodelling metabolism (as seen in our Seahorse assays). However, many proteins are expected to be directly impacted by the iron and zinc restriction that these parasites experience, and different protein classes are expected to behave differently in these conditions.

      Reviewer #3 (Recommendations for the authors): 

      (1) Is the effect on the plaque size between T7S4-ZFT (-aTc) in regular and 'high iron' conditions significant? The authors show convincingly that the plaque size is smaller due to the swapped promoter and the resulting overexpression of ZFT. But is the effect aggravated in high iron? This would be expected if excess iron were the problem.

      The plaque sizes are significantly smaller in the T7S4-ZFT line under high iron compared to the untreated condition, and compared to the parental untreated line. However, if we normalise plaque size to untreated conditions for both lines, there is not a significant change in plaque size in high iron between the parental and T7S4-ZFT. This is possibly due to the concentration of iron used (200 mM), which may not be optimal to see this effect, or the time taken for plaque assays (6-7 days), which may allow the excess iron to be stored by the host cells, changing the effective concentration of parasite exposure.

      (2) I struggle to understand the intracellular growth assay in Figure 5b. Here, T7S4-ZFT parasites show 25 % of vacuoles with more than 8 parasites (labelled 8+). But such large vacuoles are not observed in the parental strain. It appears as if the inducible strain grows faster even though it was earlier shown to have a fitness defect (see Figure 3j). Can you please clarify?

      This is a result of rapid growth of the parental line: some vacuoles in this line lysed and initiated a new round of replication at this time point, while we saw no evidence at any timepoint that ZFT-depleted parasites were able to lyse the host cell. However, the initial (24-48h post ATc addition) replication rate of the ZFT KD remains similar to the parental. In this panel, we wanted to emphasize that the major phenotype we see upon ZFT depletion is vacuole disorganisation, which we believe is linked to the start of differentiation into bradyzoites.

      (3) Did the authors perform an IFA in addition to the Western blot to localize the 2nd Ty-tagged ZFT copy? It seems important to validate that the protein correctly localizes to the plasma membrane. 

      We have done so and now include these data in Figure S2b. Overexpressed ZFT-Ty localises to internal structures (probably vesicles) with some signal at the periphery; however, this limited expression at the periphery is sufficient to mediate the phenotypes that we see.

      (4) First sentence of the abstract and introduction: The authors speak of metabolism and cellular respiration as though they are two different processes. Is respiration not part of metabolism? 

      This is an excellent point. We wanted to distinguish mitochondrial respiration from general cellular metabolism, but this was not clear. We have now changed this in the introduction to the below:

      “Iron, and other transition metals such as zinc, manganese and copper, are essential nutrients for almost all life, playing vital roles in biological processes such as DNA replication, translation, and metabolic processes including mitochondrial respiration (Teh et al., 2024)”

      (5) 2nd paragraph of the introduction: toxoplasmosis is written capitalized but should be lower case.

      This has been corrected.

      (6) Figure 4j legend: change 'shits parasites to a more quiescent stage' to 'shifts parasites'.

      This has been corrected, our apologies.

      (7) Please correct the following sentence: 'These data demonstrate ZFT depletion leads to the expression of the bradyzoite-specific markers BAG1 and DBL.' DBL is not expressed by the parasite. It is a lectin that binds to the sugars in the cyst wall.

      We have now modified this in the text. The sentence now reads: “These data show that ZFT depletion leads to the expression of the bradyzoite marker BAG1 and the production of the cyst wall, as detected by DBL”.

      (8) In the section on yeast complementation with TgZFT, the authors write: 'Based on this success, we also attempted to complement...'. Please consider changing 'Success' to something more neutral.

      We have modified the text to now read: “Based on these results, we also attempted to complement”…

      (9) In the discussion, the authors write: 'We see a delayed phenotype on the apicoplast, suggesting that metal import is also required in this organelle, although no apicoplast metal transporters have yet been identified.' Please consider the study "Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites" (PMID: 38163252).

      We thank the reviewer for the note and have modified the text to include this and the reference. Please see below:

      “Iron is known to be required in the apicoplast (Renaud et al., 2022), zinc also may be required, as the fitness-conferring Plasmodium zinc transporter ZIP1 is transiently localised to the apicoplast (Shrivastava et al., 2024), although the functional relevance of this localisation has not yet been established”.

      (10) The authors write: 'Iron is known to be required in the apicoplast (Renaud et al., 2022), although a potential role for zinc in this organelle has not yet been established.' The role for zinc in the apicoplast may not have been shown formally, but surely among its hundreds of proteins, and those involved in replication and transcription, there are some that depend on zinc...?

      Yes, we agree it would make sense; however, multiple searches using ToxoDB and the datasets from Chen et al. (2025) were unable to find any apicoplast-localised proteins with zinc-binding domains. We cannot exclude that zinc is in the apicoplast, and the results from Plasmodium (Shrivastava et al., 2024) may suggest that it is; however, we currently do not have any evidence for its role within this organelle.

      References

      DiDonato, R.J., Roberts, L.A., Sanderson, T., Eisley, R.B., Walker, E.L., 2004. Arabidopsis Yellow Stripe-Like2 (YSL2): a metal-regulated gene encoding a plasma membrane transporter of nicotianamine-metal complexes. Plant J 39, 403–414. https://doi.org/10.1111/j.1365-313X.2004.02128.x

      Hanna, J.C., Shikha, S., Sloan, M.A., Harding, C.R., 2025. Global translational and metabolic remodelling during iron deprivation in Toxoplasma gondii. https://doi.org/10.1101/2025.08.11.669662

      Maclean, A.E., Sloan, M.A., Renaud, E.A., Argyle, B.E., Lewis, W.H., Ovciarikova, J., Demolombe, V., Waller, R.F., Besteiro, S., Sheiner, L., 2024. The Toxoplasma gondii mitochondrial transporter ABCB7L is essential for the biogenesis of cytosolic and nuclear iron-sulfur cluster proteins and cytosolic translation. mBio 15, e00872-24. https://doi.org/10.1128/mbio.00872-24

      Pamukcu, S., Cerutti, A., Bordat, Y., Hem, S., Rofidal, V., Besteiro, S., 2021. Differential contribution of two organelles of endosymbiotic origin to iron-sulfur cluster synthesis and overall fitness in Toxoplasma. PLoS Pathog 17, e1010096. https://doi.org/10.1371/journal.ppat.1010096

      Renaud, E.A., Maupin, A.J.M., Berry, L., Bals, J., Bordat, Y., Demolombe, V., Rofidal, V., Vignols, F., Besteiro, S., 2025. The HCF101 protein is an important component of the cytosolic iron–sulfur synthesis pathway in Toxoplasma gondii. PLoS Biol 23, e3003028. https://doi.org/10.1371/journal.pbio.3003028

      Shrivastava, D., Jha, A., Kabrambam, R., Vishwakarma, J., Mitra, K., Ramachandran, R., Habib, S., 2024. Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites. ACS Infect. Dis. 10, 155–169. https://doi.org/10.1021/acsinfecdis.3c00426

      Teh, M.R., Armitage, A.E., Drakesmith, H., 2024. Why cells need iron: a compendium of iron utilisation. Trends in Endocrinology & Metabolism 35, 1026–1049. https://doi.org/10.1016/j.tem.2024.04.015

      Tomavo, S., Boothroyd, J.C., 1995. Interconnection between organellar functions, development and drug resistance in the protozoan parasite, Toxoplasma gondii. International Journal for Parasitology 25, 1293–1299. https://doi.org/10.1016/0020-7519(95)00066-B

    1. eLife Assessment

      This important study provides new insights into how Staphylococcus aureus adapts to disulfide stress through the redox-sensitive regulator Spx, which coordinates nutrient uptake, cysteine import, redox homeostasis, and bacterial growth. While the authors present compelling evidence supporting the central role of Spx in managing disulfide stress, several aspects require further clarification. In particular, the precise mechanisms regulating cysteine uptake and the proposed link between disulfide stress responses and iron limitation would benefit from additional explanation and experimental or conceptual justification.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      This manuscript presents a thoughtful and well-executed analysis of how S. aureus adapts to disulfide stress using a redox-sensitive regulator, Spx, as a lynchpin to coordinate nutrient uptake, redox balance, and growth. The work is strengthened by a systematic and complementary experimental approach that combines genetic, biochemical, and physiological measurements. The authors carefully test alternative explanations and build a coherent model linking stress sensing to downstream metabolic consequences. Several results, particularly those connecting cysteine uptake to growth defects, provide convincing support for the proposed trade-off. Overall, the authors largely achieve their aims, and the evidence generally supports the central conclusions. The conceptual framework and experimental approaches should be of broad interest to researchers studying S. aureus physiology and pathogenesis and to those studying bacterial stress responses and metabolic trade-offs.

      Weaknesses:

      Clarifying several interpretive points would further strengthen confidence in the proposed model. Some conclusions rely on data presentations or experimental designs that are not immediately clear to the reader. In particular, aspects of the protein stability analysis, global regulatory comparisons, and assays linking cysteine uptake to iron limitation would benefit from clearer justification and more precise interpretation. In addition, certain conclusions could be more carefully framed to reflect partial rather than complete rescue effects.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "Activation of the Spx redox sensor counters cysteine-driven Fe(II) depletion under disulfide stress" by Hall and colleagues describes that an active redox switch is required for surviving under the diamide-induced disulfide stress. Furthermore, the SpxC10A mutant exhibits transcriptional dysregulation of genes involved in thiol maintenance and disulfide repair. The authors further demonstrate a role for Spx in regulating the uptake of L-cysteine, which otherwise leads to the chelation of intracellular iron and thus the repression of growth.

      Strengths:

      The authors demonstrate that the SpxC10A mutant accumulates high levels of thiols, leading to the chelation of intracellular iron and subsequent repression of the SpxC10A mutant's growth.

      Weaknesses:

      The authors did not show a direct regulation of L-cysteine uptake through CymR.

    4. Reviewer #3 (Public review):

      Summary:

      The paper from Hall et al. reports the effects of an altered function spx allele on the physiology of S. aureus. Since Spx is essential in this organism, the authors compare WT with a spx C10A allele that retains Spx functions that are independent of the formation of a C10-C13 disulfide. However, the major role of Spx in maintaining disulfide homeostasis in this organism appears to be reduced by this mutation, including a reduction (relative to WT) in the DIA-induction of thioredoxin, thioredoxin reductase, and BSH biosynthesis and reduction enzymes.

      Strengths:

      Based on a wide range of studies, the authors develop a model in which Spx is required for adaptation to disulfide stress, and this adaptation involves (in part) induction of both cystine/Cys uptake and the Fur regulon. Overall, the results are compelling, but further efforts to clarify the presentation will aid readers in being able to follow this very complicated story.

      Weaknesses:

      (1) More details are needed on how relative growth is defined and calculated (e.g., line 145 and Figure 1C). The raw data (growth curves) should be included when reporting relative growth so that readers can see what changed (lag, growth rate, final OD?). Later in the paper, the authors refer to "the diamide-induced growth delay of the spxC10A mutant" (line 379), but this is not apparent from the presented data.

      (2) Are the spx C10A, spx C13A, and spx C10A,C13A all really equivalent? In all cases, the Spx protein is presumably made (as confirmed for C10A in panel 1D). However, the only evidence to suggest that they are equivalent is the similar growth effects in panel 1C, and (as noted above), this data presentation can mask differences in how the mutations affect protein activity.

      (3) Figure 1D and Figure 1 Supplement 2 report results related to the effect of diamide treatment on protein half-life (t1/2). Only single results are shown for both panels, and the conclusions do not seem to be statistically robust. For example, Figure 1 Supplement 2 concludes that the t1/2 of Spx C10A is 3.38 min (this should be labeled correctly in the Figure legend as the red line) and that of WT Spx is 8.69 min. However, Figure 1D suggests that the protein levels at time 0 may not be equivalent, and this is lost in the data processing. Indeed, there are significant differences in Spx levels between time 0 - and + DIA, which is curious. Further, the authors' conclusion relies very heavily on line-fitting that includes a final point that has very low signal intensity (as judged from Figure 1D) and therefore is likely the least reliable of all the data. It might be worth showing curve fitting for multiple gels. Regardless of the overfitting of the data, the general conclusion that Spx is partially stabilized against proteolysis by ClpXP, and that the C10A mutant is reduced in stabilization, is probably correct.

      (4) Figure 2 concludes that despite differences in the mRNA profiles between WT and spx C10A after 15 min. of DIA treatment, the overall level of responsiveness of the bacillithiol pool is unchanged. The authors find it "surprising" that the BSH pool responds normally despite some differences in gene expression. This is not surprising. The major events visualized in panel 2D are the chemical oxidation of BSH to BSSB and, presumably, the re-reduction by Bdr(YpdA). While it is seen that BSH synthesis (bshC) and ypdA expression may be less induced by DIA in the C10A mutant (2C), there is no evidence that the basal levels are different prior to stress. Therefore, the chemical oxidation and enzymatic re-reduction might be expected to occur at similar rates, as observed.

      (5) Line 215. For the reason stated above, there is no reason to invoke Cys uptake as needed for the reduction of BSSB. Further, since CySS (presumably an abbreviation for cystine) is imported, this itself can contribute to disulfide stress.

      (6) Line 235. Following on the above point, "diamide-induced disulfide stress increased L-CySS uptake in the spxC10A mutant to re-establish the BSH redox equilibrium." This is counterintuitive since L-CySS is itself a disulfide and is thought to be reduced to 2 L-Cys in cells by BSH (leading to an increase in BSSB, not a reduction). Is there a known cystine reductase? Could cystine or L-Cys be affecting gene regulation (e.g., through CymR or Spx)? Cystine can also lead to mixed disulfide formation (e.g., could it modify Spx on C13?).

      (7) Line 247. "a functional Spx redox switch allows S. aureus to avoid this trade-off and maintain thiol homeostasis without excessive L-CySS uptake." Can the authors expand on how this is thought to work? Does Spx normally affect cystine uptake? I thought this was CymR? I am not following the logic here.

      (8) Line 258. "The fur mutant, which is known to accumulate iron...". My understanding is that fur mutant strains typically have higher bioavailable (free) Fe pools. This is seen in E. coli, for example, using EPR methods. However, they also often have lower total Fe due to the iron-sparing response, which represses the expression of abundant, Fe-rich proteins. Please provide a reference that supports this statement that in S. aureus fur mutants have higher total iron per cell.

      (9) Figure 4. For the reasons stated above (point 1), it is hard to interpret data presented only as "Rel. Growth". Perhaps growth curve data could be included in a supplement.

      (10) The interpretation of Figure 4 is complicated. It is not clear that there is necessarily a change in bioavailable Fe pools, although it does seem clear that Fe homeostasis is perturbed. It has been shown that one strong effect of DIA on B. subtilis physiology is to oxidize the BSH pool to BSSB (as shown also here), and this leads to a mobilization of Zn (buffered by BSH). Elevated Zn pools can inactivate some Fe(II)-dependent enzymes, which could account for the rescue by Fe(II) supplementation. Zn(II) can also dysregulate PerR and likely Fur regulons.
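
      The concern raised in point (3), that a single low-intensity final time point can dominate a half-life estimate, can be illustrated with a minimal sketch. It assumes first-order decay and a log-linear least-squares fit; the time points and band intensities are made-up illustrative values, not data from the manuscript, and `fit_half_life` is a hypothetical helper rather than the authors' analysis.

```python
import math

def fit_half_life(times, intensities):
    """Least-squares fit of ln(I) = ln(I0) - k*t; returns t1/2 = ln(2)/k."""
    logs = [math.log(i) for i in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx  # slope equals -k for a decaying signal
    return math.log(2) / -slope

# Hypothetical densitometry values (arbitrary units), NOT from the paper.
times = [0, 5, 10, 20, 40]                    # minutes
intensities = [100.0, 62.0, 41.0, 16.0, 0.5]  # last point is near background

half_life_all = fit_half_life(times, intensities)
half_life_trimmed = fit_half_life(times[:-1], intensities[:-1])
print(f"t1/2 with all points:        {half_life_all:.2f} min")
print(f"t1/2 without the last point: {half_life_trimmed:.2f} min")
```

      In this toy data set the fitted half-life changes noticeably when the near-background point is dropped, which is why fits from replicate gels, or a fit weighted by signal quality, would make the comparison between WT and C10A more convincing.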

    1. eLife Assessment

      Optical tweezers have been instrumental to the determination of mechanical parameters of molecular motors. This study by Takamatsu et al. reports key mechanical parameters of kinesin KIF1A using fluorescence microscopy, wherein the motor is tethered to a DNA nanospring, without the use of an optical trapping apparatus, which represents an exciting development. The approach and the findings reported change current thinking about KIF1A‑mediated transport, with potential implications for understanding human disease. The findings are important and the strength of the evidence is compelling.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses a novel DNA origami nanospring to measure the stall force and other mechanical parameters of the kinesin-3 family member, KIF1A, using light microscopy. The key is to use SNAP tags to tether a defined nanospring between a motor-dead mutant of KIF5B and the KIF1A to be interrogated. The mutant KIF5B binds tightly to a subunit of the microtubule without stepping, thus creating resistance to the processive advancement of the active KIF1A. The nanospring is conjugated with 124 Cy3 dyes, which allows it to be imaged by fluorescence microscopy. Acoustic force spectroscopy was used to measure the relationship between the extension of the NS and force as a calibration. Two different fitting methods are described to measure the length of the extension of the NS from its initial diffraction-limited spot. By measuring the extension of the NS during an experiment, the authors can determine the stall force. The attachment duration of the active motor is measured from the suppression of lateral movement that occurs when the KIF1A is attached and moving. There are numerous advantages of this technology for the study of single molecules of kinesin over previous studies using optical tweezers. First, it can be done using simple fluorescence microscopy and does not require the level of sophistication and expense needed to construct an optical tweezer apparatus. Second, the force that is experienced by the moving KIF1A is parallel to the plane of the microtubule. This regime can be achieved using a dual-beam optical tweezer set-up, but in the more commonly used single-beam set-up, much of the force experienced by the kinesin is perpendicular to the microtubule. Recent studies have shown markedly different mechanical behaviors of kinesin when interrogated by the two different optical tweezer configurations. The data in the current manuscript are consistent with those obtained using the dual-beam optical tweezer set-up. In addition, the authors study the mechanical behavior of several mutants of KIF1A that are associated with KIF1A-associated neurological disorder (KAND).

      Strengths:

      The technique should be cheaper and less technically challenging than optical tweezer microscopy to measure the mechanical parameters of molecular motors. The method is described in sufficient detail to allow its use in other labs. It should have a higher throughput than other methods.

      Weaknesses:

      The experimenter does not get a "real-time" view of the data as it is collected, which you get from the screen of an optical tweezer set-up. Rather, you have to put the data through the fitting routines to determine the length of the nanospring in order to generate the graphs of extension (force) vs time. No attempts were made to analyze the periods where the motor is actually moving to determine step-size or force-velocity relationships.

      Comments on revisions:

      I am satisfied with the revision made by the authors in response to my first round of criticisms.

    3. Reviewer #2 (Public review):

      Summary:

      This work is important in my view because it complements other single-molecule mechanics approaches, in particular optical trapping, which inevitably exerts off-axis loads. The nanospring method has its own weaknesses (individual steps cannot be seen), but it brings new clarity to our picture of KIF1A and will influence future thinking on the kinesins-3 and on kinesins in general.

      Strengths:

      By tethering single copies of the kinesin-3 dimer under test via a DNA nanospring to a strong binding mutant dimer of kinesin-1, the forces developed and experienced by the motor are constrained into a single axis, parallel to the microtubule axis. The method is imaging-based which should improve accessibility. In principle, at least, several single-motor molecules can be simultaneously tested. The arrangement ensures that only single molecules can contribute. Controls establish that the DNA nanospring is not itself interacting appreciably with the microtubule. Forces are convincingly calibrated and reading the length of the nanospring by fitting to the oblate fluorescent spot is carefully validated. The excursions of the wild type KIF1A leucine zipper-stabilised dimer are compared with those of neuropathic KIF1A mutants. These mutants can walk to a stall plateau, but the force is much reduced. The forces from mutant/WT heterodimers are also reduced.

      Weaknesses:

      The tethered nanospring method has some weaknesses; it only allows the stall force to be measured in the case that a stall plateau is achieved, and the thermal noise means that individual steps are not apparent. The nanospring does not behave like a Hookean spring; instead, linearly increasing force is reported by exponentially smaller extensions of the nanospring under tension. The estimated stall force for KIF1A (3.8 pN) is in line with measurements made using three-bead optical trapping, but those earlier measurements were not of a stall plateau, but rather of limiting termination (detachment) force, without a stall plateau.
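
      To make the consequence of the non-Hookean behaviour concrete, here is a minimal sketch of how a fixed uncertainty in measured length translates into a growing uncertainty in force. The saturating calibration curve, its parameters `L_MAX` and `F0`, and the 20 nm length noise are all assumptions chosen purely for illustration; they are not the calibration reported in the manuscript.

```python
import math

# Toy calibration, assumed purely for illustration (not the measured curve):
# extension saturates with force, L(F) = L_MAX * (1 - exp(-F / F0)).
L_MAX = 1000.0  # nm, hypothetical maximum extension
F0 = 2.0        # pN, hypothetical force scale

def force_from_extension(length_nm):
    """Invert the toy calibration: F = -F0 * ln(1 - L / L_MAX)."""
    if not 0.0 <= length_nm < L_MAX:
        raise ValueError("extension must lie in [0, L_MAX)")
    return -F0 * math.log(1.0 - length_nm / L_MAX)

def force_error(length_nm, length_noise_nm):
    """First-order error propagation: dF = F0 / (L_MAX - L) * dL."""
    return F0 / (L_MAX - length_nm) * length_noise_nm

for length in (200.0, 600.0, 850.0):
    f = force_from_extension(length)
    df = force_error(length, 20.0)
    print(f"L = {length:5.0f} nm  ->  F = {f:4.2f} pN  +/- {df:4.2f} pN")
```

      Under this assumed curve the same length noise produces a larger force error at high extension than at low extension, which is the regime where stall plateaus are read out.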

      Comments on revisions:

      The authors have successfully addressed my previous criticisms.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for the careful reading of our manuscript and for the constructive comments. We have provided responses to each of the comments below.

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) have already investigated dynamic properties of motor proteins under load, such as the step size and force–velocity relationship of myosin VI, using the NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 Public Review

      We appreciate the constructive comments of Reviewer #2, which have strengthened both the presentation and interpretation of our results.

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows. First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors detect the attachments that occur during a processive run by KIF1A by monitoring the suppression of the angular fluctuations of the fluorescent signal and plot this, for example, in Figure 3a as the Length of the NS (which presumably is a readout of force) vs time. This interval includes the time when the KIF1A is actively moving along the MT and when it is stalled. It would be interesting to know the actual stall time of the motor in order to be able to calculate a detachment rate constant. For attachment periods such as the first example highlighted in pink in Figure 3a, the stall time is pretty much equal to the attachment time since the motor is moving so fast and the stall period is so long. However, for short attachment times such as the fifth pink interval shown in this same figure or the traces with the mutant KIF1As in Figure 4 this is not so. Can the authors institute a program to identify the periods where the motor has stretched the NS spring to the point where it stalls, and then calculate this time in order to do an exponential fit to the "dwell time distribution"?

      By introducing another criterion (see Methods, “Rate of relative increase in NS’s length”), the attachment duration was separated into the two time regions noted by the reviewer. After reanalyzing all the data, we evaluated only the stall duration this time. As a result, the estimated stall-force values became more reliable and accurate. A dwell-time analysis was performed and included in the supplementary material for WT KIF1A, for which sufficient data were available.

      (2) The histogram of stall events in Figure 3b is quite broad. Please discuss.

      The newly added distributions from individual molecules (Fig. 3b) show that the breadth of the stall force distribution is not due to multiple molecules, but is primarily an intrinsic property of single KIF1A molecules reflecting the complex kinetics of KIF1A under load, including occasional backward steps and reattachments. In addition, because the nanospring is a non-linear spring, a disadvantage is that even small fluctuations in extension can result in a substantial deviation in the measured stall force. These points have been added to the Discussion section.

      (3) Figure 3c, it is clear that for attachment times greater than 5s the attachment duration is independent of the Lstall, but this is not so clear for the short attachment durations. Some of this may relate to the fact that you're measuring attachment durations and not stall or dwell times as described in my first comment. Do you feel this is due to less precision in measuring the "attachment duration" during the short attachments, or just simply that more data is needed here? I assume that you do not want to imply that there is a load-dependence of the attachment durations here? Perhaps an expanded view of the data set from 0-10 seconds would clarify. 

      As described in our response to comment (1), the stall durations were separated from the attachment durations. This improved the measurement accuracy and revealed that the stall duration and Lstall are uncorrelated (Fig. 3c). We appreciate this constructive comment.
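
      The exponential fit to the dwell-time distribution requested in comment (1) can be sketched as follows. This is a minimal illustration assuming that stall durations follow a single-exponential distribution; the stall times are hypothetical values, not data from the manuscript, and `detachment_rate` is an illustrative helper rather than the authors' analysis code.

```python
import math

def detachment_rate(stall_times_s):
    """Maximum-likelihood rate for exponentially distributed stall times.

    For p(t) = k * exp(-k * t), the MLE is k = 1 / mean(t),
    with an approximate standard error of k / sqrt(n).
    """
    n = len(stall_times_s)
    if n == 0 or any(t <= 0 for t in stall_times_s):
        raise ValueError("need at least one positive stall time")
    k = n / sum(stall_times_s)
    return k, k / math.sqrt(n)

# Hypothetical stall durations in seconds, NOT data from the manuscript.
stall_times = [0.8, 2.1, 0.4, 5.6, 1.3, 3.2, 0.9, 7.5, 2.8, 1.4]

k, k_se = detachment_rate(stall_times)
print(f"k_detach = {k:.3f} +/- {k_se:.3f} s^-1 (mean stall time {1 / k:.2f} s)")
```

      Note that events shorter than the detection threshold are missed in practice, so a real analysis would need to account for that truncation before interpreting the fitted rate.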

      Reviewer #2 (Recommendations for the authors):

      (1) Off-axis forces are described as 'upward', 'perpendicular', and 'horizontal'. Consider referring to off-axis force, and if necessary, defining the direction of the force(s) relative to the axis of the immobilised MT. If necessary, a cartoon of XYZ axes might be added to F1c? 

      An XZ axis was added to the schematic in Fig. 1c.

      (2) If I understand correctly, stall forces are calculated by averaging the entire region in which the angular fluctuation is reduced below a threshold. In cases like the 3rd and 7th events on the trace in F1a, this will reduce the average. Perhaps consider separately averaging the later time points in each stall event? Perhaps also consider correlating the angular fluctuation signals and the spring length signal? Some fluctuations during stall plateaus might indicate slip back and re-engage events? 

      Instead of separately averaging the later time points in each stall event, we separated the stall force duration from the overall attachment duration (Fig. 3). This allowed us to obtain more accurate stall force values. The relationship between the NS length and the angular fluctuation during KIF1A slip-back events differed among individual stall events, and no clear trend was observed. Two representative examples are shown in the Author response image 1.

      Author response image 1.

      (3) Please describe all relevant methods fully instead of referencing previous work. For example, nanospring preparation refers readers to reference 21 (which in turn references an earlier paper).

      We revised the Methods section to include the procedures described in the previous reference, and we added the sequence information of the DNA origami to the supplementary information.

      (4) Were any experiments tried at reduced ATP concentration?

      (5) Were any data obtained from WT KIF5B? For kinesin-1, stall plateau forces of >7 pN are obtained.

      This study focused on comparing the stall forces of wild-type and KAND-related mutant KIF1A molecules under physiological ATP conditions, as our main goal was to characterize the disease-relevant phenotypes. Experiments at reduced ATP concentrations and with WT KIF5B are indeed important future directions but are beyond the scope of the present study. These follow-up experiments are currently in progress.

      (6) In Figure 1b, consider showing the attachment to the mutant KIF5B, and reversing the orientation so it corresponds to Figure 1c.

      KIF1A and KIF5B share the same binding method, so to indicate that the schematic in Fig. 1b represents both, we replaced ‘KIF1A’ with ‘Kinesin’.

      (7) In Figure 3d, add force axis. In general, please re-check all force axes. In Supplement S3, the stall plateau labels appear well above their corresponding axis ticks. In Figure 4, several mutants appear to be stalling at well over 5 pN, yet Table 1 gives a much lower value. Presumably, this reflects averaging effects?

      We added the force axis to Fig. 3d. In addition, we corrected Fig. S3 and Fig. 4 because there were errors in the conversion from length to force. As the reviewer pointed out, the apparent discrepancy between the force values in Fig. 4 and Table 1 arises mainly from averaging effects.

    1. eLife Assessment

      This study presents a valuable human stem cell-derived organoid model that captures key morphological and cellular features of spinal cord development and provides evidence for a YAP-dependent mechanism of lumen formation relevant to secondary neurulation. Overall, the evidence is convincing, using strong and validated approaches consistent with the current state of the art, including systematic protocol optimisation across multiple cell lines and quantitative analysis of tissue architecture. However, some claims regarding precise anterior-posterior and dorsoventral spinal cord identity, as well as several novelty claims, are at times overstated and would benefit from more direct validation and more careful positioning. The work will be of interest to developmental biologists and researchers studying neural tube defects.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras et al. present an organoid-based model of the caudal neural tube that builds upon established principles from embryonic development and prior organoid work. By systematically testing and refining signaling conditions, the authors generate caudal progenitor populations that self-organize into neuroepithelia with molecular features consistent with secondary neurulation. Bulk-RNA sequencing supports the emergence of caudal neural identities, and the authors further examine cellular features such as apico-basal polarity and interkinetic nuclear migration. Finally, they provide evidence for a conserved, YAP-dependent mechanism of tube formation specific to secondary neurulation. The manuscript provides valuable methodological resources, including troubleshooting guidance that will be especially useful for the field. While this work represents a significant advance toward modeling human spinal cord development - particularly the process of secondary neurulation - the claims of complete caudalization and full AP-axis representation require additional experimental support and clarification.

      Strengths:

      (1) Methodological clarity and transparency: The first figure and accompanying text provide an exemplary explanation of protocol optimization and troubleshooting. This transparency - showing approaches that failed as well as those that succeeded - sets a high standard for reproducibility and will be highly beneficial to laboratories aiming to adopt or build upon this model.

      (2) Testing across multiple cell lines: Multiple hPSC and hiPSC lines were evaluated, strengthening the robustness and generalizability of the reported protocol.

      (3) Biological relevance: The focus on secondary neurulation fills a notable gap in current human organoid models of spinal cord development. The identification of YAP-dependent mechanisms in tube formation is a valuable insight with potential translational relevance.

      (4) Resource creation: The detailed parameters and signaling regimes will serve as a resource for the spinal cord and organoid communities.

      Weaknesses:

      (1) The manuscript over-interprets bulk RNA-seq data to make strong claims on the organoid AP patterning and caudalization. Bulk sequencing provides population-level averages and cannot confirm that individual organoids represent discrete AP levels. To support claims of generating every AP identity, the authors must perform staining or in situ hybridization for HOX genes on individual organoids. Further, the current interpretation of CDX2 as marking "very distal" identity is inaccurate in vitro; CDX2 marks caudal progenitors across the spinal cord axis. The language should be revised accordingly.

      (2) The claim of being the first organoid system to model secondary neurulation overlooks prior work showing HOXC9 in human organoids (Xue et al., Nature 2024; Libby et al., Development 2021), which would reflect the beginning of secondary neurulation. While this system may indeed be the first isolated secondary neurulation organoid model that expresses HOXD9/10 - a meaningful advance - bulk RNA-seq alone is insufficient to support the exclusivity of this claim. Additional single-organoid-level spatial analyses (via immunofluorescence or in situ hybridisation) and frequency quantification of regional identities are required to fully characterize the system.

      (3) Similarly, as written, there are overstatements taken from the bulk RNA sequencing to determine dorsal-ventral identity. Although dorsal markers are present, the dataset also contains ventral-associated genes (PAX6, SP8, NKX6-1, NKX6-2, PRDM12). To claim a "dorsal-only" identity, the authors should perform PAX7 immunostaining to demonstrate dorsalization of the entire organoid tissue.

      (4) The studies identifying YAP as a key driver of lumen fusion in Figure 6 are important and should be extended to the apical organoid system to demonstrate that this is truly a feature of secondary neurulation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Blanco-Ameijeiras and colleagues present the use of stem cells to create human spinal cord organoids that recapitulate anterior-posterior identity, with a large focus on posterior fates. In particular, the authors show robust transcriptional landscape specification that reflects certain anterior-posterior spinal cord development.

      Recapitulation of spinal cord development is essential to understand the fundamentals of developmental defects in a systematic manner. This work provides a broad approach to test certain aspects of neural tube morphogenesis, particularly posterior and dorsal identities. Perhaps the shorter protocol is an interesting upgrade for current standards, and the mechanical interpretation provides good proof of concept work that aligns with the need to better understand neural tube mechanobiology.

      Strengths:

      The manuscript addresses a major gap by focusing on posterior spinal cord identity and secondary neurulation, a phase that is less well captured by existing neural tube organoid models (although some do recapitulate that). The manuscript situates the approach within vertebrate development and human embryology.

      Morphometric quantifications are well described and provide a dynamic, cell-level interpretation, which is a true strength of the work. This is important for developing metrics that can later be used to compare modulations and pathway disruption.

      The protocols are well described and documented.

      Weaknesses:

      Some key data lack proper quantification to robustly support the claims. For example, it is not clear how many organoids in total are counted in Figure 1D to derive the % of organoids expressing certain markers (e.g. SOX2 or BRA).

      Some claims are overstated. In the manuscript, the organoids show primarily dorsal and posterior identities under the current conditions, yet the discussion sometimes reads as if a more complete dorsoventral recapitulation is achieved. Therefore, one can either demonstrate ventral patterning (e.g., SHH / FOXA2) or reduce the claims about spinal cord identity, which, given the results, are more specific to a particular region.

      The mention of anterior organoids seems to distract the reader from the important work, which primarily focuses on the posterior identity. Further, it is not understood why SOX2 identity is reduced by Day 7 in Figure 1D. Since SOX2 in the manuscript is considered a neural marker (although also pluripotency along with NANOG, etc.), a further explanation should be provided. The author should also test the presence of PAX6, which is one of the earliest neuroectoderm markers in humans (Zhang X. et al., Cell Stem Cell 2010).

      The authors position the work as a substantial addition to the field. The work is very much welcomed; however, some claims align with an interpretation that leads the readers to understand a novelty that is beyond the work presented here. For example, in certain instances in the intro, the manuscript conveys that this work consists of the first recapitulation of spinal cord fates anterior or posterior, while other works (Rifes P. Nature Cell Biology 2020, Xue X. Nature 2024) recapitulate dorsoventral and anterior-posterior patterning and identity (albeit not of secondary neurulation) through controlled gradients of WNT and RA activity. To clearly position the importance of this work, the intro should focus on secondary neurulation and posterior identities.

      In a similar fashion, the claim that "Importantly though, to our knowledge these are the first neural organoids exhibiting a robust spinal cord transcriptome identity" is not very well understood when other neural tube organoid systems (including spinal cord identities) have been exhaustively profiled at the single cell level (Rifes P. Xue X. Abdel Fattah A.). Further explanation is therefore needed.

      The mechanical angle is important and adds to the large body of research that traces NT morphogenesis to mechanics. However, the YAP localization images can be much improved. Lower magnification images are needed to show the entire organoid to robustly convince the reader of the correct and varying localization of the YAP protein. The authors should also check for YAP-associated genes in their bulk RNA sequencing.

      The quantification of the YAP analysis in a total of 23 and 18 cells in the two conditions and in 7 organoids is by no means enough to draw a conclusion about YAP localization, and an increase in the number of cells is needed. Moreover, the use of dasatinib as an inhibitor for YAP is great, but there is no evidence shown that in this culture system, the inhibitor actually inhibits YAP. As such, IF images are required to confirm cytosolic YAP. Additionally, the authors can try other inhibitors (such as verteporfin) since most inhibitors are broadband.

      Given the mechanically oriented conclusions, other relevant works have shown posteriorized and ventralized neural tube organoids using RA and SHH activation, which were also mechanically stimulated via actuation, such as work from the Ranga lab (Nature comm. 2021/2023). Although not strictly related to YAP, the molecular profiling, mechanical stimulation, lumen measurements, and NTD-like phenotype using PCP-mutated genes reported therein make these works important to mention, since the current work adds important aspects with YAP analysis.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Blanco-Ameijeiras and collaborators describe the 3D differentiation of human pluripotent stem cells into the posterior spinal cord. The authors first test the exposure of different combinations of extrinsic signals to generate human neural organoids with distinct antero-posterior identities, as shown by bulk transcriptome analysis. They show that neural organoids, whether anterior or posterior, display tissue architecture, organisation and dynamics resembling the in vivo situation. Increasing the size of initial cell aggregates leads to the formation of a single lumen through a multi-lumen stage and a process of cell intercalation, mimicking the situation that they recently described for chick secondary neurulation (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300). The authors go on to show that, as in chick, YAP is involved in the resolution of multiple lumens into a single lumen. They conclude that their human organoid approach faithfully models human secondary neurulation, which may be instrumental in unravelling the mechanisms of human neural tube defects.

      Strengths:

      Overall, this is an important study demonstrating that lumen formation in human spinal organoids recapitulates key aspects of secondary neurulation observed in animal models. This organoid approach may be instrumental in unravelling the mechanisms of human neural tube defects.

      Weaknesses:

      The significance of the findings is tempered by several limitations. While the authors show convincing evidence that organoids undergo lumen formation with similar morphological, cellular and molecular features as seen in chick in their previous work (Gonzalez-Gobartt et al. Dev Cell. 2021 PMID: 33878300), whether this is linked to their caudal spinal cord identity is unclear.

    1. eLife Assessment

      In this valuable study, the authors performed cell-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior (INT1) vs middle & posterior (INT2-9) cells of the C. elegans intestine, under fed, starved, or refeeding conditions. The data generated will be very helpful to the C. elegans community, and the evidence supporting the conclusions of the study is assessed to be solid. Some methodological caveats remain and are discussed.

    2. Reviewer #1 (Public review):

      Summary

      In this study, the authors have performed tissue-specific ribosome pulldown to identify gene expression (translatome) differences in the anterior vs posterior cells of the C. elegans intestine. They have performed this analysis in fed and fasted states of the animal. The data generated will be very useful to the C. elegans community, and the role of pyruvate shown in this study will result in interesting follow-up investigations.

      However, several strong claims made in the study are solely based on in silico predictions and are not supported by experimental evidence.

      Strengths:

      Several studies in the past have predicted different functions of the anterior (INT1) vs posterior (INT2-9) epithelial cells of the C. elegans intestine based on their anatomy and ultrastructure, but detailed characterization of differences in gene expression between these cell types (and whether indeed these are different 'cell types') was lacking prior to this study. The genes and drivers identified to be exclusively expressed in the anterior vs posterior segments of the intestine will be very helpful to selectively modulate different parts of the C. elegans intestine in future studies.

      Another strength of this study is the careful experimental design to test how the anterior vs posterior cell types of the intestine respond differently to food deprivation and recovery after return to food. These comparisons between 'states' of a cell in different physiological conditions are difficult to pick up in single-cell analyses due to low sequencing depth, which can fail to identify subtle modulation of gene expression.

      The TRAP-associated bulk RNA-seq approach used in this study is more suitable for such comparisons and provides additional information on post-transcriptional regulation during metabolic stress.

      A key finding of this study is that pyruvate levels modulate the translation state of anterior intestinal cells during fasting. Characterization of pyruvate metabolism genes, especially of the enzymes involved in its mitochondrial breakdown, provides novel insights into how gut epithelial cells respond to the acute absence of food.

      Weaknesses:

      Unlike previous TRAP-seq studies (PMID: 30580965, 36044259, 36977417) that reported sequencing data for both input and IP samples, this study only reports the sequencing data for IP samples. Since biochemical pulldowns are variable across replicates, it is difficult to know if the observed differences between different conditions are due to biological factors or differences in IP efficiency. More importantly, since two different TRAP lines were utilized in this study and a large proportion of the results focus on the differences between the translational profiles of INT1 vs INT2-9 cells, it is essential to know if the IP worked with similar efficiency for both TRAP strains that likely have different expression levels of the HA-tagged ribosomal protein. One way to estimate this would be to perform qRT-PCR of genes that are known to be enriched in all intestinal cells and determine whether their fold-enrichment over housekeeping genes (normalized to input) is similar in INT1 vs INT2-9 TRAP strains and across the fed vs fasted conditions. The authors, in fact, mention variability across biological replicates, due to which certain replicates were excluded from their WGCNA analysis.
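
      The qRT-PCR check suggested above can be sketched as a standard delta-delta-Ct calculation. This is a minimal illustration, not the authors' or the reviewer's procedure: the function name and the Ct values are hypothetical placeholders, and the formula assumes roughly 100% primer efficiency (a doubling per cycle).

```python
def fold_enrichment(ct_marker_ip, ct_ref_ip, ct_marker_input, ct_ref_input):
    """Fold-enrichment of an intestinal marker over a housekeeping gene
    in the IP sample, normalized to the matching input sample.

    All arguments are qRT-PCR cycle-threshold (Ct) values.
    Assumes ~100% amplification efficiency (2-fold per cycle).
    """
    delta_ip = ct_marker_ip - ct_ref_ip           # marker vs reference, IP
    delta_input = ct_marker_input - ct_ref_input  # marker vs reference, input
    delta_delta = delta_ip - delta_input
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values, for illustration only (not data from the study).
int1_strain = fold_enrichment(20.0, 22.0, 24.0, 22.0)
int2_9_strain = fold_enrichment(21.0, 22.0, 24.0, 22.0)
print(int1_strain, int2_9_strain)
```

      Similar fold-enrichment values for the two TRAP strains, and across fed and fasted conditions, would argue that the IP worked with comparable efficiency; a large gap would point to a technical rather than biological difference.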

      It appears that GFP expression is also detectable in INT2 (in addition to strong expression in INT1 in Fig.1A). Compared to INT3-9, which looks red, INT2 cells appear yellow, suggesting that the expression patterns of the two TRAP drivers are not mutually exclusive, which changes the interpretation of many of the results described in the study.

      Some parts of the study overemphasize the differences between the INT1 vs INT2-9 cell types, which is a biased representation of the results. For example, the authors specifically point out that 270 genes are differentially expressed in opposite directions in INT1 vs INT2-9 cell types during acute (30 min) fasting without mentioning the 1,268 genes that are differentially expressed in the same direction. They also do not mention here that 96% of the genes are differentially expressed in the same direction in INT1 and INT2-9 cell types after prolonged (180 min) fasting, suggesting that the divergent translational responses of these cell types are only observed in the first 30 minutes of food deprivation. Similar results have also been reported for the effect of fasting on locomotory and feeding behaviors, where 30 min of fasting produces more variable effects, which become more consistent after longer periods of fasting (PMID: 36083280). Hence, the effects of brief food deprivation should be interpreted with caution.
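
      To put the 30-minute counts quoted above on the same footing as the 96% figure for 180 minutes, the share of opposite-direction genes can be computed directly. This is a small sketch that assumes the two counts (270 and 1,268) together make up the full set of genes differentially expressed in both cell types; the function name is illustrative.

```python
def fraction_opposite(n_opposite, n_same):
    """Share of genes that change in opposite directions in the two
    cell types, among genes differentially expressed in both."""
    return n_opposite / (n_opposite + n_same)


# Counts quoted in the review for acute (30 min) fasting.
share = fraction_opposite(270, 1268)
print(f"opposite direction: {share:.1%}, same direction: {1 - share:.1%}")
```

      Under that assumption, roughly 18% of the shared differentially expressed genes diverge at 30 minutes, while about 82% move in the same direction.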

      Many of the interpretations of this study primarily rely on pathway enrichment analyses, which are based on the known function of genes. The function of uncharacterized genes that were found to be differentially expressed in INT1 vs INT2-9 cell types, e.g., the ShKT proteins, was not explored in this study. In addition, overreliance on pathway enrichment tools (instead of functional validation) has resulted in several conflicting findings. For example, one of the main messages of this study is that INT1 cells specialize in immune and stress response in response to fasting, which relies on pathway analysis in Figs 5E and 5F. However, pathway analysis at a different time point (shown in Figure S5A) indicates that INT2-9 cells show a much stronger increase in translation of stress and pathogen-responsive genes compared to INT1 cells. Hence, some of the results should be interpreted as different translational effects in INT1 vs INT2-9 cells after different lengths of food deprivation, without making broad claims about selective pathways being affected only in specific cell types.

      The authors have compared their TRAP-seq results with genes enriched in the anterior and posterior intestine clusters from a previously published whole-animal adult scRNA dataset (PMID: 37352352). They claim that their TRAP-seq results are in agreement with the findings of the scRNA study. However, among the 10 genes from the 'posterior intestine' scRNA cluster in Fig.S1E, six are downregulated in the INT1 vs INT2-9 comparison, while four are upregulated. Hence, there is no clear agreement between the two studies in terms of the top enriched genes in the anterior vs posterior intestine, which should be considered for cross-study comparisons in the future.

      The authors describe in the manuscript that they have performed INT1-specific RNAi for two C-type lectin genes that are upregulated during fasting. Due to a recent expansion of C-type lectin genes in C. elegans, there is a high chance of off-target effects of RNAi that is designed for members of this gene family. More trustworthy results could have been obtained using CRISPR-based loss-of-function alleles for these genes, one of which is publicly available. Also, the authors do not provide any explanation for why knockdown of these stress-response genes, which are activated in INT1 cells in response to food deprivation, results in improved resistance to pathogens. This, in fact, suggests a role of INT1 cells in increasing pathogen susceptibility, and not pathogen resistance, during food deprivation.

      Many of the studies in this field (e.g., references 2-4 in this article) have investigated the effects of food deprivation ranging from 4 hr to 24 hr, which results in activation of starvation responses in C. elegans. In contrast, the authors have used shorter time periods of fasting (30 min and 180 min), and most of their follow-up experiments have used 30 min of food deprivation. Previous work has shown that the effects of food deprivation can either accumulate over time (i.e., the effect gets stronger with longer food deprivation) or can be transient (i.e., only observed briefly after removal of food and not observed during long-term food deprivation). Starvation-induced transcription factors such as DAF-16/FoxO and HLH-30 show strong translocation to the nucleus only after 30 min of fasting. Though gene expression changes in all stages of food deprivation are of biological relevance, the authors have missed the opportunity to explore whether increased INS-7 secretion from the anterior intestine is dependent on these starvation-induced transcription factors (which can be easily tested using loss-of-function alleles) or is due to other fast-acting regulatory mechanisms induced due to the absence of food contents in the gut lumen. A previous study (PMID: 40991693) has shown that DAF-16 activation during prolonged starvation shuts down insulin peptide secretion from the intestinal epithelial cells. Hence, it is not clear if increased INS-7 secretion is only a feature of short-term food deprivation or is also a signature of long-term starvation (e.g., at 8 hr or 16 hr timepoints). Since most of the INS-7 secretion data in this study are for 30 min of fasting, it remains unknown whether the discovered regulators of INS-7 secretion can be generalized for extended food deprivation that triggers major metabolic changes, such as fat loss (e.g., conditions shown in Figure 1D).

      Two previous studies (PMID: 18025456, 40991693) have shown a strong reduction in the expression of ins-7 in the anterior intestine using GFP-based reporters (both promoter fusions and endogenous CRISPR-generated) and in whole-animal RNA-seq data from starved animals. These results are in contrast to the increased INS-7 secretion from INT1 cells during fasting that is reported in this study. The authors here have reported that INS-7 translation is higher in INT1 compared to INT2-9 during fed, acute fasted, and chronic fasted conditions, but they have not shown whether INS-7 translation is upregulated during acute and chronic fasting in INT1 cells in their TRAP-seq analysis. Knowing whether increased INS-7 secretion during acute fasting is due to increased transcription, translation, or secretion of INS-7 is crucial to resolve the discrepancy between these studies.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors set out to understand whether the discrete segments of the C. elegans intestine are specialized to carry out distinct functions during an animal's exposure and adaptation to a fast-changing nutrient environment. To achieve this, the authors used a method called translating ribosome affinity purification (TRAP), which provides a snapshot of which genes are being translated into proteins (and therefore functionally prioritized by the animal) under different fasting and re-feeding conditions. By expressing the TRAP constructs in two distinct segments of the intestine (INT1 and INT2-9), the authors were able to identify how these segments responded to changing nutrient availability.

      Already under steady-state nutrient conditions, the authors found that INT1 and INT2-9 appeared to have different 'tasks', with INT1 expressing more immune- and stress-response-related genes. Exposing animals to different regimens of starvation and refeeding also showed marked differences between the intestinal segments, and the gene expression patterns in INT1 were consistent with INT1 cells playing an integrative role in linking nutrient cues to the secretion of insulin molecules that regulate fat metabolism with food intake. In summary, the data presented catalogue, for the first time, gene expression differences between two areas of the intestine suspected to play different roles and, through clever experiments, link these gene expression changes to responses to nutrient availability.

      Strengths:

      The data presented catalogue - for the first time and in a careful manner - gene expression differences between two areas of the intestine. They strongly support the presence of intriguing differences between two areas of the intestine in immune, metabolic, and stress-response regulation, and link these gene expression changes to the responses of these regions to nutrient availability.

      Weaknesses:

      The conclusions of this paper are mostly well-supported by data, but the relevance of the changing gene expression patterns could be better clarified and extended in the discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Liu and colleagues utilize TRAP-seq to profile the repertoire of actively translated mRNAs in different intestinal cell types (anterior INT1 vs. posterior INT2-9 cells) in C. elegans. A key goal of this study was to identify transcripts differentially expressed/translated between these intestinal cell subtypes in the context of animals being well fed or subjected to acute (30 minutes) or chronic (3 hours) starvation, followed by refeeding.

      The authors identify a number of differentially expressed genes across all of the conditions tested. They then provide an initial survey of the landscape of translatome changes through Weighted Gene Co-expression Network Analysis (WGCNA), and some high-level functional surveys via Gene Ontology (GO) term analysis and protein domain analysis. The authors validate the enriched expression patterns of some of their identified candidate genes using fluorescent promoter fusion reporters, confirming INT1-specific expression. The authors further implicate several other candidate genes in pathogen avoidance and in the response to nutritional cues by knocking them down specifically in INT1 cells by RNAi. Finally, the authors identify pyruvate as a major nutrient signal coming from the bacterial diet that suppresses the release of a key insulin peptide (INS-7), and identify some of the genes expressed in INT1 that are required for this response.

      Strengths:

      (1) Good use of and justification for TRAP-seq, because scRNA-seq would be difficult under the varied conditions used (starvation, refeeding).

      (2) The manuscript is generally clear to read, and the data are generally well-presented with good supporting data that includes replicates, sample sizes, error measurements, and associated statistics.

      (3) The dataset will be an interesting resource to mine for future studies focusing on mechanisms of how particular intestinal cell types respond to different environmental signals.

      Weaknesses:

      (1) A limitation of TRAP-seq, although powerful, is that only relative comparisons can be made between genotypes/conditions to identify differentially-expressed genes, rather than assessing whether a given gene is expressed at a certain level in a cell type under a certain condition. This limitation is due to the non-specific association of sticky RNA species with the beads during the immunoprecipitation step. This is a minor point, however, and the authors do a nice job of focusing their analysis on differentially expressed transcripts in the current study.

      (2) Another limitation of the current study is that the experiments testing the role of candidate genes identified by the profiling do not delve deeply enough to provide a mechanistic understanding of the phenotypes being studied. At present, the results are thus viewed more as a genomics-based screen with some limited follow-up on interesting hits. However, this reviewer appreciates that, when placed in the context of the work presented, a presentation of the profiling data along with some validation is an excellent starting point for future mechanistic studies elaborating on these interesting candidates.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The main goal of the study was to survey the dynamic responses at the level of actively translated mRNAs of the INT1 vs INT2-9 cells in response to metabolic challenge.

      Overall, the authors use established methods to perform their genome-wide analysis, and the set of differentially regulated genes is enriched for expected molecular functions and forms coherent networks in anticipated pathways.

      The validation experiments (promoter::GFP fusion reporters, INT1-specific knockdowns of highly regulated genes) further corroborate the quality of the TRAP-seq datasets generated.

      I have a few points for the authors that would further strengthen this work:

      (1) The authors rightfully focus on the top differentially-regulated candidates, but it's unclear at present how far down their fold change list would lead to expression pattern validations. It would be useful to test a few more promoter::GFP fusion reporters at different enrichment/fold-change/statistical cutoffs.

      (2) Although the INT1-specific RNAi provides a convenient strategy for rapidly perturbing and testing genes of interest for phenotypes, independently validating the knockdowns with genetic mutants or, alternatively (if the genes are essential), degron alleles would strengthen the conclusions.

      Impact:

      The TRAP-seq data and list of differentially-expressed candidate genes will form an interesting set of high-priority candidates to study for their role in the reception and transduction of nutritional cues in response to food status and pathogens. This data will thus benefit the C. elegans community of researchers studying the mechanisms governing these phenomena.

    1. eLife Assessment

      In this useful paper, the authors present a comprehensive method for the purification of recombinant Snake Venom Metalloproteinases (SVMPs) using the MultiBac expression system, explain the self-activation of the enzymes by Zn2+ incubation, and establish high-throughput screening (HTS) techniques. The authors addressed a key problem: producing a substantial amount of pure and enzymatically active SVMPs required for structural and functional studies. Altogether, this work builds a solid foundation for the large-scale production of active SVMPs for future biochemical and structural characterization as well as for drug discovery, albeit leaving certain caveats about the universal applicability of the described methodology for the production of any recombinant SVMPs.

    2. Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well-presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE, and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant and native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method for recombinant protein can open up possibilities in HTS. That their purification method can be applied to three structurally variable SVMP classes demonstrates the robust nature of the approach.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether any attempt was made to remove residual metal ions during the purification, which could influence premature activation. Additionally, the authors do not discuss whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

    3. Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      Weaknesses:

      The manuscript lacks thorough follow-through and stringent scientific procedures in the methodology and the characterization of the generated enzymes.

      As an example, a further characterization of the generated protein fragments in Figure 3 by intact mass spectrometry would have aided in accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC. Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      The activity assays in Figure 4 are not performed consistently with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h. Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies aimed either at utilizing SVMPs as therapeutic agents or at generating specific anti-venom reagents, such as antibodies or small molecule inhibitors.

    4. Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. Because their functional analysis has in the past relied on challenging isolation, heterologous expression of the toxins offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated are outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery, since a first reliable platform for expression of an understudied yet potent toxin class is established. The discovered strategy of pursuing expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places. The reporting of experimental details, especially sample sizes and replicates, could be optimised. At the time of writing, it remains unclear whether the glycosylations detected on the PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this. Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      (2) Line 173: Here, AlphaFold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was AlphaFold 2. I suggest using one release across the manuscript.

      (3) Lines 252-254: I fully agree that the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described its importance in the field. This raises the question of which glycosylations have been introduced into the SVMP here, and whether they match those found in snakes. This is important as insects use thousands of N- and O-glycosylations to modulate the activity of their proteome, many of which are specific to insects. If some of these were incorporated into the SVMP, this could affect downstream bioassays and also antigenicity (the surface would be somewhat different from natural toxins, causing different selection).

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      Discussion:

      I think the data generated in the study are very valuable and will be instrumental for pushing the frontiers in SVMP research, but still I would like to see a bit of modesty in the discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems; I think reporting such data is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the PIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative. Lastly, the reality is also that expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions, as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research. Again, I believe the paper is highly important and excellently crafted, but I think the discussion in particular should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

    1. eLife Assessment

      The authors used genetic mutations in VANGL2 to study cell morphological changes during differentiation of hPSCs and understand the mechanisms underlying neural tube closure defects. The findings are important as they establish a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. The convincing evidence provided, combined with the relative simplicity of the model and its tractability as a patient-specific and reverse genetic platform, make it attractive.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ampartzidis et al. report the establishment of an iPSC-derived neuroepithelial model to examine how mutations from spina bifida patients disrupt fundamental cellular properties that underlie neural tube closure. The authors utilize an adherent neural induction protocol that relies on dual SMAD inhibition to differentiate three previously established iPSC lines with different origins and reprogramming methods. The analysis is comprehensive and outstanding, demonstrating reproducible differentiation, apical-basal elongation, and apical constriction over an 8-day period among the 3 lines. In inhibitor studies, it is shown that apical constriction is dependent on ROCK and generates tension, which can be measured using an annular laser ablation assay. Since this pathway is dependent on PCP signaling, which is also implicated in neural tube defects, the authors investigated whether VANGL2 is required by generating 2 lines with a pathogenic patient-derived sequence variant. Both lines showed reduced apical constriction and reduced tension in the laser ablation assays. The authors then established lines obtained from amniocentesis, including 2 control and 2 spina bifida patient-derived lines. These remarkably exhibited different defects. One line showed defects in apical-basal elongation, while the other showed defects in neural differentiation. Both lines were sequenced to identify candidate variants in genes implicated in NTDs. While no smoking gun was found in the line that disrupts neural differentiation (as is often the case with NTDs), compound heterozygous MED24 variants were found in the patient whose cells were defective in apical-basal elongation. Since MED24 has been linked to this phenotype, this finding is especially significant.

      Some details are missing regarding the method to evaluate the rigor and reproducibility of the study.

      Major Comments:

      It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      For the patient-specific lines - how many lines were derived per patient?

      Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      Significance:

      This paper is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants. This will not only demonstrate that sequence variants result in loss of function but also determine which cellular behaviors are disrupted.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' work focuses on studying cell morphological changes during differentiation of hPSCs into neural progenitors in a 2D monolayer setting. The authors use genetic mutations in VANGL2 and patient-derived iPSCs to show that (1) human phenotypes can be captured in the 2D differentiation assay, and (2) VANGL2 in humans is required for neural contraction, which is consistent with previous studies in animal models. The results are solid and convincing, the data are quantitative, and the manuscript is well written. The 2D model they present successfully addresses the questions posed in the manuscript. However, the broad impact of the model may be limited, as it does not contain NNE cells and does not exhibit tissue folding or tube closure, as seen in neural tube formation. Patient-derived lines are derived from amniotic fluid cells, and the experiments are performed before birth, which I find to be a remarkable achievement, showing the future of precision medicine.

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      (2) Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      (4) Figure 2d. Do the cells become thicker after recoil?

      (5) Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this is also the case in the present human model by using mosaic cultures.

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      Significance:

      This study establishes a quantitative, reproducible 2D human iPSC-to-neural-progenitor platform for analyzing cell-shape dynamics during differentiation. Using VANGL2 mutations and patient-derived iPSCs, the work shows that (1) human phenotypes can be captured in a 2D differentiation assay and (2) VANGL2 is required for neural contraction (apical constriction), consistent with animal studies. The results are solid, the data are quantitative, and the manuscript is well written. Although the planar system lacks non-neural ectoderm and does not exhibit tissue folding or tube closure, it provides a tractable baseline for mechanistic dissection and genotype-phenotype mapping. The derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models. However, overall, I did not learn anything substantively new from this manuscript; the conclusions largely corroborate prior observations rather than extend them. In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Ampartzidis et al., significantly extends the human induced pluripotent stem cell system originally characterized by the same group as a tool for examining cellular remodeling during differentiation stages consistent with those of human neural tube closure (Ampartzidis et al., 2023). Given that there are no direct ways to analyze cellular activity in human neural tube closure in vivo, this model represents an important platform for investigating neural tube defects which are a common and deleterious human developmental disease. Here, the authors carefully test whether this system is robust and reproducible when using hiPSC cells from different donors and pluripotency induction methods and find that despite all these variables the cellular remodeling programs that occur during early neural differentiation are statistically equivalent, suggesting that this system is a useful experimental substrate. Additionally, the carefully selected donor populations suggest these aspects of human neural tube closure are likely to be robust to sexual dimorphism and to reasonable levels of human genetic background variation, though more fully testing that proposition would require significant effort and be beyond the scope of the current work. Subsequent to this careful characterization, the authors next tested whether this system could be used to derive specific insights into cell remodeling during early neural differentiation. First, they used a reverse genetics approach to knock in a human point mutation in the critical regulator of planar cell polarity and apical constriction, Vangl2. Despite being identified in a patient, this R353C variant has not been directly functionally tested in a human system. The authors find that this variant, despite showing normal expression and phospho-regulation, leads to defects consistent with a failure in apical constriction, a key cell behavior required to drive curvature change during cranial closure. 
Finally, the authors test the utility of their hiPSC platform to understand human patient-specific defects by differentiating cells derived from two clinical spina bifida patients. The authors identify that one of these patients is likely to have a significant defect in fully establishing early proneural identity as well as defects in apicobasal thickening. While early remodeling occurs normally in the other patient, the authors observe significant defects in later neuronal induction and maturation. In addition, using whole exome sequencing the authors identify candidate variant loci that could underlie these defects.

      Major comments:

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously with area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer, even missing local buckles. I understand that these cells are grown on flat adherent surfaces limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than in the earlier experiments. Given that significance appears to be reached mainly through the impact of the lower values, can the authors explain whether this variability is expected to be due to heterogeneity in the tissue, i.e. some areas have higher local tension? If so, would that correspond with more local apical constriction?

      Significance:

      Overall, I am enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects. This work systematizes an important and novel tool to examine the cellular basis of neural tube defects. While other hiPSC models of neural tube closure capture some tissue level dynamics, which this model does not, they require complex microfluidic approaches and have limited accessibility to direct imaging of cell remodeling. Comparatively, the relative simplicity of the reported model and the work demonstrating its tractability as a patient-specific and reverse genetic platform make it unique and attractive. This work will be of interest to a broad cross section of basic scientists interested in the cellular basis of tissue remodeling and/or the early events of nervous system development as well as clinical scientists interested in modeling the consequences of patient specific human genetic deficits identified in neural tube defect pregnancies.

    5. Author response:

      General Statements

      In this manuscript we characterize an exquisitely reproducible model of iPSC differentiation into neuroepithelial cells, use it to mechanistically study cell shape changes and planar cell polarity signaling activation during this transition, then apply it to identify patient-specific cell deficiencies in both forward and reverse genetic screens as a powerful tool for patient-stratification in personalized medicine. To our knowledge, we provide the first evidence of a human pathogenic mutation directly impairing apical constriction: an evolutionarily conserved behavior of epithelial cells which is the subject of intense research.

      We are very pleased with the balanced and rigorous reviews generated through Review Commons, which we have already used to improve our manuscript. Reviewer 1 highlights that our study “is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants.” Reviewer 2 agrees that “results are solid and convincing, the data are quantitative, and the manuscript is well written”, and that our “derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models.” Reviewer 3 is “enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects.” 

      Below, we have replied to each of the reviewers’ comments.

      Description of the planned revisions

      R2.2. Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      We are happy to do this analysis fully in revision. Our initial analysis performing cross-correlation between apical area and CDH2 protein in one line shows the highest cross-correlation at Δt = -1, suggesting neuroepithelial CDH2 increases before apical area decreases. In contrast, the same analysis comparing apical area versus PAX6 shows Δt = 0, suggesting concurrence. This analysis will be expanded to include the other markers we quantified and the manuscript text amended accordingly. We are keen to undertake additional experiments to test whether these cells swap their key cadherins (CDH1 and CDH2) before they begin to undergo morphological changes (see the response to Reviewer 3’s minor comment 1 immediately below).
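
      For readers unfamiliar with the lag analysis discussed in this exchange, the following is a minimal sketch of a lagged correlation between two short time courses. It is not the authors' pipeline: the function names, the sign convention (a negative lag means the marker changes before the morphology measure), the use of the absolute correlation to pick the best lag, and the example values are all assumptions made for illustration.

```python
import numpy as np


def lagged_correlation(marker, morphology, max_lag):
    """Pearson correlation between marker[i] and morphology[i - lag].

    Convention (an assumption of this sketch): lag < 0 pairs each marker
    value with a LATER morphology value, i.e. the marker leads.
    """
    marker = np.asarray(marker, dtype=float)
    morphology = np.asarray(morphology, dtype=float)
    if marker.shape != morphology.shape or marker.ndim != 1:
        raise ValueError("inputs must be 1-D series of equal length")
    if max_lag < 0 or len(marker) - max_lag < 3:
        raise ValueError("need at least 3 overlapping time points per lag")

    correlations = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = marker[:lag], morphology[-lag:]
        elif lag > 0:
            a, b = marker[lag:], morphology[:-lag]
        else:
            a, b = marker, morphology
        correlations[lag] = float(np.corrcoef(a, b)[0, 1])
    return correlations


def best_lag(correlations):
    """Lag with the largest absolute correlation (area falls as marker rises)."""
    return max(correlations, key=lambda lag: abs(correlations[lag]))


# Hypothetical daily values, not data from the manuscript.
apical_area = [10, 10, 10, 8, 5, 3, 2, 2]   # morphology measure
marker_level = [0, 0, 2, 5, 7, 8, 8, 8]     # built to change one day earlier

correlations = lagged_correlation(marker_level, apical_area, max_lag=2)
print(best_lag(correlations), correlations)
```

      With only a handful of time points, each additional lag removes one overlapping pair, so correlations at larger lags rest on fewer observations and should be read cautiously.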

      R3.1(Minor) There seems to be a critical window at day 5 of the differentiation protocol, both in terms of cell morphology and the marker panel presented in Figure 1i. Do the authors have any data spanning the hours from day 5 to 6? If not, I don't think they need to generate any, but I do think this is a very interesting window worthy of further discussion for a couple of reasons. First, several studies of mouse neural tube closure have shown that various aspects of cell remodeling are temporally separable. For example, between Grego-Bessa et al 2016 and Brooks et al 2020 we can infer that apicobasal elongation rapidly increases starting at E8.5, whereas apical surface area reduction and constriction are apparent somewhat earlier at E8.0. I think it would be interesting to see if this separability is conserved in humans. Second, is there a sense of how the temporal correlation between the pluripotent and early neural fate marker data presented here corroborate or contradict the emerging set of temporally resolved RNA seq data sets of mouse development at equivalent early neural stages?

      Cell shape analysis between days 5 and 6 has now been added (see the response to point 2.1 below). As the reviewer predicted, this is a transition point when apical area begins to decrease and apicobasal elongation begins to increase.

      We also thank the reviewer for this prompt to more closely compare our data to the previous mouse publications, which we have added to the discussion. The Grego-Bessa 2016 paper appears to show an increase in thickness between E7.75 and E8.5, but these are not statistically compared. Previous studies showed rapid apicobasal elongation during the period of neural fold elevation, when neuroepithelial cells apically constrict. This has now been added to the discussion: 

      Discussion: “In mice, neuroepithelial apicobasal thickness is spatially-patterned, with shorter cells at the midline under the influence of SHH signalling[14,77,78]. Apicobasal thickness of the cranial neural folds increases from ~25 µm at E7.75 to ~50 µm at E8.5[79]: closely paralleling the elongation between days 2 and 8 of differentiation in our protocol. The rate of thickening is non-uniform, with the greatest increase occurring during elevation of the neural folds[80], paralleled in our model by the rapid increase in thickness between days 4-6 as apical areas decrease. Elevation requires neuroepithelial apical constriction and these cells’ apical area also decreases between E7.75 and E8.5 in mice[79], but we and others have recently shown that this reduction is both region and sex-specific[14,81]. Specifically, apical constriction occurs in the lateral (future dorsal) neuroepithelium: this corresponds with the identity of the cells generated by the dual SMAD inhibition model we use[56]. More recently, Brooks et al[82] showed that the rapid reduction in apical area from E8-E8.5 is associated with cadherin switching from CDH1 (E-cadherin) to CDH2 (N-cadherin). This is also directly paralleled in our human system, which shows low-level co-expression of CDH1 and CDH2 at day 4 of differentiation, immediately before apical area shrinks and apicobasal thickness increases.”

      Prompted by the in vivo data in Brooks et al (2025)[82], we are keen to further explore the timing of CDH1/CDH2 switching versus apical constriction with new experimental data in revisions.

      R3.2(Minor) 2) Can the authors elaborate a bit more on what is known regarding apicobasal thickening and pseudo-stratification and how their work fits into the current understanding in the discussion? This is a very interesting and less well studied mechanism critical to closure, which their model is well suited to directly address. I am thinking mainly of the Grego-Bessa et al., 2016 work on PTEN, though interestingly the work of Ohmura et al., 2012 on the NUAK kinases also shows reduced tissue thickening (and apical constriction) and I am sure I have missed others. Given that the authors identify MED24 as a likely candidate for the lack of apicobasal thickening in one of their patient derived lines, is there any evidence that it interacts with any of the known players?

      We have now added further discussion on the mechanisms by which the neuroepithelium undergoes apicobasal elongation. Nuclear compaction is likely to be necessary to allow pseudostratification and apicobasal elongation. The reviewer’s comment has led us to realise that diminished chromatin compaction is a potential outcome of MED24 down-regulation in our GOSB2 patient-derived line. Figure 4D suggests the nuclei of our MED24-deficient patient-derived line are less compacted than control equivalents and we propose to quantify nuclear volume in more detail to explore this possibility.

      Additionally, we have already expanded our discussion as suggested by the reviewer:

      Discussion: “Mechanistic separability of apical constriction and apicobasal elongation is consistent with biomechanical modelling of Xenopus neural tube closure showing that both are independently required for tissue bending[61]. Nonetheless, neuroepithelial apical constriction and apicobasal elongation are co-regulated in mouse models: for example, deletion of Nuak1/2[83], Cfl1[84], and Pten[79] all produce shorter neuroepithelium with larger apical areas. Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68]. Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85]. As general regulators of polymerase activity, MED proteins have the potential to alter the timing or level of expression of many other genes, including those already known to influence pseudostratification or apicobasal elongation. MED depletion also causes redistribution of cohesin complexes[86] which may impact chromatin compaction, reducing nuclear volume during differentiation.”

      R3.3(Minor) 3) Is there any indication that Vangl2 is weakly or locally planar polarized in this system? Figure 2F seems to suggest not, but Supplementary Figure 5 does show at least more supracellular cable like structures that may have some polarity. I ask because polarization seems to be one of the properties that differs along the anteroposterior axis of the neural plate, and I wonder if this offers some insight into the position along the axis that this system most closely models?

      VANGL2 does not appear to be planar polarised in this system. This is similar to the mouse spinal neuroepithelium, in which apical VANGL2 is homogenous but F-actin is planar polarised (Galea et al Disease Models and Mechanisms 2018). We do observe local supracellular cable-like enrichments of F-actin in the apical surface of iPSC-derived neuroepithelial cells:

      Author response image 1.

      Preliminary identification of apical supracellular cables suggestive of local polarity. Top: F-actin staining shown in inverted grey LUT highlighting enrichment along directionally-polarised cell borders (blue arrows). Bottom: Staining orientation (blue ~ X axis, red ~ Y axis) based on OrientationJ analysis illustrating localised organisation of F-actin enrichment.

      We propose to compare the length of F-actin cables and coherency of their orientation at the start and end of neuroepithelial differentiation, and in wild-type versus VANGL2-mutant epithelia.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major points

      (1) It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      These experimental details have now been clarified. Unless otherwise stated, all findings were confirmed in three independently differentiated plates from the same line or at least one differentiation from each of three lines. 

      Methods: Unless otherwise stated, for each iPSC line three independently differentiated plates were generated and analysed, with each plate representing a separate differentiation experiment performed on different days.

      (2) For the patient-specific lines - how many lines were derived per patient?

      This has now been clarified in the methods. Microfluidic reprogramming of a small number of amniocytes produces one line per patient representing a pool of clones. Subcloning from individual cells would not be possible within the timeframe of a pregnancy. 

      Methods: For patient-specific iPSC lines, one independent iPSC line was obtained per patient following microfluidic mmRNA reprogramming.

      (3) Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      We have now expanded these details:

      Methods: “VANGL2 knock-in lines were generated using CRISPR-Cas9 homology-directed repair editing by Synthego (SO-9291367-1). The guide sequence was AUGAGCGAAGGGUGCGCAAG and the donor sequence was CAATGAGTACTACTATGAGGAGGCTGAGCATGAGCGAAGGGTGTGCAAGAGGAGGGCCAGGTGGGTCCCTGGGGGAGAAGAGGAGAG.

      Sequence modification was confirmed by Sanger sequencing before delivery of the modified clones, and Sanger sequencing was repeated after expansion of the lines (Supplementary Figure 5) as well as SNP arrays (Illumina iScan, not shown) confirming genomic stability.”

      Author response image 2.

      Snapshot of Illumina iScan SNP array showing absence of chromosomal duplications or deletions in the CRISPR-modified VANGL2-knockin lines or their congenic control.

      (4) Suggested text changes.

      Some additional suggestions for improvement.

      The abstract could be more clearly written to effectively convey the study's importance. Here are some suggestions:

      Line 26: Insert "apicobasal" before "elongation" - the way it is written, I initially interpreted it as anterior-posterior elongation.

      Line 29: Please specify that the lines refer to 3 different established parent iPSC lines with distinct origins and established using different reprogramming methods, plus 2 control patient-derived lines. - The reproducibility of the cell behaviors is impressive, but this is not captured in the abstract.

      Line 32: add that this mutation was introduced by CRISPR-Cas9 base/prime editing.

      The last sentence of the abstract states that the study only links apical constriction to human NTDs, but also reveals that neural differentiation and apical-basal elongation were found. The introduction could also use some editing.

      Line 71: insert "that pulls actin filaments together" after "power strokes"

      Line 73: "apically localized," do you mean "mediolaterally" or "radially"?

      Line 75: Can you specify that PCP components promote "mediolaterally orientated" apical constriction

      Line 127: Specify that NE functions, including apical-basal elongation and neurodifferentiation, are disrupted in patient-derived models

      All have now been corrected.

      Reviewer #2:

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      We used ZO-1 to quantify apical areas of the VANGL2-knockin lines in Figure 3. Segmentation of neuroepithelial apical areas based on F-actin staining is commonplace in the field (e.g. in the Brooks et al 2022 paper cited by another reviewer), and is generally robust because the cell junctions are much brighter than any apical fibres not associated with the apical cortex. However, we accept that at earlier stages of differentiation there may be more apical fibres when cells are cuboidal. We have therefore repeated our analysis of apical area using ZO-1 staining as suggested, analysing a more temporally-detailed time course in one iPSC line. This new analysis confirms our finding of lack of apical area change between days 2-4 of differentiation, then progressive reduction of apical area between days 4-8, further validating our system. Including nuclear images is not helpful because of the high nuclear index of pseudostratified epithelia (e.g. see Supplementary Figure 7) which means that nuclei overlap along the apicobasal axis. Individual nuclei cannot be related to their apical surface in projected images.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      The outlines on these images are not intended to show cell boundaries, but rather link landmarks visible at both timepoints to calculate cluster (not cell) change in area. This is as previously shown in Galea et al Nat Commun 2021 and Butler et al J Cell Sci 2019. We have now amended the visualisation of retraction to make representation of differences between conditions more intuitive. 
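
      As an illustration of the landmark-based measurement described above, the sketch below computes the percentage change in the area of a polygon drawn through the same landmarks before and after ablation. It is a hedged example only: the function names, the shoelace-formula approach, and the coordinates are assumptions for illustration and are not taken from the authors' analysis or the cited methods papers.

```python
def polygon_area(points):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    n = len(points)
    if n < 3:
        raise ValueError("a polygon needs at least three landmarks")
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def percent_area_change(before, after):
    """Percentage change in cluster area between two landmark outlines.

    Both outlines must list the SAME landmarks in the same order.
    """
    if len(before) != len(after):
        raise ValueError("outlines must use the same landmarks")
    area_before = polygon_area(before)
    area_after = polygon_area(after)
    return 100.0 * (area_after - area_before) / area_before


# Hypothetical landmark coordinates (arbitrary units), not measured data.
landmarks_before = [(0, 0), (10, 0), (10, 10), (0, 10)]
landmarks_after = [(0, 0), (9, 0), (9, 9), (0, 9)]

print(percent_area_change(landmarks_before, landmarks_after))
```

      Because the outline joins tracked landmarks rather than cell junctions, the result describes the cluster as a whole and says nothing about individual cell boundaries.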

      (4) Figure 2d. Do the cells become thicker after recoil?

      This is unlikely because the ablated surface remains in the focal plane. Unfortunately, we are unable to image perpendicularly to the direction of ablation to test whether their apical surface moves in Z even by a very small amount. This has now been clarified in the results:

      Results: “The ablated surface remained within the focal plane after ablation, indicating minimal movement along the apical-basal axis.”

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      The GOSB2 iPSC line we describe does represent the in vivo situation in Med24 knockout mouse embryos, but is clearly less severe because we are still able to detect MED24 protein expressed in this line. We do not have detailed clinical data of the patient from which this line was obtained to determine whether their neurological development is normal. However, it is well established that some individuals who have spina bifida also have abnormalities in supratentorial brain development. It is therefore likely that abnormalities in neuron differentiation/maturation are concomitant with spina bifida. Our findings in the GOSB2 line complement earlier studies which also identified deficiencies in the ability of patient-derived lines to form neurons, but were unable to functionally assess the neuroepithelial cell behaviours we studied. This has now been clarified in the discussion:

      Discussion: “Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68].

      Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85].”

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      We appreciate the reviewer’s praise of our personalised medicine approach and fully agree that neural tube defects are rarely monogenic. The patient lines we studied were not intended to provide mechanistic insight, but rather to demonstrate the future applicability of our approach to patient care. Our vision is that every patient referred for fetal surgery of spina bifida will have amniocytes (collected as part of routine cystocentesis required before surgery) reprogrammed and differentiated into neuroepithelial cells, then neural progenitors, to help stratify their postnatal care. One could also picture these cells becoming an autologous source for future cell-based therapies if they pass our reproducible analysis pipeline as functional quality control. This has now been clarified in the discussion:

      Discussion: “The multi-genic nature of neural tube defect susceptibility, compounded by uncontrolled environmental risk factors (including maternal age and parity[102]), means that patient-derived iPSC models are unlikely to provide mechanistic insight. They do provide personalised disease models which we anticipate will enable functional validation of genetic diagnoses for patients and their parents’ recurrence risk in future pregnancies, and may eventually stratify patients’ postnatal care. We also envision this model will enable quality control of patient-derived cells intended for future autologous cell replacement therapies, as is being developed in post-natal spinal cord injury[103]. Thus, the highly reproducible modelling platform we evaluate – which is robust to differences in iPSC reprogramming method, sex and ethnicity – represents a valuable tool for future mechanistic insights and personalised disease modelling applications.”

      Significance:

      In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

      We disagree with the reviewer that “the model was unsuccessful in one of the two patient-derived lines”. The GOSB1 line demonstrated deficiency of neuron differentiation independently of neuroepithelial biomechanical function, whereas the GOSB2 line showed earlier failure of neuroepithelial function. We also do not, at this stage, make patient-specific predictive claims: this will require longer-term matching of cell model findings with patient phenotypes over the next 5-10 years.

      Reviewer #3:

      Major comments

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously with area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer, even missing local buckles. I understand that these cells are grown on flat adherent surfaces, limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      As the reviewer observes, our cultures cannot bend because they are adhered on a rigid surface. The apical and basal lengths of the cultures will therefore necessarily be roughly equal in length. Some inwards bending of the epithelium is expected at the edges of the dish, but these cannot be imaged. The live imaging we show in Figure 2 illustrates that, just as happens in vivo, apical constriction is asynchronous. This means not all cells will have ‘bottle’ shapes in the same culture. We now illustrate the evolution of these shapes in more detail in Supplementary Figure 1.

      Additionally, the reviewer’s comment motivated us to investigate local buckles in the apical surface of our cultures when their apical surfaces are dilated by ROCK inhibition. We hypothesised that the very straight apical surface in normal cultures is achieved by a balance of apical cell size and tension with pressure differences at the cell-liquid interface. Consistent with our expectation, the apical surface of ROCK-inhibited cultures becomes wrinkled (Supplementary figure 4). The VANGL2-KI lines do not develop this tortuous apical surface (as shown in Figure 3), which is to be expected given their modification is present throughout differentiation unlike the acute dilation caused by ROCK inhibition.

      This new data complements our visualisation of apical constriction in live imaging, apical accumulation of phospho-myosin, and quantification of ROCK-dependent apical tension as independent lines of evidence that our cultures undergo apical constriction. 

      (2) Another slight experimental concern I have regards the difference between the laser ablation experiments detailed in Figure 3h-i and those of Figure 2d-e. The WT recoil values in Figure 3h-i seem more variable, and lower on average, than in the earlier experiments, and significance appears to be reached mainly through the lower values. Can the authors explain whether this variability is expected to be due to heterogeneity in the tissue, i.e. some areas having higher local tension? If so, would that correspond with more local apical constriction?

      There is no significant difference in recoil between the control lines in Figures 2 and 3, albeit the data in Figure 3 is more variable (necessitating more replicates: none were excluded). We also showed laser ablation recoil data in Supplementary Figure 10, in which we did identify a graphing error (now corrected, also no significant difference in recoil from the other control groups as shown in Author response image 3).

      Author response image 3.

      Recoil following laser ablation is not significantly different between different experiments. X axis labels indicate the figure panel each set of ablation data is shown in. Points represent an independent differentiation dish.

      (4) (Minor) I think some of the commentary on the strengths and limitations of the model found in the Results section should be collated and moved to the discussion in a single paragraph. For example, this could also briefly touch on/compare to some of the other models utilizing hiPSCs (These are mentioned briefly in the intro, but this comparison could be elaborated on a bit after seeing all the great data in this work).

      These changes have now been made:

      Discussion: “Some of these limitations, potentially including inclusion of environmental risk factors, can be addressed by using alternative iPSC-derived models[93,94]. For example, if patients have suspected causative mutations in genes specific to the surface (non-neural) ectoderm, such as GRHL2/3, 3D models described by Karzbrun et al[49] or Huang et al[95] may be informative. Characterisation of surface ectoderm behaviours in those models is currently lacking. These models are particularly useful for high-throughput screens of induced mutations[95], but their reproducibility between cell lines, necessary to compare patient samples to non-congenic controls, remains to be validated. Spinal cell identities can be generated in human spinal cord organoids, although these have highly variable morphologies[96,97]. As such, each iPSC model presents limitations and opportunities, to which this study contributes a reductionist and highly reproducible system in which to quantitatively compare multiple neuroepithelial functions.”

      (5) While the authors are generally good about labeling figures by the day post smad inhibition, in some figures it is not clear either from the images or the legend text. I believe this includes supplemental figures 2,5,6,8, and 10 (apologies if I simply missed it in one or more of them)

      These have now been added.

      (6) The legend for Figure 2 refers to a panel that is not present and the remaining panel descriptions are off by a letter. I'm guessing this is a versioning error as the text itself seems largely correct, but it may be good to check for any other similar errors that snuck in

      This has now been corrected.

      (7) The cell outlines in Figure 3d are a bit hard to see both in print and on the screen, perhaps increase the displayed intensity?

      This has now been corrected.

      Description of analyses that authors prefer not to carry out

      R2.5. Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this also the case in the present human model by using mosaic cultures.

      The reviewer is correct that this is one of the exciting potential future applications of our model, which will first require us to generate stable fluorescently-tagged lines (to identify those cells which lack VANGL2). We will also need to extensively analyze controls to validate that mixing fluo-tagged and untagged lines does not alter the homogeneity of differentiation, or apical constriction, independently of VANGL2 deletion. As such, the reviewer is suggesting an altogether new project which carries considerable risk and will require us to secure dedicated funding to undertake.

      R3.8 (Minor) The authors show a fascinating piece of data in Supplementary Figure 1, demonstrating that nuclear volume is halved by day 8. Do they have any indication if the DNA content remains constant (e.g., integrated DAPI density)? I suppose it must, and this is a minor point in the grand scheme, but this represents a significant nuclear remodeling and may impact the overall DNA accessibility.

      We agree with the reviewer that the reduction in nuclear volume is important data both because it informs understanding of the reduction in total cell volume, and because it suggests active chromatin compaction during differentiation. Unfortunately, the thicker epithelium and superimposition of nuclei in the differentiated condition means the laser light path is substantially different, making direct comparisons of intensity uninterpretable. Additionally, the apical-most nuclei will mostly be in G2/M phase due to interkinetic nuclear migration. As such, the comparison of DAPI integrated density between epithelial morphologies would not be informative (Author response image 4).

      Author response image 4.

      Lateral views of DAPI-stained nuclei on Days 2 and 8 of differentiation. Note the rapid loss of staining intensity below the apical pseudo-row of nuclei on Day 8. This intensity change is likely due to the apical nuclei being in G2/M phase and therefore having more DNA, and rapid loss of 405nm wavelength signal at depth.

    1. eLife Assessment

      The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The fundamental findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is compelling, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.

    2. Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but not all. The authors have justified this bimodal model statistically and, although adding a third component gives a better fit, I agree with the authors that a third state cannot be justified statistically, given the data. Further work beyond the scope of this study would be needed to try to define further states.

    3. Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function. The manuscript describes a comprehensive study on the analysis of membrane protein function in context of different lipid environments.

      Weaknesses:

      As the implemented strategy is relatively new, some uncertainties in the interpretation of the data consequently remain. However, using state-of-the-art techniques, the authors support their results by appropriate data and sufficient controls in the revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

      We thank the reviewer for highlighting the strengths of the study, including the use of nanodiscs, single-molecule FRET, and MD simulations to probe full-length EGFR in controlled membrane environments.

      We agree that statistical justification is important for interpreting the distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two Gaussians (µ = 2.64 and 3.43 ns) are not separable, implying they represent one broad distribution rather than two states.

      Author response table 1.

      Both the two- and three-Gaussian models include a low-value component (µ = ~1.3 ns), but the apparent improvement of the three-Gaussian model arises only from splitting the central population into two overlapping Gaussians. Thus, while the BIC favors the three-Gaussian model statistically, Ashman’s D demonstrates that the central peak should not be interpreted as bimodal. Therefore, when all the distributions are fit globally, the data are best explained as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework.
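The two criteria discussed above can be illustrated with a minimal Python sketch. This is not the analysis code used for the manuscript. The component means (2.64 and 3.43 ns) are taken from the text, but the widths, log-likelihoods, and sample size are hypothetical placeholders, and the parameter count assumes a single one-dimensional mixture rather than a global fit. Ashman's D is computed in its usual form, where D > 2 is the conventional threshold for cleanly separated Gaussians.

```python
import math


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussian components; D > 2 is the usual separation threshold."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


def n_mixture_params(k):
    """Free parameters of a 1D mixture of k Gaussians: k means, k widths, k - 1 weights."""
    return 3 * k - 1


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; the model with the lower value is preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Means (ns) are the two central components quoted in the text.
# The widths are HYPOTHETICAL placeholders, not values from the manuscript.
SIGMA_A, SIGMA_B = 0.5, 0.6
d = ashman_d(2.64, SIGMA_A, 3.43, SIGMA_B)
verdict = "separable" if d > 2 else "not cleanly separable"
print(f"Ashman's D = {d:.2f} ({verdict})")

# HYPOTHETICAL log-likelihoods and sample size, used only to show the comparison.
N_OBS = 500
bic_two = bic(-640.0, n_mixture_params(2), N_OBS)
bic_three = bic(-615.0, n_mixture_params(3), N_OBS)
print(f"BIC (2 Gaussians) = {bic_two:.1f}, BIC (3 Gaussians) = {bic_three:.1f}")
```

The sketch shows why the two criteria can disagree: BIC rewards any improvement in likelihood that outweighs the parameter penalty, whereas Ashman's D asks only whether two fitted peaks are far enough apart, relative to their widths, to be read as distinct populations.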

      We have clarified this in the revised manuscript by adding a section in the Methods (page 26) titled Model Selection and Statistical Analysis, which describes the results of the global two- versus three-Gaussian fits evaluated using BIC and Ashman’s D. Additional details of these analyses are also provided in response to Reviewer #1, Question 8 (Recommendations for the authors).

      Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      We thank the reviewer for noting the strengths of our approach, particularly the use of complementary techniques and the development of a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      We monitored insertion of anionic lipids into nanodiscs by performing zeta potential measurements, which report on surface charge, and cholesterol insertion by Laurdan fluorescence, which reports on membrane order. Both assays provide information at the ensemble level, not single-nanodisc resolution. We clarified this in the Methods section (see below).

      Cholesterol clustering is well documented in ternary systems with saturated lipids and sphingolipids [Veatch, Biophys J., 2003; Risselada, PNAS, 2008]. However, in unsaturated POPC-cholesterol mixtures such as those used here, cholesterol primarily alters bilayer order and large-scale segregation is not typically observed. The addition of POPS to the POPC-cholesterol mixture perturbs cholesterol-induced ordering, lowering the likelihood of cholesterol-rich domains [Kumar, J. Mol. Graphics Modell., 2021].

      Lipid heterogeneity between nanodiscs would be expected to give rise to heterogeneity in hydrodynamic properties, including potentially broadening the dynamic light scattering (DLS) distributions. However, the full width at half maximum (FWHM) values from the DLS measurements (see Author response table 2) do not indicate a broadening with cholesterol. Statistical testing (Mann-Whitney U test for non-normal data) showed no significant difference between samples with and without cholesterol (p = 0.486; n = 4 per group). While the small sample size makes firm conclusions challenging, these results suggest that large-scale heterogeneity is unlikely.
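To make the small-sample caveat concrete, the following is a minimal sketch of an exact two-sided Mann-Whitney U test computed by enumerating every possible split of the pooled values. The FWHM numbers are hypothetical placeholders, not the values in Author response table 2, and the function name `exact_mann_whitney` is introduced here purely for illustration.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count (x_i, y_j) pairs with x_i > y_j, scoring ties as 0.5."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_mann_whitney(x, y):
    """Return (U, two-sided exact p) by enumerating all splits of the pooled data."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2.0              # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    dev_obs = abs(u_obs - centre)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        total += 1
        # Count splits at least as far from the null expectation as the observed one.
        if abs(u_statistic(gx, gy) - centre) >= dev_obs - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# HYPOTHETICAL FWHM values (nm), four samples per group, for illustration only.
fwhm_without_chol = [10.2, 11.5, 12.1, 13.0]
fwhm_with_chol = [10.8, 11.9, 12.6, 13.4]
u, p = exact_mann_whitney(fwhm_without_chol, fwhm_with_chol)
print(f"U = {u:g}, exact two-sided p = {p:.3f}")
```

With four values per group there are only C(8,4) = 70 possible splits, so the smallest two-sided p-value this test can return is 2/70 ≈ 0.029. A non-significant result at this sample size is therefore weak evidence of no difference, which is why the conclusion above is stated cautiously.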

      Author response table 2.

      In the case of POPS lipids, clustering of POPS in EGFR-embedded nanodiscs is a recognized property of receptor-lipid interactions. Molecular dynamics simulations have shown that POPS, although constituting only 30% of the inner leaflet, accounts for ~50% of the lipids directly contacting EGFR [Arkhipov, Cell, 2013], underscoring that anionic lipids are preferentially recruited to the receptor’s immediate environment.

      For nanodiscs containing cholesterol and anionic lipids, our smFRET experiments were designed to isolate the effect of EGF binding. The nanodisc population is the same in the ± EGF conditions because EGF was introduced just prior to performing smFRET experiments, and not during nanodisc assembly. Thus, for a given lipid composition, any observed differences between ligand-free and ligand-bound states reflect conformational changes of EGFR.

      Methods, page 23, “Zeta potential measurements to quantify surface charge of nanodiscs: Data were processed using the instrument’s Malvern DTS software to obtain the mean zeta-potential value. This ensemble measurement reports the average surface charge of the nanodisc population, verifying incorporation of anionic POPS lipids.”

      Methods, page 23, “Fluorescence measurements with Laurdan to confirm cholesterol insertion into nanodiscs: The excitation spectrum was recorded by collecting the emission at 440 nm and the emission spectrum was recorded by exciting the sample at 385 nm. Laurdan fluorescence provides an ensemble readout of membrane order and confirms cholesterol incorporation into the nanodisc population. While Laurdan does not resolve the composition of individual nanodiscs, prior work has shown that POPC–cholesterol mixtures are miscible without forming cholesterol-rich domains[91,92], thus the observed ordering changes likely reflect the intended input cholesterol content at the ensemble level.”

      (91) Veatch, S. L. & Keller, S. L. Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophysical journal, 85(5), 3074-3083 (2003).

      (92) Risselada, H. J. & Marrink, S. J. The molecular face of lipid rafts in model membranes. Proceedings of the National Academy of Sciences 105(45), 17367–17372 (2008).

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? How is the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. How is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      We thank the reviewer for these insightful questions. Yes, the EGFR:ApoA1∆49 template ratio of 100:1 was empirically determined through optimization experiments now shown in the revised Supplementary Fig. 3. Cell-free reactions were performed across a range of EGFR:ApoA1∆49 template ratios (1:2 to 1:200) and sampled at different time points (2-19 hours). As shown in the gels, EGFR expression increased with higher template ratios and longer reaction times up to ~9 hours, while ApoA1 expression became clearly detectable only after 6 hours. Based on these results, we selected an EGFR:ApoA1∆49 ratio of 100:1 and 8-hour reaction time as the optimal condition, which yielded sufficient full-length EGFR incorporated into nanodiscs for ensemble and single-molecule experiments.

      In cell-free systems, protein yield does not scale directly with DNA template concentration, as translation efficiency is limited by factors such as ribosome availability and co-translational membrane insertion [Hunt, Chem. Rev., 2024; Blackholly, Front. Mol. Biosci., 2022]. Consistent with this, we observed that ApoA1∆49 is produced at higher levels than EGFR despite the lower DNA input (Supplementary Fig. 2b). Providing an excess EGFR template prevents the reaction from becoming limited by scaffold availability and helps compensate for the fact that, as a large multi-domain receptor, EGFR expression can yield truncated as well as full-length products. This strategy ensures that sufficient full-length receptors are available for nanodisc incorporation. We will clarify this in the Methods section (see below).

      We observed little to no visible precipitation under the reported cell-free conditions, likely for the following reasons: (i) EGFR and ApoA1∆49 are co-expressed in the cell-free reaction, and ApoA1∆49 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink; and (ii) ApoA1∆49 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      The sample contains donor-labeled EGFR (snap surface 594) together with acceptor-labeled lipids (Cy5-labeled PE doped in the nanodisc). We assess the oligomerization state of EGFR in nanodiscs using single-molecule photobleaching of the donor channel. Snap surface 594 is a benzyl guanine derivative of Atto 594 that reacts with the SNAP tag with near-stoichiometric efficiency [Sun, Chembiochem, 2011]. Most molecules (~75%) exhibited a single photobleaching step, consistent with incorporation of a single EGFR per nanodisc [Srinivasan, Nat. Commun., 2022]. A minority of traces (~15%) showed two photobleaching steps and ~10% of traces showed three or more photobleaching steps, consistent with occasional multiple insertions. For all smFRET analysis, we restricted the dataset to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      Methods, page 20, “Production of labeled, full-length EGFR nanodiscs: Briefly, the E. coli slyD lysate, in vitro protein synthesis E. coli reaction buffer, amino acids (-Methionine), Methionine, T7 Enzyme, protease inhibitor cocktail (Thermofisher Scientific), RNAse inhibitor (Roche) and DNA plasmids (20 µg of EGFR and 0.2 µg of ApoA1∆49) were mixed with different lipid mixtures. The DNA template ratio of EGFR:ApoA1∆49 = 100:1 was empirically chosen by testing different ratios on SDS-PAGE gels and selecting the condition that maximized full-length EGFR expression in DMPC lipids (Supplementary Fig. 3).”

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      Negative-stain TEM was performed to confirm nanodisc formation and morphology, but this method does not resolve whether a given disc contains EGFR. To directly assess receptor stoichiometry, we instead relied on single-molecule photobleaching of snap surface 594-labeled EGFR (see response to Point 2). These experiments showed that the majority of nanodiscs contain a single receptor, with a minority containing two receptors. For all smFRET analyses, we restricted data to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      We did not normalize EGFR fluorescence to total protein concentration because the bulk protein fraction after IMAC purification includes both receptor-loaded and empty nanodiscs. The latter contribute to ApoA1∆49 mass but do not contain receptors and including them would underestimate receptor occupancy. Importantly, the presence of empty nanodiscs does not affect our measurements as photobleaching and single-molecule FRET analyses selectively report only on receptor-containing nanodiscs. This clarification has been added to the Methods.

      Methods, page 26, “Fluorescence Spectroscopy: Traces with a single photobleaching step for the donor and acceptor were considered for further analysis. Regions of constant intensity in the traces were identified by a change-point algorithm[95]. Donor traces were assigned as FRET levels until acceptor photobleaching. The presence of empty nanodiscs does not influence these measurements, as photobleaching and single-molecule FRET analyses selectively report on receptor-containing nanodiscs.”

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

      We agree that not all nanodisc-embedded EGFR molecules may be fully functional and that the fraction of folded protein could vary with lipid composition. In our ATP-binding assay, EGFR detection relies on the C-terminal SNAP-tag fused to an intrinsically disordered region. Successful labeling requires that this segment be translated, accessible, and folded sufficiently to accommodate the SNAP reaction, which imposes an additional requirement compared to the rigid, structured kinase domain where ATP binds. Misfolded or truncated EGFR molecules would therefore likely fail to label at the C-terminus. These factors strongly imply that our assay predominantly reports on receptor molecules that are intact and well folded.

      Additionally, our molecular dynamics simulations at 0% and 30% POPS support the experimental ATP-binding measurements (Fig. 2c, d). This consistency between the experimental and simulated evidence, including at 0% POPS where reduced receptor folding might be expected, suggests that the observed lipid-dependent changes are more likely due to modulation of the functional receptor than to receptor misfolding. We have clarified these points by adding the following text:

      Results, page 7, “Role of anionic lipids in EGFR kinase activity: In the presence of EGF, increasing the anionic lipid content decreased the number of contacts from 71.8 ± 1.8 to 67.8 ± 2.4, indicating increased accessibility, again in line with the experimental findings. Because detection of EGFR relies on labeling at the C-terminus and ATP binding requires an intact kinase domain, the ATP-binding assay is for receptors that are properly folded and competent for nucleotide binding. The consistency between experimental results and MD simulations suggests that the observed lipid-dependent changes are more likely due to modulation of functional EGFR than to artifacts from misfolding.”

      Reviewer #1 (Recommendations for the authors):

      The experimental program presented here is excellent, and the results are highly interesting. My enthusiasm is dampened by the presentation in places which is confusing, especially Figure 3, which contains so many of the results. I also have some reservations about the bimodal interpretation of the lifetime data in Figure 3.

      We thank the reviewer for their positive assessment of our experimental approach and results. In the revised version, we have improved figure organization and readability by adding explicit labels for lipid composition and EGF presence/absence in all lifetime distributions, moving key supplementary tables into the main text, and reorganizing the supplementary figures as Extended Data Figures following eLife’s format. Figures and tables now appear in the order in which they are referenced in the text to further improve readability.

      Regarding the bimodal interpretation of the lifetime distribution, we have performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC) and Ashman’s D analysis, which supported the bimodal interpretation. Details of this analysis are provided in our response to comment (8) below and included in the manuscript.

      Specific comments below:

      (1) Abstract -"Identifying and investigating this contribution have been challenging owing to the complex composition of the plasma membrane" should be "has".

      We have corrected this error in the revised manuscript.

      (2) Results - p4 - some explanation of what POPC/POPS are would be helpful.

      We have added the text below discussing POPC and POPS.

      Results, page 4, “POPC is a zwitterionic phospholipid forming neutral membranes, whereas POPS carries a net negative charge and provides anionic character to the bilayer[56]. Both PC and PS lipids are common constituents of mammalian plasma membranes, with PC enriched in the outer leaflet and PS in the inner leaflet[22].”

      (22) Lorent, J. H., Levental, K. R., Ganesan, L., Rivera-Longsworth, G., Sezgin, E., Doktorova, M., Lyman, E. & Levental, I. Plasma membranes are asymmetric in lipid unsaturation, packing and protein shape. Nature Chemical Biology 16, 644–652 (2020).

      (56) Her, C., Filoti, D. I., McLean, M. A., Sligar, S. G., Ross, J. A., Steele, H. & Laue, T. M. The charge properties of phospholipid nanodiscs. Biophysical journal 111(5), 989–998 (2016).

      (3) Figure 2b - it would be easier to compare if these were plotted on top of each other. Are we at saturating ATP binding concentration or below it? Also, please put a key to say purple - absent and orange +EGF on the figure. I am also confused as to why, with no EGF, ATP binding is high with 0% POPS, but low when EGF is present, but that then reverses with physiological lipid content.

      While we agree that a direct comparison would be easier, the ATP-binding experiments for the ± EGF conditions were actually performed independently on separate SDS-PAGE gels, which unfortunately precludes such a comparison. We have added a color key to clarify the -EGF and +EGF datasets.

      The experiments were carried out at 1 µM of the fluorescently labeled ATP analogue (atto647N-γ ATP). Reported kinetic measurements for the isolated EGFR kinase domain indicate a Km of 5.2 µM, suggesting that our experimental concentration is below, but close to, the saturating range, ensuring sensitivity to changes in accessibility of the binding site rather than saturating all available receptors.
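
      As a rough illustration of where 1 µM sits relative to the reported Km, the sketch below evaluates a simple one-site hyperbolic saturation curve. This is an assumption made only for illustration: it treats the Km of 5.2 µM as an apparent half-saturation constant and ignores the irreversible binding of the analogue, so it is a heuristic and not the authors' analysis. The function name `fractional_occupancy` is hypothetical.

```python
def fractional_occupancy(conc_um: float, half_sat_um: float) -> float:
    """One-site hyperbolic saturation: [L] / (K + [L]).

    Assumption: K is an apparent half-saturation constant (here the
    reported Km is used as a stand-in), and binding is treated as a
    simple equilibrium, which is only a heuristic for this assay.
    """
    if conc_um < 0 or half_sat_um <= 0:
        raise ValueError("concentration must be >= 0 and K must be > 0")
    return conc_um / (half_sat_um + conc_um)


# Values quoted in the response: 1 uM analogue, Km of 5.2 uM.
occupancy = fractional_occupancy(1.0, 5.2)
print(f"{occupancy:.2f}")  # prints 0.16
```

      Under these simplifying assumptions the analogue concentration lies in the sub-saturating part of the curve, where the signal remains sensitive to changes in site accessibility.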

      We have revised the manuscript to clarify these details by including the following text:

      Results, page 6, “To investigate how the membrane composition impacts accessibility, we measured ATP binding levels for EGFR in membranes with different anionic lipid content. 1 µM of fluorescently-labeled ATP analogue, atto647N-γ ATP, which binds irreversibly to the active site, was added to samples of EGFR nanodiscs with 0%, 15%, 30% or 60% anionic lipid content in the absence or presence of EGF.”

      Methods, page 24, “ATP binding experiments: Full-length EGFR in different lipid environments was prepared using cell-free expression as described above. 1 µM of snap surface 488 (New England Biolabs) and atto647N labeled gamma ATP (Jena Bioscience) was added after cell-free expression and incubated at 30 °C, 300 rpm for 60 minutes. 1 µM of atto647N-γ ATP was used, corresponding to a concentration near the reported Km of 5.2 µM for ATP binding to the isolated EGFR kinase domain[93], ensuring sensitivity to lipid-dependent changes in ATP accessibility.”

      (ii) Nucleotide binding is suppressed under basal conditions, likely to ensure that the catalytic activity is promoted only upon EGF stimulation.

      The molecular dynamics simulations at 0% and 30% POPS further support this interpretation, showing that anionic lipids modulate the accessibility of the ATP-binding site in a manner consistent with experimental trends (Fig. 2c and 2d).

      We have clarified these points in the main text with the following additions:

      Results, page 6, “In the presence of EGF, ATP binding overall increased with anionic lipid content with the highest levels observed in 60% POPS bilayers. In the neutral bilayer, ligand seemed to suppress ATP binding, indicating anionic lipids are required for the regulated activation of EGFR.”

      Results, page 7, “In the absence of EGF, increasing the anionic lipid content from 0% POPS to 30% POPS increased the number of ATP-lipid contacts from 58.6 ± 0.7 to 74.4 ± 1.2, indicating reduced accessibility, consistent with the experimental results and suggesting anionic lipids are required for ligand-induced EGFR activity.”

      (93) Yun, C. H., Mengwasser, K. E., Toms, A. V., Woo, M. S., Greulich, H., Wong, K. K., Meyerson, M. & Eck, M. J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. PNAS 105(6), 2070–2075 (2008).

      (4) Figure 2d - how was the 16A distance arrived at?

      We thank the reviewer for pointing this out. The 16 Å cutoff was chosen based on the physical dimensions of the ATP analogue used in the experiments. Specifically, the largest radius of the atto647N-γ ATP molecule is ~16.9 Å, which defines the maximum distance at which lipid atoms could sterically obstruct access of ATP to the binding pocket. Accordingly, in the simulations, contacts were defined as pairs of coarse-grained atoms between lipid molecules and the residues forming the ATP-binding site (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831) separated by less than 16 Å.

      We have rewritten the rationale for selecting the 16 Å cutoff in the Methods section to improve clarity.

      Methods, page 28, “Coarse-grained, Explicit-solvent Simulations with the MARTINI Force Field: We analyzed our simulations using WHAM[108,109] to reweight the umbrella biases and compute the average values of various metrics introduced in this manuscript. Specifically, we calculated the distance between Residue 721 and Residue 1186 (EGFR C-terminus) of the protein. To quantify the accessibility of the ATP-binding site, we calculated the number of contacts between lipid molecules and the residues forming the ATP-binding pocket (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831)[110]. Close contact between the bilayer and these residues would sterically hinder ATP binding; thus, the contact number serves as a proxy for ATP-site accessibility. The cutoff distance for defining a contact was set to 16 Å, corresponding to the largest molecular radius of the fluorescent ATP analogue (atto647N-γ ATP, 16.96 Å[111]). Accordingly, we defined a contact as a pair of coarse-grained atoms, one from the lipid membrane and one from the ATP binding site, within a mutual distance of less than 16 Å.”
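
      The contact definition quoted above can be sketched for a single simulation frame as follows. This is a minimal illustration, not the authors' analysis code: the function name `count_contacts` and the toy coordinates are hypothetical, periodic boundary conditions are ignored, and the WHAM reweighting of umbrella-sampling frames is not included.

```python
import math


def count_contacts(lipid_beads, site_beads, cutoff=16.0):
    """Count lipid/binding-site bead pairs closer than `cutoff` (in angstroms).

    Each bead is an (x, y, z) tuple. Assumptions: a single frame, no
    periodic boundary conditions, no reweighting of biased frames.
    """
    return sum(
        1
        for lipid in lipid_beads
        for site in site_beads
        if math.dist(lipid, site) < cutoff
    )


# Toy coordinates (hypothetical), in angstroms.
lipids = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0), (0.0, 0.0, 40.0)]
site = [(0.0, 0.0, 5.0), (0.0, 0.0, 20.0)]
print(count_contacts(lipids, site))  # prints 3
```

      A larger contact count corresponds to a binding pocket that sits closer to the bilayer, which the quoted Methods text uses as a proxy for reduced ATP-site accessibility.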

      (5) Figure 2e-h - I think a bar chart/violin plot/jitter plot would make it easier to compare the peak values. The statistics in the table should just be quoted in the text as value +/- error from the 95% confidence interval. The way it is written currently is confusing, as it implies that there is no conformational change with the addition of EGF in neutral lipids, but there is ~0.4nm one from the table. I don't understand what you mean by "The larger conformational response of these important domains suggests that the intracellular conformation may play a role in downstream signaling steps, such as binding of adaptor proteins"?

      We thank the reviewer for these suggestions. For the smFRET lifetime distributions (Figure 2j, k; previously Figure 2e, f), we have now included jitter plots of the donor lifetimes in Supplementary Figure 11 to facilitate direct visual comparison of the medians and distribution widths for each lipid composition and ±EGF condition. The distance distributions for the ATP site to C-terminus in Figure 2e, f (previously Figure 2g, h) were obtained from umbrella-sampling simulations that calculate free-energy profiles rather than raw, unbiased distance values. Because the sampling is guided by biasing potentials, individual distance values cannot be used to construct violin or jitter plots. We therefore present the simulation data only as probability density distributions, which best reflect the equilibrium distributions derived from them.
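
      To illustrate why a free-energy profile yields a probability density rather than individual data points, the sketch below converts a profile F(x) into P(x) proportional to exp(-F(x)/kT) and normalizes it with the trapezoid rule. The function name `pmf_to_density`, the temperature of 300 K, the kJ/mol units, and the example profile are all assumptions for illustration and are not taken from the manuscript.

```python
import math

# Molar gas constant in kJ/(mol K); assumes the profile is in kJ/mol.
R_KJ_PER_MOL_K = 0.0083144626


def pmf_to_density(distances_nm, free_energy_kj_mol, temperature_k=300.0):
    """Convert a free-energy profile F(x) into a normalized density P(x).

    P(x) is proportional to exp(-F(x) / kT) and is normalized so that its
    trapezoid-rule integral over `distances_nm` equals 1.
    """
    if len(distances_nm) != len(free_energy_kj_mol) or len(distances_nm) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    kt = R_KJ_PER_MOL_K * temperature_k
    f_min = min(free_energy_kj_mol)  # shift for numerical stability
    weights = [math.exp(-(f - f_min) / kt) for f in free_energy_kj_mol]
    area = sum(
        0.5 * (weights[i] + weights[i + 1]) * (distances_nm[i + 1] - distances_nm[i])
        for i in range(len(weights) - 1)
    )
    return [w / area for w in weights]


# Hypothetical profile with a minimum at 9.0 nm.
x = [8.0, 8.5, 9.0, 9.5, 10.0]
f = [4.0, 1.0, 0.0, 1.5, 5.0]
density = pmf_to_density(x, f)
print(x[density.index(max(density))])  # prints 9.0
```

      The output is a smooth density over the distance coordinate, so summary statistics such as the peak position can be reported, but there are no unbiased per-frame values to scatter in a jitter plot.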

      We have also revised the text to report the median ± 95% confidence interval, improving clarity and consistency with the statistical table.

      Results, page 9: “In the neutral bilayer (0% POPS), the distribution in the absence of EGF peaks at 8.1 nm (95% CI: 8.0–8.2 nm) and the distribution in the presence of EGF peaks at 8.6 nm (95% CI: 8.5–8.7 nm) (Table 1, Supplementary Table 1). In the physiological regime of 30% POPS nanodiscs, the peak of the donor lifetime distribution shifts from 9.1 nm (95% CI: 8.9–9.2 nm) in the absence of EGF to 11.6 nm (95% CI: 11.1–12.6 nm) in the presence of EGF (Table 1, Supplementary Table 1), which is a larger EGF-induced conformational response than in neutral lipids.”

      Finally, we have rephrased the sentence in question for clarity. The revised text now reads:

      Results, page 9: “The larger conformational response observed in the presence of anionic lipids suggests that these lipids enhance the responsiveness of the intracellular domains to EGF, potentially ensuring interactions between C-terminal sites and adaptor proteins during downstream signaling.”

      (6) "r, highlighting that the charged lipids can enhance the conformational response even for protein regions far away from the plasma membrane" - is it not that the neutral membrane is just very weird and not physiological that EGFR and other proteins don't function properly?

      We agree with the reviewer that completely neutral (0% POPS) membranes are not physiological and likely do not support the native organization or activity of EGFR. We have revised the text to clarify that the 30% POPS condition represents a more native-like lipid environment that restores or stabilizes the expected conformational response, rather than "enhancing" it. The revised sentence now reads:

      Results, page 10: “Both experimental and computational results show a larger EGF-induced conformational change in the partially anionic bilayer, consistent with the notion that a partially anionic lipid bilayer provides a more native environment that supports proper receptor activation, compared to the non-physiological neutral membrane.”

      (7) "snap surface 594 on the C-terminal tail as the donor and the fluorescently-labeled lipid (Cy5) as the acceptor (Supplementary Fig. 2, 11)." Why not refer to Figure 3a here to make it easier to read?

      We have added the reference to Figure 3a, and we thank the Reviewer for the suggestion.

      (8) Figure 3 - the bimodality in many of these plots is dubious. It's very clear in some, i.e. 0% POPS +EGF, but not others. Can anything be done to justify bimodality better?

      We agree that statistical justification is important for interpreting lifetime distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two of the Gaussians are not separable, implying they represent one broad distribution rather than two discrete states. Therefore, when all the distributions are fit globally, the data are best described as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. We better justified our choice of model by incorporating the results of the global two- vs three-Gaussian fits with BIC and Ashman’s D analysis in the revised manuscript.

      Methods, page 27: “Model Selection and Statistical Analysis

      Global fitting of lifetime distributions was performed across all experimental conditions using maximum likelihood estimation. Both two-Gaussian and three-Gaussian distribution models were evaluated as described previously[62]. Model performance was compared using the Bayesian Information Criterion (BIC),[101] which balances model likelihood and complexity according to

      BIC = -2 ln L + k ln n

      where L is the likelihood, k is the number of free parameters, and n is the number of single-molecule photon bunches across all experimental conditions. A lower BIC value indicates a statistically better model[101]. The separation between Gaussian components was subsequently assessed using Ashman’s D, where a score above 2 indicates good separation[102]. For two Gaussian components with means µ1, µ2 and standard deviations σ1, σ2,

      D12 = √2 |µ1 − µ2| / √(σ1² + σ2²)

      where Dij represents the distance metric between Gaussian components i and j. All fitted parameters, likelihood values, BIC scores, and Ashman’s D values are summarized in Supplementary Table 5.”
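
      The two model-selection quantities described above can be sketched in a few lines. This is an illustration only: it uses the BIC expression quoted in the Methods and the standard definition of Ashman's D, D = sqrt(2) |µ1 − µ2| / sqrt(σ1² + σ2²), from Ashman et al. (1994). The component means come from the response text, whereas the component widths and the function names `bic` and `ashman_d` are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def ashman_d(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Ashman's D for two Gaussian components; D > 2 suggests clean separation."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


# Means (ns) are quoted in the response; the widths are hypothetical.
d_short_long = ashman_d(1.3, 0.3, 2.7, 0.4)   # well above 2: separable
d_two_long = ashman_d(2.6, 0.5, 3.4, 0.5)     # below 2: not separable
print(d_short_long > 2.0, d_two_long > 2.0)   # prints True False
```

      With widths of this order, the short- and long-lifetime components are cleanly separated, while two long-lifetime components that overlap strongly fall below the threshold of 2 even if adding the third Gaussian lowers the BIC.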

      (101) Schwarz, G. Estimating the dimension of a model. The Annals of Statistics, 461–464 (1978).

      (102) Ashman, K. M., Bird, C. M. & Zepf, S. E. Detecting bimodality in astronomical datasets. The Astronomical Journal 108(6), 2348–2361 (1994).

      (9) Figure 3c - can you better label the POPS/POPC on here?

      We thank the reviewer for this suggestion. In the revised manuscript, Figure 3b (previously Figure 3c) has been updated to label the lipid composition corresponding to each smFRET distribution to make the comparison across conditions easier to follow.

      (10) Figure 3g - it looks like cholesterol causes a shift in both the peaks, such that the previous open and closed states are not the same, but that there are 2 new states. This is key as the authors state: "Remarkably, high anionic lipids and cholesterol content produce the same EGFR conformations but with opposite effects on signaling-suppression or enhancement." But this is only true if there really are the same conformational states for all lipid/cholesterol conditions. Again, the bimodal models used for all conditions need to be justified.

      We appreciate the reviewer’s insightful comment. We agree that the interpretation of the lifetime distributions depends on whether cholesterol and anionic lipids modulate existing conformational states or create new ones. To test this, we performed global fits of all distributions using the two- and three-Gaussian models and compared them using the Bayesian Information Criterion (BIC) and Ashman’s D, the results of which are described in detail in response to (8) above.

      Both fitting models, two- and three-Gaussian, identified the same short lifetime component (µ = 1.3 ns), suggesting this reflects a well-separated conformation. While the three-Gaussian model gave a lower BIC, Ashman’s D analysis indicated that two of the three components (µ = 2.6 ns and 3.4 ns) are not statistically separable, suggesting they represent a single broad conformational population rather than distinct states. If instead these two components reflected distinct states present under different conditions, Ashman’s D analysis would have found the opposite result. This supports our interpretation that high cholesterol and high anionic lipid content produce similar conformational ensembles with opposite effects on signaling output.

      Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework. We have clarified this rationale in the revised manuscript and added the results of the BIC and Ashman’s D analysis to support this interpretation.

      (11) Why are we jumping about between figures in the text? Figure 1d is mentioned after Figure 2. Also, DMPC is shown in the figures way before it is described in the text. It is very confusing. Figure 3 is so compact. I think it should be spread out and only shown in the order presented in the text. Different parts of the figure are referred to seemingly at random in the text. Why is DMPC first in the figure, when it is referred to last in the text?

      Following the Reviewer’s comment, we have revised the figure order and layout to improve readability and ensure consistency with the text. The previous Figures 1d-f which introduce the single-molecule fluorescence setup are now Figure 2g-i, positioned immediately before the first single-molecule FRET experiments (Fig 2j, k). The DMPC distribution in Figure 3 has been moved to the Supplementary Information (Supplementary Fig. 17), where it is shown alongside POPC, as these datasets are compared in the section “Mechanism of cholesterol inhibition of EGFR transmembrane conformational response”. The smFRET distributions in Figure 3 are now presented in the same sequence as they are discussed in the text, and the figure has been spread out for better clarity.

      (12) Throughout, I find the presentation of numerical results, their associated error, and whether they are statistically significantly different from each other confusing. A lot of this is in supplementary tables, but I think these need to go in the main text.

      To improve clarity and ensure that key quantitative results are easily accessible, we have moved the relevant supplementary tables to the main text. Specifically, the following tables have been incorporated into the main manuscript:

      (i) Median distance between the ATP binding site and the EGFR C-terminus, or between the membrane and the EGFR C-terminus, from smFRET measurements (previously Supplementary Table 1, now main Table 1)

      (ii) Median distance between the membrane and the EGFR C-terminus in different anionic lipid environments (previously Supplementary Table 4, now main Table 2)

      (iii) Median distance between the membrane and the EGFR C-terminus in different cholesterol environments (previously Supplementary Tables 8 and 12, now combined as main Table 3)

      (13) Supplementary figures - in general, there is a need to consider how to combine or simplify these for eLife, as they will have to become extended data figures.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have reorganized the supplementary figures into extended data figures in accordance with eLife’s format. Specifically:

      - Supplementary Figs. 1–7 are now grouped as Extended Data Figures for Figure 1 in the main text. They are now Figure 1 - figure supplements 1–7.

      - Supplementary Figs. 8–11 are now grouped as Extended Data Figures for Figure 2. They are now Figure 2 - figure supplements 1–4.

      - Supplementary Figs. 12–17 are now grouped as Extended Data Figures for Figure 3. They are now Figure 3 - figure supplements 1–6.

      (14) Supplementary Figure 2 - label what the two bands are in the EGFR and pEGFR sets at the bottom of panel c.

      We thank the reviewer for this comment. The two bands shown in the EGFR and pEGFR blots in Supplementary Fig. 2d (previously Supplementary Fig. 2c) correspond to replicate samples under identical conditions. We have now labeled the lanes as “Rep 1” and “Rep 2” in the revised figure and clarified this in the figure legend.

      Supplementary Figure 2, page 31: “(d) Western blots were performed on labelled EGFR in nanodiscs. Anti-EGFR Western blots (left) and anti-phosphotyrosine Western blots (right) tested the presence of EGFR and its ability to undergo tyrosine phosphorylation, respectively, consistent with previous experiments on similar preparations[18, 54, 55]. The two lanes in each blot correspond to replicate samples under identical conditions.”

      (15) Supplementary Figures 3+4 - a bar chart/boxplot or similar would be easier for comparison here.

      In the revised version, we have replaced the histograms with jitter plots showing the nanodisc size distributions for each condition in supplementary figures 4 and 5 (previously supplementary figures 3 and 4). The plots display individual measurements with a horizontal line indicating the mean size (mean ± standard deviation values provided in the caption).

      (16) Supplementary Figures 10, 12, 13, 15, 16 - I would jitter these.

      We have incorporated jitter plots for the relevant datasets in Supplementary Figures 11, 13, 15, 16 and 17 (previously Supplementary Figures 10, 12, 13, 15 and 16) to provide a clearer visualization of the data distributions and median values.

      Reviewer #2 (Recommendations for the authors):

      (1) Reactions were performed in 250 µL volumes. What is the average yield of solubilized EGFR in those reactions? Are there differences in the EGFR solubilization with the various lipid mixtures?

      The amount of solubilized EGFR produced in each 250 µL cell-free reaction was below the reliable detection limit for quantitative absorbance assays. At these protein levels, little to no EGFR precipitation was observed for all lipid compositions. Although exact yields could not be determined, fluorescence-based detection confirmed the presence of functional, nanodisc-incorporated EGFR suitable for smFRET and ensemble fluorescence experiments. We observed variability in total yield between independent reactions within the same lipid composition, which is common for cell-free systems, but no consistent trend attributable to lipid composition.

      (2) Figure S2: It would be better to have a larger overview of the particles on a grid to get a better impression of sample homogeneity.

      TEM images showing a larger field of view have been added for each lipid composition in Supplementary Figures 4 and 5.

      (3) Figure 2b: It appears that there is some variation in the stoichiometry of ApoA1 and EGFR within the samples. Have equal amounts of each sample been analyzed? Are there, in addition, some precipitates of EGFR? It would further be good to have a negative control without expression to get more information about the additional bands in Figure S2b. As they do not appear in the fluorescent gel, it is unlikely that they represent premature terminations of EGFR.

      The fluorescence intensity from the bound ATP analogue (Atto 647N-ATP) and from the snap surface 488 label, which binds stoichiometrically to the SNAP tag at the EGFR C-terminus, was measured for each sample. The relative amount of ATP binding was quantified for each sample by normalizing to the EGFR content (Figure 2b). This normalization accounts for the different amounts of EGFR produced in each condition.
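
      The normalization described above amounts to a ratio of two band intensities per sample. The sketch below shows that arithmetic under stated assumptions: the function name `normalized_atp_binding`, the optional background terms, and the example intensities are hypothetical and are not taken from the manuscript.

```python
def normalized_atp_binding(atp_signal, egfr_signal, atp_bg=0.0, egfr_bg=0.0):
    """Ratio of ATP-analogue fluorescence to SNAP-label (EGFR) fluorescence.

    Assumption: both signals are integrated band intensities from the same
    lane; optional backgrounds are subtracted before taking the ratio.
    """
    egfr = egfr_signal - egfr_bg
    if egfr <= 0:
        raise ValueError("EGFR signal must exceed its background")
    return (atp_signal - atp_bg) / egfr


# Two hypothetical samples with different expression levels but the
# same ATP binding per receptor.
print(normalized_atp_binding(200.0, 100.0))  # prints 2.0
print(normalized_atp_binding(400.0, 200.0))  # prints 2.0
```

      Because the SNAP label reports receptor amount, samples that differ only in expression yield give the same normalized value, which is the purpose of the normalization.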

      We did not observe any visible precipitation under the reported cell-free conditions, likely due to the following reasons:

      (i) EGFR and ApoA1 are co-expressed in the cell-free reaction, and ApoA1 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink

      (ii) ApoA1 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      A control cell-free reaction containing only ApoA1∆49 (1 µg) and no EGFR template, analyzed after affinity purification, showed a single prominent band at ~25 kDa (gel image below), corresponding to ApoA1, along with faint background bands typical of Ni-NTA purification from cell lysates. These weak, non-specific bands likely arise from co-purification of endogenous E. coli proteins.

      The ApoA1∆49-only control gel has now been included as part of Supplementary Figure 2.

      (4) Figure S2c: It would be better to show the whole lanes to document the specificity of the antibodies. Anti-Phosphor antibodies are frequently of poor selectivity. In that case, a negative control with corresponding tyrosine mutations would be helpful.

      We have updated Figure S2d (previously Figure S2c) to include the full gel lanes to better illustrate the specificity of both the total EGFR and phospho-EGFR (Y1068) antibodies. The results show a single clear band at the expected molecular weight for EGFR, confirming antibody specificity.

      (5) The Results section already contains quite some discussion. I would thus recommend combining both sections.

      We thank the reviewer for the suggestion. We have now created a results and discussion section to better reflect the content of these paragraphs, with the previous discussion section now a subsection focused on implications of these results.

    1. eLife Assessment

      This valuable paper advances understanding of the role of the HGF receptor, MET, in cancer cell invasion by demonstrating HGF-induced coordinated trafficking of MET and metalloprotease MT1-MMP into invadopodia. The results are generally solid, but there are concerns about the cell biology and whether the trafficking mechanism is clinically relevant. It's also unclear whether this is a general mechanism or specific to triple-negative breast cancer cells. The paper will be of interest to cancer cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      Weaknesses:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP, a key transmembrane metalloproteinase required for invadopodia function, and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia, highlighting a previously unrecognized mechanism regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      Weaknesses:

      (1) Inappropriate stimulation times for endocytosis and recycling assays.

      The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      (2) Low efficiency of MET silencing in Figure S1I.

      The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      (3) Missing information on incubation times and inconsistencies in MET protein levels.

      The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 prior to immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies.

      Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues.

      (4) Insufficient representation and randomization of microscopic data.

      For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      (5) Use of a single siRNA/shRNA per target.

      As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      (6) Insufficient controls for antibody specificity.

      The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      (7) Inadequate demonstration of MET recycling.

      MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      (8) Insufficient evidence for MET-MT1-MMP interaction.

      The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      (9) Inconsistent use of cell lines and lack of justification.

      The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      (10) Inconsistency in invadopodia numbers under identical conditions.

      The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      (11) Questionable colocalization in some images.

      In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      (12) Abstract, Introduction, and Discussion require substantial rewriting.

      (a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context.

      (b) The introduction should better describe the cellular processes and proteins investigated in this study.

      (c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      We sincerely thank the reviewer for their positive feedback and for considering our study to be well executed and rigorous. The valuable suggestions and comments will certainly improve the understanding of the role of the RAB14-RCP-KIF16B axis in MET trafficking and breast cancer invasion. Below, we address each of the reviewer's concerns and suggestions point by point.

      Weakness:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

      We thank the reviewer for raising this point. We want to clarify that in TNBCs, the lack of the hormone receptors (progesterone receptor and estrogen receptor) and of HER2 makes the overexpression of EGFR and MET crucial in terms of prognosis and treatment (PMID: 27655711, 25368674). Hence, the study of MET signalling and trafficking is more relevant for TNBCs than for other cancer cells. We will add an explanation to the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP - a key transmembrane metalloproteinase required for invadopodia function - and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia - highlighting a previously unrecognized mechanism regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      We greatly appreciate the positive assessment we have from the reviewer, who also acknowledged the novelty and relevance of our study. Below, we have carefully addressed the comments/concerns raised regarding this study and will strengthen the reliability and robustness by revisiting the data, providing additional analyses where required, and clarifying methodological details.

      Weakness:

      (1) Inappropriate stimulation times for endocytosis and recycling assays. The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      We understand the reviewer’s concern regarding the HGF stimulation time points for endocytosis and recycling. We want to highlight that, to study the recycling/surface delivery of MET in response to HGF, we performed TIRF microscopy-based imaging, where images were taken within 1 h of HGF addition (Fig. 2I). Additionally, we will incorporate surface biotinylation to show the recycling of MET, as suggested in comment 7. Moreover, we observed the effect of HGF on gelatin degradation and invadopodia formation after 3 h of HGF stimulation, and we were curious to know where MET resides under prolonged ligand stimulation. Hence, to study the localization of MET to invadopodia or the endocytic markers, the cells were stimulated with HGF for 2-3 hours.

      (2) Low efficiency of MET silencing in Figure S1I. The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      We understand the concern raised by the reviewer. We want to emphasize that we have employed three different approaches to investigate the effect of MET silencing/inhibition on invadopodia formation. (i) A MET kinase inhibitor, PHA665752, which reduces invadopodia formation (Fig. 1D, E). (ii) Silencing with shRNA: since the level of MET silencing with the shRNA was not sufficient, cells were stained for MET as a readout of silencing, images of cells with depleted MET expression were captured, and invadopodia numbers were quantified (Fig. 1F). (iii) Using the SMARTpool siRNA against MET, we have shown the MT1-MMP-containing invadopodia in Fig. S5E, which provides further evidence of the role of MET in invadopodia activity. An additional graph showing invadopodia formation derived from the siRNA-mediated MET silencing will be added to the revised figure.

      (3) Missing information on incubation times and inconsistencies in MET protein levels. The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 before immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies. Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues. 

      We apologise for the unintentional omission of experimental details about the HGF and drug incubation times, which will be incorporated into the figure legends. The blot will be replaced with a more appropriate representative image.

      Regarding the decreased MET level in the drug-treated condition: the literature suggests that the MET inhibitor PHA665752 also promotes MET degradation, corroborating our result shown in Fig. S1J (PMID: 15788682, 18327775). Further, in Fig. S1J, the phosphorylation of MET relative to the total MET level is higher in the HGF-treated condition. We will add the quantification to the revised manuscript for clarity.

      Next, in Fig. S1C, the rabbit anti-MET (CST, D1C2 XP) antibody has been used, which binds to a C-terminal motif of MET and identifies both the 170 kDa and the 140 kDa proteins, representing the uncleaved and cleaved forms of MET. In Fig. S1J, the mouse anti-MET (CST, L6E7) antibody has been used, which binds to an N-terminal motif of MET and recognizes only the 140 kDa protein.

      (4) Insufficient representation and randomization of microscopic data. For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      We thank the reviewer for raising this point. The single-cell images are shown for clarity and to visualize the subcellular features; however, the conclusions are based on the quantitative analysis of multiple cells collected from multiple frames (at least 30 frames per condition). Here, we would like to highlight that image acquisition was done over random fields on a coverslip. In the graphs shown in Fig. 1B, 1C, 4F, S1F, S1H, and S5J’, it can be seen that there are frames with no degradation or invadopodia, which have also been taken into account. For a better representation of the population of cells forming invadopodia, a graph showing the percentage of cells forming invadopodia will be added to the figure.

      (5) Use of a single siRNA/shRNA per target. As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      We would like to clarify that we are using SMARTpool siRNA, which contains 4 individual siRNAs for the target gene. The literature suggests that using a pool of siRNAs has reduced off-target effects compared to using single oligos for gene silencing (PMID: 14681580, 33584737, 24875475).

      While SMARTpool siRNA minimizes the off-target effect, it does not eliminate the possibility of it. To confirm that the observed phenotypes are specifically attributable to the genes investigated in this study, we will perform additional experiments using two independent siRNAs targeting RCP and RAB14. RAB4 is known to be associated with MET trafficking (PMID: 21664574, 30537020), and we have taken RAB4 as a positive control. Hence, we feel the suggested experiment is not required to support the conclusion made regarding RAB4.

      For MET, we have used shRNA and an inhibitor to show the effect of MET inhibition/perturbation in the invadopodia-associated activity, which validates the observations of siRNA-mediated gene silencing.

      We have shown the effect of MT1-MMP depletion on invadopodia formation using a CRISPR-based gene knock-out study, and another study from our group has shown the effect using siRNA (PMID: 31820782), which supports our MT1-MMP KO cell observation.    

      (6) Insufficient controls for antibody specificity. The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      MET immunofluorescence staining in the MET-depleted condition has been provided in Fig. 1F, and an immunoblot for the siRNA-mediated gene silencing has been provided in Fig. S5C. We will add the entire field of view to show the MET silencing in Fig. 1F.

      The inhibition of MET kinase activity using PHA665752 abolished MET phosphorylation, as shown in Fig. S1J. In line with Joffre et al., Fig. 3C and S2I show increased Tyr 1234/1235 phosphorylation of the M1250T MET mutant (PMID: 21642981). Further, studies have shown the specificity of the antibody by immunoblotting and immunofluorescence using MET inhibitors (PMID: 21973114, 41009793).

      For MT1-MMP, an immunoblot showing significant depletion of the MT1-MMP protein level by the SMARTpool siRNA has been provided in Fig. S5L. Further, MT1-MMP silencing has been validated by immunofluorescence in the following studies (PMID: 22291036, 21571860, 20505159).

      (7) Inadequate demonstration of MET recycling. MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      We appreciate the reviewer’s suggestion for an alternative approach to show MET trafficking. We aim to demonstrate MET trafficking using a biochemical approach, which will be included in the revised version. 

      (8) Insufficient evidence for MET-MT1-MMP interaction. The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      We thank the reviewer for pointing out the lack of MET-MT1-MMP interaction at the endogenous level. We have carried out the immunoprecipitation of endogenous MET to validate the interaction with MT1-MMP. However, we could not capture the interaction of these proteins at endogenous levels. We hypothesize that the interaction between MT1-MMP and MET may be weak in nature, with a high K<sub>d</sub> value, and accordingly, it was difficult to precipitate the endogenous MT1-MMP by MET. The immunoblot will be added to the revised manuscript and discussed.

      (9) Inconsistent use of cell lines and lack of justification. The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      MDA-MB-231 and BT-549 are two well-characterized TNBC cell lines, which are being used extensively to study breast cancer cell invasion. These two cell lines also show overexpression of MET, making them suitable model cell lines for our study. 

      MDA-MB-231 has a lower transfection efficiency than BT-549. Additionally, MET is a difficult gene to transfect, making it hard to perform experiments with MET overexpression in MDA-MB-231. Though most of the experiments have been performed in both cell lines, a few have been performed only in BT-549 cells. Further, we have focused on displaying in the main figures the different approaches taken to validate an observation, which led to showing the data in distinct cell lines.

      Also, showing observations in different cell lines is a practice that has been followed by multiple authors in the past (PMID: 39751400, 41079612, 25049275, 22366451).

      (10) Inconsistency in invadopodia numbers under identical conditions. The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      We sincerely thank the reviewer for pointing out the inconsistency in invadopodia numbers across the two experiments. Fig. 1C has two conditions: untreated (UT) and HGF-treated. The untreated condition has serum-free media without any stimulation, whereas in Fig. 1D, E we added vehicle (DMSO), since the drug is resuspended in DMSO. This difference in treatment is likely responsible for the decreased number of invadopodia in Fig. 1E.

      (11) Questionable colocalization in some images. In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      We thank the reviewer for the valuable comment. The apparent lack of convincing colocalization is likely due to the relatively lower fluorescence intensity of MET at these structures. We will add the line intensity plots for the indicated puncta to show the intensity of both channels in the figure.

      To quantify the colocalization of two channels, we have used the automated image analysis software Motiontracking (motiontracking.mpi-cbg.de), which has been detailed in the Methods section. Motiontracking considers objects to be colocalized only if the overlapping area between the two channels is more than 35%. Lastly, the apparent colocalization is corrected for random colocalization, which is estimated by random permutation of the objects. This makes object-based colocalization more reliable than intensity-based colocalization.
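
      As a rough illustration of the object-based idea described above (and not Motiontracking's actual implementation), the sketch below treats each segmented object as a set of pixel coordinates, calls an object colocalized when more than 35% of its area overlaps an object in the other channel, and subtracts a chance level estimated by randomly repositioning the second channel's objects. The function names, the choice to measure overlap relative to the first channel's object, the wrap-around shifting scheme, and the simple subtraction are all assumptions made for this example.

```python
import random


def overlap_fraction(obj_a, obj_b):
    """Fraction of obj_a's pixels that also belong to obj_b."""
    return len(obj_a & obj_b) / len(obj_a)


def colocalized_fraction(objs_a, objs_b, threshold=0.35):
    """Share of channel-A objects overlapping some channel-B object by more than threshold."""
    if not objs_a:
        return 0.0
    hits = sum(
        1 for a in objs_a
        if any(overlap_fraction(a, b) > threshold for b in objs_b)
    )
    return hits / len(objs_a)


def shift(obj, dx, dy, width, height):
    """Translate an object, wrapping around the image borders (illustrative choice)."""
    return frozenset(((x + dx) % width, (y + dy) % height) for x, y in obj)


def corrected_colocalization(objs_a, objs_b, width, height, n_perm=200, seed=0):
    """Observed colocalization minus the mean obtained after random repositioning."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = colocalized_fraction(objs_a, objs_b)
    random_values = []
    for _ in range(n_perm):
        moved = [
            shift(b, rng.randrange(width), rng.randrange(height), width, height)
            for b in objs_b
        ]
        random_values.append(colocalized_fraction(objs_a, moved))
    return observed - sum(random_values) / n_perm
```

      With small, sparse objects the chance term is close to zero, so the corrected value stays near the observed one; in crowded images the correction becomes more important.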

      (12) Abstract, Introduction, and Discussion require substantial rewriting. a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context. b) The introduction should better describe the cellular processes and proteins investigated in this study. c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

      We thank the reviewer for this suggestion. We will modify the abstract, introduction, and discussion as per the suggestion.

    1. eLife Assessment

      This paper reports new data on the structure of the human CTF18-RFC clamp loader complex bound to the PCNA clamp. The new and convincing data complement previous reports of CTF-RFC-PCNA structures and, as such, represent an important contribution.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix.

      Strengths:

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately.

    3. Reviewer #2 (Public review):

      Summary

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures.

      Strength & Weakness

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and a reduced rate in a primer extension assay with Pol ε. Moreover, the authors identify that the Ctf18 ATP-binding domain assumes a more flexible organisation.

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results.

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading.

    4. Reviewer #3 (Public review):

      Summary:

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader which is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit which is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity, and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex.

      Relevance:

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment and the DNA damage response.

      Strengths:

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes. They use complementary pre-steady state FRET and polymerase primer extension assays to investigate the role of a unique structural element in CTF18.

      Weaknesses:

      The manuscript would have benefited from a more detailed biochemical analysis using mutagenesis and assays to tease apart the functional relevance of the many differences with the canonical RFC complex.

      Overall appraisal:

      Overall, the work presented here is solid and important. The data is sufficient to support the stated conclusions.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix. 

      Strengths: 

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately. 

      Comments on revisions: 

      The revised manuscript is greatly improved. The comparison with hRFC and the addition of direct PCNA loading data from the Hedglin group are particular highlights. I think this is a strong addition to the literature.

      We thank the reviewer for their positive comments.  

      I only have minor comments on the revised manuscript. 

      (1) The clamp loading kinetic data in Figure 6 would be more easily interpreted if the three graphs all had the same x axes, and if addition of RFC was t=0 rather than t=60 sec.

      We now analyze and plot EFRET as a function of time after complex addition, effectively setting the loader addition to t = 0 for each trace (Figure 6 and Figs S10-14 in the new manuscript). Baseline (Ymin) and plateau (Ymax) EFRET values were obtained by averaging the stable signal regions immediately before and after clamp-loader addition, respectively. Traces are normalized to their own dynamic range before fitting.
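
      The re-zeroing and normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the function name, the window arguments, and the choice of the final stretch of the trace as the plateau region are assumptions made for the example.

```python
from statistics import mean


def normalize_trace(times, efret, t_add, baseline_window, plateau_window):
    """Re-zero time at loader addition and scale EFRET to the trace's own dynamic range.

    times, efret    : equal-length sequences (seconds, raw EFRET)
    t_add           : time of clamp-loader addition in the raw trace
    baseline_window : seconds before t_add averaged to obtain Ymin (assumed)
    plateau_window  : seconds at the end of the trace averaged to obtain Ymax (assumed)
    """
    # Ymin: stable signal immediately before loader addition
    pre = [e for t, e in zip(times, efret) if t_add - baseline_window <= t < t_add]
    # Ymax: stable signal at the end of the trace, after loading has plateaued
    t_end = times[-1]
    post = [e for t, e in zip(times, efret) if t >= t_end - plateau_window]
    y_min, y_max = mean(pre), mean(post)
    span = y_max - y_min
    if span == 0:
        raise ValueError("trace has no dynamic range")
    # Keep only points from loader addition onward; addition becomes t = 0
    return [(t - t_add, (e - y_min) / span) for t, e in zip(times, efret) if t >= t_add]
```

      Normalizing each trace to its own range puts all complexes on a common 0-to-1 scale, so that progress curves can be compared without assuming identical absolute EFRET amplitudes.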

      (2) The authors' statement that "CTF18-RFC displayed a slightly faster rate than RFC" seems to me a bit misleading, even though this is technically correct. The two loaders have indistinguishable rate constants for the fast phase, and RFC is a bit slower than CTF18-RFC in the slow phase. However, the data also show that RFC is overall more efficient than CTF18-RFC at loading PCNA because much more flux goes through the fast phase (relative amplitudes 0.73 vs 0.36). Because the slow phase represents such a reduced fraction of loading events, the slight reduction in rate constant for the slow phase doesn't impact RFC's overall loading. And because the majority of loading events are in the fast phase, RFC has a faster halftime than CTF18-RFC. (Is it known what the different phases correspond to? If it is known, it might be interesting to discuss.)

      We removed the quoted statement. We avoid comparing amplitude partitions (A₁/A_T) for CTF18-RFC because (i) a substantial fraction of the reaction occurs within the <7 s dead time, and (ii) single- vs double-exponential identifiability differs across complexes. Instead, we report model-minimal progress times: RFC t<sub>0.5</sub> ≤ 7 s (faster onset), CTF18-RFC ~ 8 s, CTF18<sup>Δ165–194</sup>-RFC ~ 12 s; completion (t<sub>0.95</sub>): RFC ≈ 77 s, CTF18-RFC ≈ 77 s, mutant ≈ 145 s. This shows RFC has the steeper onset, while CTF18-RFC catches up in completion, and the mutant is slower overall. We briefly note that RFC’s phases have been assigned in prior stopped-flow work and are consistent with a rapid entry step and a slower repositioning/complex release phase; we do not assign phases for CTF18-RFC here and instead rely on model-minimal timing comparisons to avoid over-interpretation. 
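
      One way to obtain such model-free progress times from a normalized trace is sketched below. It is an illustration under stated assumptions, not the authors' procedure: the function name, the linear interpolation between samples, and the convention of reporting an upper bound when the signal has already passed the target within the dead time are all choices made for this example.

```python
def progress_time(times, fraction_done, target, dead_time=0.0):
    """Time at which a normalized trace first reaches `target` (e.g. 0.5 or 0.95).

    times         : seconds after loader addition, ascending
    fraction_done : normalized signal (0 = baseline, 1 = plateau)
    Returns (time, is_upper_bound). is_upper_bound is True when the first
    observable point is already at or above the target, so only "<= time"
    can be stated. Returns (None, False) if the target is never reached.
    """
    if fraction_done[0] >= target:
        # Crossing happened before the first observable point
        return (max(times[0], dead_time), True)
    for i in range(1, len(times)):
        y0, y1 = fraction_done[i - 1], fraction_done[i]
        if y0 < target <= y1:
            t0, t1 = times[i - 1], times[i]
            # Linear interpolation between the two bracketing samples
            return (t0 + (target - y0) * (t1 - t0) / (y1 - y0), False)
    return (None, False)
```

      Calling the function with `target=0.5` and `target=0.95` yields values analogous to t<sub>0.5</sub> and t<sub>0.95</sub>; the upper-bound flag corresponds to statements of the form "t<sub>0.5</sub> ≤ 7 s" when half of the change falls within the dead time.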

      (3) AAA+ is an acronym for "ATPases Associated with diverse cellular Activities" rather than "Adenosine Triphosphatase Associated". 

      Corrected to ATPases Associated with diverse cellular Activities (AAA+).

      Reviewer #2 (Public review): 

      Summary 

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures. 

      Strength & Weakness 

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and a reduction in a primer extension assay with Pol ε. Moreover, the authors identify that the Ctf18 ATP-binding domain assumes a more flexible organisation.

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results. 

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading. 

      Comments on revisions: 

      The authors have done a nice job with the revision. 

      We thank the reviewer for their very positive comments.

      Reviewer #3 (Public review): 

      Summary: 

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader which is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit which is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity, and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex. 

      Relevance: 

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment and the DNA damage response. 

      Strengths: 

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes. 

      Weaknesses: 

      The manuscript would have benefited from a more detailed biochemical analysis using mutagenesis and assays to tease apart the differences with the canonical RFC complex. Analysis of the FRET assay could be improved. 

      Overall appraisal: 

      Overall, the work presented here is solid and important. The data is mostly sufficient to support the stated conclusions.

      We thank the reviewer for their mainly positive assessment. Following this reviewer's suggestion, we have re-analysed the FRET assay data and amended the manuscript accordingly.

      Comments on revisions: 

      While the authors addressed my previous specific concerns, they have now added a new experiment which raises new concerns. 

      The FRET clamp loading experiments (Fig. 6) appear to be overfitted so that the fitted values are unlikely to be robust and it is difficult to know what they mean, and this is not explained in this manuscript. Specifically, the contribution of two exponentials is floated in each experiment. By eye, CTF18-RFC looks much slower than RFC1-RFC (as also shown previously in the literature) but the kinetic constants and text suggest it is faster. This is because the contribution of the fast exponential is substantially decreased, and the rate constants then compensate for this. There is a similar change in contribution of the slow and fast rates between WT CTF18 and the variant (where the data curves look the same) and this has been balanced out by a change in the rate constants, which is then interpreted as a defect. I doubt the data are strong enough to confidently fit all these co-dependent parameters, especially for CTF18, where a fast initial phase is not visible. I would recommend either removing this figure or doing a more careful and thorough analysis. 

      We appreciate the reviewer’s concern regarding potential overfitting of the kinetic data in Figure 6. To address this, we performed a model-minimal re-analysis designed specifically to avoid parameter covariance and over-interpretation (Figure 6 and Figs S11-14 in the new manuscript). Only data recorded after the instrument’s <7 s dead time were included in the fits, thereby excluding the partially obscured early region of the reaction. For each clamp loader complex, we selected the minimal kinetic model that produced residuals randomly distributed about zero. This approach yielded a single-exponential fit for CTF18-RFC, whereas RFC and CTF18<sup>Δ165–194</sup>-RFC required double-exponential fits; single-exponential models for the latter two complexes left structured residuals, clearly indicating the presence of an additional kinetic phase.
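
      One way to make "residuals randomly distributed about zero" concrete is a Wald-Wolfowitz runs test on the signs of the fit residuals. The sketch below is offered only as an illustration of that idea; the response does not state which criterion was actually applied, and the function name `runs_test_z` is hypothetical.

```python
import math

def runs_test_z(residuals):
    """Return (number of sign runs, z-score) for a list of fit residuals.

    A strongly negative z means fewer runs than expected by chance, i.e.
    long stretches of same-sign residuals, which suggests a missing phase.
    """
    signs = [r > 0 for r in residuals if r != 0]  # ignore exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need residuals on both sides of zero")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    if variance <= 0:
        raise ValueError("too few residuals for a runs test")
    return runs, (runs - expected) / math.sqrt(variance)
```

      Under this scheme, a single-exponential model would be retained when its residuals give a z-score near zero, and a second exponential would be added only when the single-exponential residuals show markedly too few runs.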

      Rather than relying on co-dependent amplitude and rate parameters, we quantified the reactions by reporting progress times (t<sub>0.5</sub>, t<sub>0.90</sub>, t<sub>0.95</sub>), which provide a model-independent measure of reaction speed. This directly addresses the reviewer’s concern and allows a fair comparison of the relative kinetics among the complexes.
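
      A progress time of this kind can be read directly from a normalized trace. The following is a minimal sketch under stated assumptions: the trace is a list of (time, fraction) pairs already scaled to 0..1, the first point is the first observation after the dead time, and the name `progress_time` is hypothetical. It takes the first crossing with linear interpolation, which is a simplification for noisy data.

```python
def progress_time(trace, fraction):
    """Return (time, censored) at which the trace first reaches `fraction`.

    trace    : list of (t, f) pairs sorted by t, with f normalized to 0..1
    censored : True when the first observed point is already at or above
               the threshold, so the returned time is only a bound
    """
    prev_t, prev_f = trace[0]
    if prev_f >= fraction:
        return prev_t, True
    for t, f in trace[1:]:
        if f >= fraction:
            # Linear interpolation between the two bracketing points.
            step = (fraction - prev_f) / (f - prev_f)
            return prev_t + step * (t - prev_t), False
        prev_t, prev_f = t, f
    return None, False  # threshold never reached within the trace
```

      The censored flag corresponds to the case where half of the signal change has already occurred within the dead time, so that only a bound on the half-time can be reported.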

      From this analysis, RFC exhibited the fastest onset (t<sub>0.5</sub> ≤ 7 s; lower bound), while CTF18-RFC and CTF18<sup>Δ165–194</sup>-RFC showed progressively slower half-times of approximately 8 s and 12 s, respectively. Completion times further emphasized these differences: both RFC and CTF18-RFC reached 95 % completion at ~77 s, whereas the mutant required ~145 s. Despite these kinetic distinctions, CTF18-RFC and its β-hairpin deletion mutant achieved similar EFRET plateaus, indicating that the mutation slows reaction progression but does not reduce the overall extent of PCNA loading.

      Finally, we emphasize that our interpretation is deliberately conservative. We do not assign distinct kinetic phases to CTF18-RFC, as their molecular basis remains unresolved. RFC’s phases have been characterized in prior stopped-flow studies, but CTF18-RFC likely follows a distinct or simplified pathway. Our conclusions are thus limited to what the data unambiguously support: deletion of the Ctf18 β-hairpin decreases the rate—but not the extent—of PCNA loading, consistent with the reduced stimulation of Pol ε primer extension observed under single-turnover conditions.

    1. eLife Assessment

      This study presents valuable findings by demonstrating that specific GPCR subtypes induce distinct extracellular vesicle miRNA signatures, highlighting a potential novel mechanism for intercellular communication with implications for receptor pharmacology within the field. The evidence is solid; however, more experiments are needed to determine whether the distinct extracellular vesicle miRNA signatures result from GPCR-dependent miRNA expression or GPCR-dependent incorporation of miRNAs into extracellular vesicles.

    2. Reviewer #1 (Public review):

      Summary:

      GPCRs affect the EV-miRNA cargoes

      Strengths:

      Novel idea of GPCRs-mediated control of EV loading of miRNAs

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanistic detail is completely lacking.

      Comments on revisions:

      The revised version of the manuscript falls short of the required standard by lacking additional experiments. Some of the conditions for acceptability could have been met only through clarifying uncertainties via further experiments, which, unfortunately, have not been conducted.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      No significant change in EV quantity or size following GPCR activation.

      Each GPCR triggered a distinct EV miRNA expression profile.

      miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Comments on revisions:

      All the comments have been taken into account. I wish the authors success in their future research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanistic detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNAs are sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complex, binds to miRNA and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2 protein, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, localization, and phosphorylation state could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section (pages 16-17).

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript (page 19-20). Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To accurately compare miRNA expression levels, profiling of EV miRNAs and cellular miRNAs should be conducted simultaneously. However, as this was an exploratory study, we were unable to measure the cellular miRNAs without conducting the entire experiment again.

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section (page 19, second paragraph).

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies (page 19-20).  

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenesis pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as nanoparticle tracking analysis to verify particle size distribution (Welsh et al., 2024). By following these guidelines, we can ensure the quality of our study and enhance the ability to compare our findings with other studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for Future Research:

      (1) Functionally validate top candidate miRNAs in recipient cells.

      We acknowledge that validating the target genes of the top candidate miRNAs is a crucial next step. In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (page 19-20).

      (2) Investigate other GPCR families and repeat in primary or disease-relevant cell lines.

      The inclusion of different GPCRs and cell lines is suggested as an area for further investigation in the discussion (page 19).

      (3) Apply similar approaches in in vivo models or patient samples to assess clinical relevance.

      In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (page 19-20).

      References

      Garcia-Martin, R., Wang, G., Brandão, B. B., Zanotto, T. M., Shah, S., Kumar Patel, S., Schilling, B., & Kahn, C. R. (2022). MicroRNA sequence codes for small extracellular vesicle release and cellular retention. Nature, 601(7893), 446-451. https://doi.org/10.1038/s41586-021-04234-3

      Jackson, L., Rennie, M., Poussaint, A., & Scarlata, S. (2022). Activation of Gαq sequesters specific transcripts into Ago2 particles. Sci Rep, 12(1), 8758. https://doi.org/10.1038/s41598-022-12737-w

      Liu, X.-M., & Halushka, M. K. (2025). Beyond the Bubble: A Debate on microRNA Sorting Into Extracellular Vesicles. Laboratory Investigation, 105(2), 102206. https://doi.org/10.1016/j.labinv.2024.102206  

      McKenzie, A. J., Hoshino, D., Hong, N. H., Cha, D. J., Franklin, J. L., Coffey, R. J., Patton, J. G., & Weaver, A. M. (2016). KRAS-MEK Signaling Controls Ago2 Sorting into Exosomes. Cell Rep, 15(5), 978-987. https://doi.org/10.1016/j.celrep.2016.03.085

      Pultar, M., Oesterreicher, J., Hartmann, J., Weigl, M., Diendorfer, A., Schimek, K., Schädl, B., Heuser, T., Brandstetter, M., Grillari, J., Sykacek, P., Hackl, M., & Holnthoner, W. (2024). Analysis of extracellular vesicle microRNA profiles reveals distinct blood and lymphatic endothelial cell origins. J Extracell Biol, 3(1), e134. https://doi.org/10.1002/jex2.134

      Teng, Y., Ren, Y., Hu, X., Mu, J., Samykutty, A., Zhuang, X., Deng, Z., Kumar, A., Zhang, L., Merchant, M. L., Yan, J., Miller, D. M., & Zhang, H.-G. (2017). MVP-mediated exosomal sorting of miR-193a promotes colon cancer progression. Nature Communications, 8(1), 14448. https://doi.org/10.1038/ncomms14448  

      Villarroya-Beltri, C., Gutiérrez-Vázquez, C., Sánchez-Cabo, F., Pérez-Hernández, D., Vázquez, J., Martin-Cofreces, N., Martinez-Herrera, D. J., Pascual-Montano, A., Mittelbrunn, M., & Sánchez-Madrid, F. (2013). Sumoylated hnRNPA2B1 controls the sorting of miRNAs into exosomes through binding to specific motifs. Nat Commun, 4, 2980. https://doi.org/10.1038/ncomms3980

      Welsh, J. A., Goberdhan, D. C. I., O'Driscoll, L., Buzas, E. I., Blenkiron, C., Bussolati, B., Cai, H., Di Vizio, D., Driedonks, T. A. P., Erdbrügger, U., Falcon-Perez, J. M., Fu, Q. L., Hill, A. F., Lenassi, M., Lim, S. K., Mahoney, M. G., Mohanty, S., Möller, A., Nieuwland, R., ... Witwer, K. W. (2024). Minimal information for studies of extracellular vesicles (MISEV2023): From basic to advanced approaches. J Extracell Vesicles, 13(2), e12404. https://doi.org/10.1002/jev2.12404

      Yoon, J. H., Jo, M. H., White, E. J., De, S., Hafner, M., Zucconi, B. E., Abdelmohsen, K., Martindale, J. L., Yang, X., Wood, W. H., 3rd, Shin, Y. M., Song, J. J., Tuschl, T., Becker, K. G., Wilson, G. M., Hohng, S., & Gorospe, M. (2015). AUF1 promotes let-7b loading on Argonaute 2. Genes Dev, 29(15), 1599-1604. https://doi.org/10.1101/gad.263749.115  

      Zubkova, E., Evtushenko, E., Beloglazova, I., Osmak, G., Koshkin, P., Moschenko, A., Menshikov, M., & Parfyonova, Y. (2021). Analysis of MicroRNA Profile Alterations in Extracellular Vesicles From Mesenchymal Stromal Cells Overexpressing Stem Cell Factor. Front Cell Dev Biol, 9, 754025. https://doi.org/10.3389/fcell.2021.754025

    1. eLife Assessment

      This valuable study presents a thorough analysis of protein abundance changes caused by amino acid substitutions, using structural context to improve predictive accuracy. By deriving substitution response matrices based on solvent accessibility, the authors demonstrate that simple structural features can predict abundance effects with accuracy comparable to complex methods such as free energy calculations. The strength of the evidence is convincing, supported by robust experimental design and comprehensive analyses.

    2. Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      Public Review:

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.
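
      The idea of an asymmetric substitution matrix subdivided by burial can be illustrated with a small sketch. Everything here is an assumption made for the example rather than the authors' implementation: the two-class split, the relative solvent accessibility cutoff of 0.2, the function names, and the use of a plain average of abundance scores per (context, wild-type, mutant) cell.

```python
from collections import defaultdict
from statistics import mean

def burial_class(rel_sasa, cutoff=0.2):
    """Assign a residue to a burial context (cutoff is an illustrative choice)."""
    return "buried" if rel_sasa < cutoff else "exposed"

def build_matrices(variants, cutoff=0.2):
    """Average abundance scores per (context, wild type, mutant) cell.

    variants: iterable of (wt, mut, rel_sasa, score) tuples.
    Keys are ordered (wt -> mut), so the matrix is asymmetric by construction.
    """
    pooled = defaultdict(list)
    for wt, mut, rel_sasa, score in variants:
        pooled[(burial_class(rel_sasa, cutoff), wt, mut)].append(score)
    return {key: mean(scores) for key, scores in pooled.items()}

def predict(matrices, wt, mut, rel_sasa, cutoff=0.2):
    """Look up the matrix entry for a variant; None if the cell was never observed."""
    return matrices.get((burial_class(rel_sasa, cutoff), wt, mut))
```

      Because the lookup needs only the substitution type and a burial label, the same matrices can be applied to a protein that was not part of the data used to build them.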

      Comments on revision:

      We have no further comments on this manuscript.

    3. Reviewer #3 (Public review):

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance through the lens of protein structural information (residue solvent accessibility, secondary structure type) to derive combinations of context-specific substitution matrices that predict variant impact on protein abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor, but to showcase the degree of prediction afforded simply by utilizing structural information.

      Both the derived matrices and the underlying 'training' data are comprehensively evaluated. The authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility and secondary structure. The capacity for the approach to produce generalizable matrices is explored through training data combinations, highlighting factors such as the variable quality of the experimental MAVE data and the biochemical differences between the protein targets themselves, which can lead to limitations. Despite this, the authors demonstrate their simple matrix approach is generally on par with dedicated protein stability predictors in abundance effect evaluation, and even outperforms them in a niche of solvent accessible surface mutations, revealing their matrices provide orthogonal abundance-specific signal. More importantly, the authors further develop this concept to creatively show their matrices can be used to identify surface residues that have buried-like substitution profiles, which are shown to correspond to protein interface residues, post-translational modification sites, functional residues or putative degrons.

      The paper makes a strong and well-supported main point, demonstrating the widespread utility of the authors' approach, empowered through protein structural information and cutting edge MAVE datasets. This work creatively utilizes a simple concept to produce a highly interpretable tool for protein abundance prediction (and beyond), which is inspiring in the age of impenetrable machine learning models.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      We thank the reviewer for their evaluation of our work and for their comments and feedback below.

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      We thank the reviewer for their summary of the main points of our work. Based on the suggestion by the reviewer, we have added a comparison to predictions with BLOSUM62 to our revised manuscript, noting that we have previously compared the BLOSUM62 matrix to a broader and more heterogeneous set of scores generated by MAVEs (Høie et al, 2022).

      Specific Feedback:

      Major points:

      The authors spend a good amount of space discussing how the six datasets have different distributions in abundance scores. After the development of their model is there more to say about why? Is there something that can be leveraged here to design maximally informative experiments?

      We believe that these effects arise from a combination of intrinsic differences between the systems and assay-specific effects. For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains.

      Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. 

      From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences. We have briefly expanded the discussion of these points in the manuscript, and we have moreover elaborated on this in subsequent work (Schulze et al., 2025).

      They compare to one more "sophisticated model" - RosettaddG - which should be more correlated with thermodynamic stability than other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and ddG is underdeveloped. How can this be used to dissect cases where thermodynamics are not contributing to specific substitution patterns OR in specific residues/regions that are predicted by one method better than the other? This would naturally dovetail into whether there is orthogonal information between these two that could be leveraged to create better predictions.

      We thank the reviewer for this suggestion and indeed had spent substantial effort trying to gain additional biological insights from variants for which MAVE scores or MAVE predictions do not match predicted ∆∆G values. One major caveat in this analysis is that the experimental MAVE scores, MAVE predictions and the predicted ∆∆G values are rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      In our revised manuscript, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. 

      We find that many substitution profiles are predicted equally well by the two model types, but also that there are residues for which one method predicts substitution effects better than the other method. We have added an analysis of the characteristics of the residues and variants for which either the ∆∆G model or the substitution matrix model is most useful to rank variants. Since we only find relatively few residues for which this is the case, we do not expect a model that leverages predicted scores from both methods to perform better than ThermoMPNN across variants. 

      Perhaps beyond the scope of this baseline method, there is also ThermoMPNN and the work from Gabe Rocklin to consider as other approaches that should be more correlated only with thermodynamics.

      We acknowledge that there are other approaches to predict ∆∆G beyond Rosetta including for example ThermoMPNN and our own method called RaSP (Blaabjerg et al, eLife, 2023), and we have added comparisons to ThermoMPNN and RaSP in the revised manuscript. We are unsure how one would use the data from Rocklin and colleagues directly, but we note that e.g. RaSP has been benchmarked on this data and other methods have been trained on this data. We originally used Rosetta since the Rosetta model is known to be relatively robust and because it has never seen large databases during training (though we do not think that training of ThermoMPNN and RaSP would be biased towards the VAMP-seq data). We note also that we have previously compared both Rosetta calculations and RaSP with VAMP-seq data for TPMT, PTEN and NUDT15 (Blaabjerg et al, eLife, 2023).

      I find myself drawn to the hints of a larger idea that outliers to this model can be helpful in identifying specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can't help but feel there is more to be mined from Figure S9 or other analyses of outlier higher than predicted abundance along linear or tertiary motifs.

      We agree with these points and have previously spent substantial time trying to make sense of outliers in Figure S9 and Figure S18 (Figure S8 and Figure S18 of revised manuscript). The outlier analysis was challenging, in part due to the relatively high noise levels in both experimental data and predictions, and we did not find any clear signals. Some outliers in e.g. Figure S9 are very likely the result of dataset-specific abundance score distributions, which further complicates the outlier analysis. We now note this in the revised paper and hope others will use the data to gain additional insights on proteostasis-specific effects.  

      Reviewer #2 (Public review):

      Summary:

      This study analyzes protein abundance data from six VAMP-seq experiments, comprising over 31,000 single amino acid substitutions, to understand how different amino acids contribute to maintaining cellular protein levels. The authors develop substitution matrices that capture the average effect of amino acid changes on protein abundance in different structural contexts (buried vs. exposed residues). Their key finding is that these simple structure-based matrices can predict mutational effects on abundance with accuracy comparable to more complex physics-based stability calculations (ΔΔG).

      Major strengths:

      (1) The analysis focuses on a single molecular phenotype (abundance) measured using the same experimental approach (VAMP-seq), avoiding confounding factors present when combining data from different phenotypes (e.g., mixing stability, activity, and fitness data) or different experimental methods.

      (2) The demonstration that simple structural features (particularly solvent accessibility) can capture a significant portion of mutational effects on abundance.

      (3) The practical utility of the matrices for analyzing protein interfaces and identifying functionally important surface residues.

      We thank the reviewer for the comments above and the detailed assessment of our work.

      Major weaknesses:

      (1) The statistical rigor of the analysis could be improved. For example, when comparing exposed vs. buried classification of interface residues, or when assessing whether differences between prediction methods are significant.

      We agree with the reviewer that it is useful to determine if interface residues (or any of the residues in the six proteins) can confidently be classified as buried- or exposed-like in terms of their substitution profiles. Thus, we have expanded our approach for comparing individual substitution profiles to the average profiles of buried and exposed residues so that it now accounts for the noise in the VAMP-seq data. In our updated approach, we resample the abundance score substitution profile for every residue several thousand times based on the experimental VAMP-seq scores and score standard deviations, and we then compare every resampled profile to the average profiles for buried and exposed residues, thereby obtaining residue-specific distributions of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. These RMSD distributions are typically narrow, since many variants in several datasets have small standard deviations. In the revised manuscript, we report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the resampled profiles. We do not recalculate average scores in substitution matrices for this analysis.

      Moreover, to illustrate potential overlap in predictive performance between prediction methods more clearly than in our preprint, we have added confidence intervals in Fig. 2 and Fig. 3 of the revised manuscript. We note that the analysis in Fig. 2 is performed using a leave-one-protein-out approach, which we believe provides the cleanest assessment of how well the different models perform.

      (2) The mechanistic connection between stability and abundance is assumed rather than explained or investigated. For instance, destabilizing mutations might decrease abundance through protein quality control, but other mechanisms like degron exposure could also be at play.

      We agree that we have not provided much description of the relation between stability and abundance in our original preprint. In the revised manuscript, we provide some more detail as well as references to previous literature explaining the ways in which destabilising mutations can cause degradation. We have moreover performed and added additional analyses of the relationship between thermodynamic stability and abundance through comparisons of stability predictions and predictions performed with our substitution matrix models.

      (3) The similar performance of simple matrix-based and complex physics-based predictions calls for deeper analysis. A systematic comparison of where these approaches agree or differ could illuminate the relationship between stability and abundance. For instance, buried sites showing exposed-like behavior might indicate regions of structural plasticity, while the link between destabilization and degradation might involve partial unfolding exposing typically buried residues. The authors have all the necessary data for such analysis but don't fully exploit this opportunity.

      This is similar to a point made by reviewer 1, and our answer is similar. We were indeed hoping that our analyses would have revealed clearer differences between effects on thermodynamic protein stability and cellular abundance and have tried to find clear signals. One major caveat in performing the suggested analysis is that the experimental MAVE scores, the ∆∆G predictions and our simple matrix-based predictions are all rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      To address this point, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. We find that many substitution profiles are predicted equally well by the two model types, but we also, in particular, find solvent-exposed residues for which the substitution matrix model is the better predictor. These residues are often aspartate, glutamate and proline, suggesting that surface-level substitutions of these amino acid types often can have effects that are not captured well by a thermodynamic model, either because this model does not describe thermodynamic effects perfectly, or because in-cell effects must be accounted for to provide an accurate description.

      (4) The pooling of data across proteins to construct the matrices needs better justification, given the observed differences in score distributions between proteins (for example, PTEN's distribution is shifted towards high abundance scores while ASPA and PRKN show more binary distributions).

      We agree with the reviewer that the differences between the score distributions are important to investigate further and keep in mind when analysing e.g. prediction outliers. However, our results show that the pooling of VAMP-seq scores across proteins does result in substitution matrices that make sense biochemically and can identify outlier residues with proteostatic functions. As we also respond to a related point by reviewer 1, the differences in score distributions likely have complex origins. In that sense, we also hope that our results can inspire experimentalists to design methods to generate data that are more comparable across proteins.

      For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains. Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences.

      Thus, even when experiments on different proteins are performed using the same technique (VAMP-seq), quantifying the same phenomenon (cellular abundance) and done in similar ways (saturation mutagenesis, sort-seq using four FACS bins), there can still be substantial differences in the results across different systems. An interesting side result of our work is to highlight this, including how such variation makes it difficult to learn across experiments. We now elaborate on these points in the revised manuscript.

      (5) Some key methodological choices require better justification. For example, combining "to" and "from" mutation profiles for PCA despite their different behaviors, or using arbitrary thresholds (like 0.05) for residue classification.

      We hope we have explained our methodological choices more clearly in the revised paper.

      We removed the dependence on the threshold of 0.05 used for residue classification in Fig. S19 of the original manuscript; in the revised manuscript we only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the abundance score profiles that we resampled according to VAMP-seq score noise levels, as explained above.

      With respect to combining “to” and “from” mutational profiles for PCA, we could have also chosen to analyse these two sets of profiles separately to take potentially different behaviours along the two mutational axes into account. We do not think that there should be anything wrong with concatenating the two sets of profiles in a single analysis, since the analysis on the concatenated profiles simply expresses amino acid similarities and differences in a more general manner.

      The authors largely achieve their primary aim of showing that simple structural features can predict abundance changes. However, their secondary goal of using the matrices to identify functionally important residues would benefit from more rigorous statistical validation. While the matrices provide a useful baseline for abundance prediction, the paper could offer deeper biological insights by investigating cases where simple structure-based predictions differ from physics-based stability calculations.

      This work provides a valuable resource for the protein science community in the form of easily applicable substitution matrices. The finding that such simple features can match more complex calculations is significant for the field. However, the work's impact would be enhanced by a deeper investigation of the mechanistic implications of the observed patterns, particularly in cases where abundance changes appear decoupled from stability effects.

      We agree that disentangling stability and other effects on cellular abundance is one of the goals of this work. As discussed above, it has been difficult to find clear cases where amino acid substitutions affect abundance without stability beyond for example the (rare) effects of creating surface exposed degrons. Our new analysis, in which we compare substitution matrix-based predictions to stability predictions, does offer deeper insight into the relationship between the two predictor types and hence possibly between folding stability and abundance. 

      Reviewer #3 (Public review): 

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance (and thus stability) by utilizing structural information, specifically residue solvent accessibility and secondary structure type, to derive combinations of context-specific substitution matrices predicting variant abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor but to showcase the degree of prediction afforded simply by utilizing information on residue accessibility. The performance of their matrices is robustly evaluated using a leave-one-out approach, where the abundance effects for a single protein are predicted using the remaining datasets. Using a simple classification of buried and solvent-exposed residues, and substitution matrices derived respectively for each residue group, the authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility and secondary structure. Interestingly, it is shown that the performance of the simple buried and exposed residue substitution matrices for predicting protein abundance is on par with Rosetta, an established and specialized protein variant stability predictor. More importantly, the authors finish off the paper by demonstrating the utility of the two matrices to identify surface residues that have buried-like substitution profiles, that are shown to correspond to protein interface residues, posttranslational modification sites, functional residues, or putative degrons.

      Strengths:

      The paper makes a strong and well-supported main point, demonstrating the utility of the authors' approach through performance comparisons with alternative substitution matrices and specialized methods alike. The matrices are rigorously evaluated without introducing bias, exploring various combinations of protein datasets. Supplemental analyses are extremely comprehensive and detailed. The applicability of the substitution matrices is explored beyond abundance prediction and could have important implications in the future for identifying functionally relevant sites.

      We thank the reviewer for the supportive comments on our work. 

      Comments:

      (1) A wider discussion of the possible reasons why matrices for certain proteins seem to correlate better than others would be extremely interesting, touching upon possible points like differences or similarities in local environments, degradation pathways, posttranslational modifications, and regulation. While the initial data structure differences provide a possible explanation, Figure S17A, B correlations show a more complicated picture.

      We agree with the reviewer that biochemical and biophysical differences between the proteins might contribute to the fact that some matrices correlate better than others. We also agree that it would be very interesting to understand these differences better. While it might be possible to examine some of the suggested causes of the differences, like differences or similarities in local environments, we have generally found that noise and differences in score distributions make such analyses difficult (see also responses to reviewers 1 and 2). For now, we will defer additional analyses to future work.

      (2) The performance analysis in Figure 2D seems to show that for particular proteins "less is more" when it comes to which datasets are best to derive the matrix from (CYP2C9, ASPA, PRKN). Are there any features (direct or proxy), that would allow to group proteins to maximize accuracy? Do the authors think on top of the buried vs exposed paradigm, another grouping dimension at the protein/domain level could improve performance?

      We don’t currently know if any protein- or domain-level features could be used to further split residues into useful categories for constructing new substitution matrices, but it is an interesting suggestion. We note that every substitution matrix consists of 380 averages, and creating too many residue groupings will cause some matrix entries to be averaged over very few abundance scores, at least with the current number of scores in the pooled VAMP-seq dataset. For example, while previous work has shown different mutational effects e.g. in helices and sheets (as one would expect), we find that a model with six matrices ({buried,exposed}x{helix,sheet,other}) does not lead to improved predictions (Fig. 2C), presumably because of an unfavourable balance between parameters and data.
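
      The balance between parameters and data mentioned above can be made concrete with a back-of-the-envelope count. The sketch below is illustrative only: it assumes roughly 31,000 pooled scores (the approximate figure quoted in the public review) and, unrealistically, that scores are spread evenly over all matrix entries. All names are hypothetical.

```python
# Back-of-the-envelope count of matrix entries versus available scores.
N_AMINO_ACIDS = 20
# Each matrix holds one average per (wild type, mutant) pair with wild type != mutant.
ENTRIES_PER_MATRIX = N_AMINO_ACIDS * (N_AMINO_ACIDS - 1)  # 20 * 19 = 380

# Approximate pooled number of scored variants (assumption taken from the review text).
N_SCORES = 31_000


def mean_scores_per_entry(n_matrices: int, n_scores: int = N_SCORES) -> float:
    """Average number of scores per matrix entry, assuming an even spread
    of variants over all entries (a simplification)."""
    return n_scores / (n_matrices * ENTRIES_PER_MATRIX)


for label, n_matrices in [
    ("single matrix", 1),
    ("buried/exposed", 2),
    ("{buried,exposed}x{helix,sheet,other}", 6),
]:
    print(
        f"{label}: {n_matrices * ENTRIES_PER_MATRIX} entries, "
        f"~{mean_scores_per_entry(n_matrices):.0f} scores per entry"
    )
```

      In practice the spread is far from even, since some substitution types and structural classes are rare, so the sparsest entries of a six-matrix model would rest on considerably fewer scores than this average suggests.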

      (3) While the matrices and Rosetta seem to show similar degrees of correlation, do the methods both fail and succeed on the same variants? Or do they show a degree of orthogonality and could potentially be synergistic?

      These are good questions and are related to similar questions from reviewers 1 and 2. In the revised manuscript, we have added additional analyses of differences between predictions from our substitution matrix model and a stability model, and we indeed find that the two methods show a degree of orthogonality. However, since we identify only relatively few residues for which one method performs better than the other, we don’t expect a synergistic model to outperform the stability predictor across all variants in any of the six proteins.  

      Overall, this work presents a valuable contribution by creatively utilizing a simple concept through cutting-edge datasets, which could be useful in various.

      Reviewing Editor:

      As discussed in more detail below, to strengthen the assessment, the authors are encouraged to:

      (1) Include more thorough statistical analyses, such as confidence intervals or standard errors, to better validate key claims (e.g., RMSD comparisons).

      (2) Perform a deeper comparison between substitution response matrices and ΔΔG-based predictions to uncover areas of agreement or orthogonality.

      (3) Clarify the relationship between structural features, stability, and abundance to provide more mechanistic insights.

      As discussed above and below, we have added new analyses and clarifications to the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Why is a continuous version of the contact number used here, instead of a discrete count of neighbouring residues? WCN values of the residues in the core domain can be affected by residues far away (small contribution but not strictly zero; if there are many of them, it adds up).

      We have previously found WCN, which quantifies residue contact numbers in a continuous manner, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also found WCN and the cellular abundance of single substitution variants to correlate well in individual analyses of different proteins (Grønbæk-Thygesen et al., 2024; Gersing et al., 2024; Clausen et al., 2024).

      We have calculated the WCN as well as a contact number based on discrete counts of neighbouring residues for the six proteins in our dataset. When distances between residues are evaluated in the same way (i.e. using the shortest distance between any pair of heavy atoms in the side chains), and when the cutoff value used for the discrete count is equal to the r<sub>0</sub> of the WCN function, the continuous and discrete evaluations of residue contact numbers are highly and linearly correlated, and their rank correlations with the VAMP-seq data are very similar. We only observe minor contributions to the WCN from residues far away in the structure.
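
      The difference between a continuous and a discrete contact number can be illustrated with a minimal sketch. The switching function, the value of r<sub>0</sub> (7 Å) and the example distances below are assumptions made for illustration; they are not taken from the manuscript.

```python
def weighted_contact_number(distances, r0=7.0, power=6):
    """Continuous contact number: every neighbour contributes a weight between
    0 and 1 that decays smoothly with distance (weight 0.5 at r = r0).
    The functional form is an assumed example of a switching function."""
    return sum(1.0 / (1.0 + (r / r0) ** power) for r in distances)


def discrete_contact_number(distances, cutoff=7.0):
    """Discrete contact number: count neighbours within the cutoff."""
    return sum(1 for r in distances if r <= cutoff)


# Hypothetical distances (in Angstrom) from one residue to its neighbours.
distances = [3.8, 4.5, 5.2, 6.9, 7.1, 9.0, 12.0, 20.0, 30.0]

wcn = weighted_contact_number(distances)
dcn = discrete_contact_number(distances)
# Summed contribution of distant residues (beyond 2 * r0) to the WCN.
far = weighted_contact_number([r for r in distances if r > 14.0])

print(f"WCN = {wcn:.2f}, discrete count = {dcn}, far contribution = {far:.4f}")
```

      With a steeply decaying weight such as this one, each distant residue contributes very little, which is consistent with the observation that far-away residues have only a minor effect on the WCN.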

      Typos in SI figure captions e.g. Figure S8-11 "All predictions were performed using using...."

      Thank you for pointing this out. We have corrected the typos in Figure S8-11 (Figure S7-S10 in the revised manuscript).

      Personally, I'd appreciate a definition of these new substitution matrices under the constraints of rASA/WCN values. It was unclear to me until I read the code, but I think that the definition is averaging the substitution matrix based on the clusters they are assigned to. If so, this could be straightforwardly defined in the method section with a Heaviside step function.

      We have added a definition of the “buried” and “exposed” substitution matrices as a function of rASA in the methods section (“Definitions of buried and exposed residues” and “Definition of substitution matrices”) of the manuscript, as well as a definition of how we classified residues as either buried or exposed using both rASA and WCN as input. Our final substitution matrices, as shown in e.g. Fig. 2, do not depend on the WCN; only the substitution matrix results in Figure S6 (Figure S20 in the revised manuscript) depend on both WCN and rASA.
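
      As a minimal sketch of what such a definition amounts to, the following code averages abundance scores per substitution type separately for buried and exposed residues, using a step function on rASA. The threshold value, the input layout and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rASA cutoff; the value used in the manuscript is not stated here.
RASA_THRESHOLD = 0.2


def build_matrices(variants, threshold=RASA_THRESHOLD):
    """Build buried and exposed substitution matrices.

    variants: iterable of (wild_type, mutant, rasa, score) tuples.
    Returns {"buried": {(wt, mut): mean score}, "exposed": {...}}.
    """
    sums = {"buried": defaultdict(float), "exposed": defaultdict(float)}
    counts = {"buried": defaultdict(int), "exposed": defaultdict(int)}
    for wt, mut, rasa, score in variants:
        # Step function on rASA: each residue belongs to exactly one class.
        env = "buried" if rasa <= threshold else "exposed"
        sums[env][(wt, mut)] += score
        counts[env][(wt, mut)] += 1
    # Each matrix entry is the mean score over all variants of that type and class.
    return {
        env: {pair: sums[env][pair] / counts[env][pair] for pair in sums[env]}
        for env in sums
    }


# Toy input: three L->D variants at different burial levels and one K->R variant.
variants = [
    ("L", "D", 0.05, 0.1),
    ("L", "D", 0.10, 0.3),
    ("L", "D", 0.60, 0.9),
    ("K", "R", 0.70, 1.0),
]
matrices = build_matrices(variants)
print(matrices["buried"][("L", "D")], matrices["exposed"][("L", "D")])
```

      For a leave-one-protein-out evaluation of the kind described in the response, the same function would be called on the variants from all proteins except the one being predicted.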

      Reviewer #2 (Recommendations for the authors):

      The following suggestions aim to strengthen the analysis and clarify the presentation of your findings:

      (1) Specific analyses to consider:

      (1.1) Analyze buried positions where the exposed matrix performs better. Understanding these cases might reveal properties of protein core regions that show unexpected mutational tolerance.

      We agree with the reviewer that a more detailed analysis of buried residues with exposed-like substitution profiles would be very interesting.

      We note that for proteins where the VAMP-seq score distribution is shifted towards high values (as is the case for PTEN, TPMT and CYP2C9), our identification of such residues may be a result of the score distribution differences between the six datasets. To confidently identify mutationally tolerant core regions, it would be best to (a) correct for the distribution differences prior to the analysis or (b) focus the analysis on residues that fall far below the diagonal in Figure S18.

      In additional data (which can be found at https://github.com/KULL-Centre/_2024_Schulze_abundance-analysis), we provide, for each of the proteins, a list of buried residues for which RMSD<sub>exposed</sub> < RMSD<sub>buried</sub> (for more than 95% of resampled substitution profiles, as described under 1.6). We have not analysed these residues further.

      (1.2) A systematic comparison of matrix-based vs. ΔΔG-based predictions could help understand both exposed sites that behave as buried (as analyzed in the paper) and buried sites that behave as exposed (1.1), potentially revealing mechanisms underlying abundance changes.

      In our revised manuscript, we have added additional analyses to compare matrix-based and ΔΔG-based predictions, focusing on exposed sites for which one prediction method captures variant effects on abundance considerably better than the other prediction method. We have not investigated buried sites with exposed-like behaviour any further in this work.

      (1.3) Explore different normalization approaches when pooling data across proteins. In particular, consider using log(abundance score): if the experimental error in abundance measurements is multiplicative (which can be checked from the reported standard errors), then log transformation would convert this into a constant additive error, making the analysis more statistically sound.

      As we answer below to point 2.2, the abundance scores are, within each dataset, min-max normalised to nonsense and synonymous variant scores, and the score scale is thus consistent across the six datasets. We have explained above and in the revised manuscript that abundance score distribution differences across datasets are likely partially a result of the FACS binning of assay-specific variant libraries. Using only the VAMP-seq scores (that is, without further information about the individual experiments), we cannot correct for the influence of the sorting strategy on the reported scores. A score normalisation across datasets that places all data points on a single scale would require inter-dataset reference variant scores, which we do not have. We note that in a subsequent manuscript (Schulze et al, bioRxiv, 2025) we have attempted to take system- and experiment-specific score distributions into account. We now refer to this work in the revised manuscript.
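
      The within-dataset normalisation referred to above can be sketched as follows. The sketch assumes that the median score of each reference class (nonsense and synonymous variants) is used as the anchor; this choice, the function name and the toy numbers are illustrative assumptions rather than details given in the response.

```python
from statistics import median


def normalise(raw_scores, nonsense_raw, synonymous_raw):
    """Min-max normalise raw scores within one dataset so that the typical
    nonsense variant maps to 0 and the typical synonymous variant maps to 1."""
    lo = median(nonsense_raw)    # anchor for complete loss of abundance
    hi = median(synonymous_raw)  # anchor for wild-type-like abundance
    return [(score - lo) / (hi - lo) for score in raw_scores]


# Toy raw scores for three missense variants and the two reference classes.
scaled = normalise(
    raw_scores=[0.2, 0.6, 1.0],
    nonsense_raw=[0.1, 0.2, 0.3],
    synonymous_raw=[0.9, 1.0, 1.1],
)
print(scaled)
```

      This places every dataset on a common 0 to 1 scale, but it does not remove differences in the shape of the score distributions, which is why library composition and FACS binning can still make datasets hard to compare.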

      (1.4) Consider using correlation coefficients between predicted and observed abundance profiles as an alternative to RMSD, which is sensitive to the absolute values of the scores.

      We agree with the reviewer that using correlation coefficients to compare substitution profiles might also be useful, in particular for datasets with relatively unique VAMP-seq score distributions, such as the ASPA dataset. To explore this idea, we have repeated the analysis presented in Fig. S18 using the Pearson correlation coefficient r rather than the RMSD.

      As in Fig. S18, we derive r<sub>buried</sub> and r<sub>exposed</sub> for every residue in the six proteins, specifically by calculating r between the abundance score substitution profile of every individual residue and the average abundance score substitution profiles of buried and exposed residues. VAMP-seq data for the protein for which r<sub>buried</sub> and r<sub>exposed</sub> are evaluated is omitted from the calculation of average abundance score substitution profiles, and we use only monomer structures to determine whether residues are buried or exposed. 

      We show the results of this analysis in Author response image 1 below. In each panel of the figure, r<sub>buried</sub> and r<sub>exposed</sub> are shown for individual residues of a single protein. Blue datapoints indicate residues that are solvent-exposed in the wild-type protein structures, and yellow datapoints indicate residues that are buried in the wild-type structures. Residues for which neither r<sub>buried</sub> < r<sub>exposed</sub> nor r<sub>exposed</sub> < r<sub>buried</sub> holds in more than 95% of 1000 resampled residue substitution profiles (see explanation of resampling method above) are coloured grey. “Acc.” is the balanced classification accuracy, calculated using all non-grey datapoints, indicating how many buried residues have buried-like substitution profiles (r<sub>exposed</sub> < r<sub>buried</sub>) and how many solvent-exposed residues have exposed-like substitution profiles (r<sub>buried</sub> < r<sub>exposed</sub>). The classification accuracy per protein in this figure cannot be compared to the classification accuracy of the same protein in Fig. S18, since the number of datapoints used in the accuracy calculation differs between the r- and RMSD-based analyses.

      Author response image 1.

      Comparing the r-based approach to the RMSD-based approach (Fig. S18), it is clear that the r-based method is less robust than the RMSD-based method for noisy and incomplete datasets. For the noisiest and most mutationally incomplete VAMP-seq datasets (i.e., PTEN, TPMT and CYP2C9) (Fig. 1), there are relatively few residues for which we can determine with high confidence whether the substitution profile is more buried- or more exposed-like. When the VAMP-seq data is less noisy and has high mutational completeness, the r-based method becomes more robust and may thus be relevant in potential future work on new VAMP-seq data with small error bars.

      In conclusion, we find that the RMSD-based approach for comparing substitution profiles is more robust than an r-based approach for several of the VAMP-seq datasets that are included in our analysis. We do believe that an approach based on the correlation coefficient, or potentially several metrics, could be relevant to use, since abundance score distributions from VAMP-seq datasets can differ significantly across datasets. So as not to increase the length of the main text of our manuscript, we have not added this analysis to the revised manuscript.

      (1.5) Consider treating missing abundance scores as zero values, as they might indicate variants with very low abundance, rather than omitting them from the analysis.

      This suggestion would be most relevant for the PTEN, TPMT and CYP2C9 datasets, which all have a relatively small average mutational depth and completeness, as shown in Fig. 1B and 1C. To assess if setting missing abundance scores as zero values would be reasonable, we have compared the distributions of predicted ΔΔG values (from RaSP and ThermoMPNN) and of predicted abundance scores (from our exposure-based substitution matrices) for variants with reported and missing VAMP-seq data. We show the result in Author response image 2, with data aggregated across the six protein systems:

      Author response image 2.

      We find that variants with and without VAMP-seq data have similar ΔΔG score distributions and similar predicted abundance score distributions, and there is thus no clear enrichment of predicted loss of abundance for variants with missing VAMP-seq scores. This suggests that missing abundance scores do not necessarily indicate very low abundance. One cause of missing data might instead be problems with library generation (Matreyek et al, 2018, 2021).

      We show in Fig. S9 (Fig. S8 of the revised manuscript) that predicted scores for variants with experimental abundance scores of 0 are often overestimated for NUDT15, ASPA and PRKN, but this is not so much a problem for PTEN, TPMT and CYP2C9, the datasets with most missing scores. The lack of an enrichment of low abundance variants from the various predictors would thus still support that missing scores do not necessarily indicate low abundance.

      (1.6) Develop a proper statistical framework for comparing buried vs exposed predictions (whether using RMSD or correlations), including confidence intervals, rather than using arbitrary thresholds.

      As explained above and in the methods section of our revised manuscript, we have expanded our approach to compare the substitution profile of a residue to the average profiles of buried and exposed residues, and our method now accounts for the noise in the VAMP-seq data, making the analysis more statistically rigorous. In our expanded approach, we compare the substitution profiles of individual residues to the average profiles for buried and exposed residues 10,000 times per residue to get a residue-specific distribution of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. Individual RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values are calculated by resampling abundance scores from a Gaussian distribution defined by the experimentally reported abundance score and abundance score standard deviation per variant. We now only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> in at least 95% of our samples. We do not recalculate average scores in substitution matrices for this analysis. We have updated the plots in our manuscript, e.g. in Fig. S18 and S19 of the revised version, to indicate which residues are confidently classified as buried- or exposed-like.
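
      The resampling procedure described in this response can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names, the handling of ties, and the assumption that every list covers the same substitutions with no missing values are ours, while the Gaussian resampling, the fixed average profiles and the 95% criterion are taken from the text.

```python
import math
import random


def rmsd(profile, reference):
    """Root-mean-square deviation between two equally long score lists."""
    squared = [(p - q) ** 2 for p, q in zip(profile, reference)]
    return math.sqrt(sum(squared) / len(squared))


def classify_residue(scores, sds, avg_buried, avg_exposed,
                     n_samples=10_000, threshold=0.95, seed=0):
    """Label one residue as 'buried-like', 'exposed-like' or 'unassigned'.

    scores, sds  -- reported abundance score and standard deviation per variant
    avg_buried, avg_exposed -- average profiles for the same substitutions;
                    they are held fixed and not recalculated per sample
    (hypothetical interface; names are illustrative)
    """
    rng = random.Random(seed)
    buried_wins = 0
    exposed_wins = 0
    for _ in range(n_samples):
        # Resample each variant score from a Gaussian defined by its
        # reported mean and standard deviation.
        sample = [rng.gauss(mu, sd) for mu, sd in zip(scores, sds)]
        rmsd_buried = rmsd(sample, avg_buried)
        rmsd_exposed = rmsd(sample, avg_exposed)
        if rmsd_buried < rmsd_exposed:
            buried_wins += 1
        elif rmsd_exposed < rmsd_buried:
            exposed_wins += 1
    # A label is only reported if one profile wins in at least 95% of samples.
    if buried_wins / n_samples >= threshold:
        return "buried-like"
    if exposed_wins / n_samples >= threshold:
        return "exposed-like"
    return "unassigned"
```

      Under these assumptions, a residue whose noisy profile sits between the two average profiles is left unassigned instead of being forced into either class.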

      (2) Presentation improvements:

      (2.1) In Figure 4, consider removing the average abundance scores, which are not directly related to the RMSD comparison being shown.

      We have decided to keep the average abundance scores in Fig. 4 (now Fig. 5), as we find the average abundance scores useful for guiding interpretation of the RMSD values. For example, an unusually small average abundance score with a relatively small standard deviation may explain a case where RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> are both large. This is for example the case for residue G185 in ASPA. 

      In our preprint, the error bars on the average abundance scores in Fig. 4 (now Fig. 5) indicated the standard deviation across the abundance scores that were used to calculate the average per position. We have removed these error bars in the revised manuscript, as we realised that these were not necessarily helpful to the reader.

      (2.2) I am assuming that abundance scores are defined as the ratio abundance_variant/abundance_wt throughout the analysis, but I don't think this has been explicitly defined. If this is correct, please state it explicitly. In such case, log(abundance_score) would have a simple interpretation as the difference in abundance between variant and wild-type.

      Abundance scores are defined throughout the manuscript as sequence-based scores that have been min-max normalised to the abundance of nonsense and synonymous variants, i.e. abundance_score = (abundance_variant – abundance_nonsense)/(abundance_wt – abundance_nonsense). We have described the normalisation of scores to wild-type and nonsense variant abundance in lines 164-166 of the original manuscript. We have now added additional information about the normalisation scheme in the methods section. We note that we did not ourselves apply this normalisation to the data; the scores were reported in this manner in the original publications that reported the VAMP-seq experiments for the six proteins.
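
      For concreteness, the min-max normalisation described in this response can be written as a short function. This is an illustrative sketch only; the function and argument names are hypothetical, and the published scores were already normalised by the original VAMP-seq studies.

```python
def abundance_score(abundance_variant, abundance_wt, abundance_nonsense):
    """Min-max normalise a raw variant abundance (illustrative helper).

    Returns 0 for a variant that behaves like the nonsense reference and
    1 for a variant that behaves like the wild-type (synonymous) reference.
    """
    span = abundance_wt - abundance_nonsense
    if span == 0:
        raise ValueError("wild-type and nonsense abundance must differ")
    return (abundance_variant - abundance_nonsense) / span
```

      With this definition the score is a rescaled position between the nonsense and wild-type references, not a plain ratio abundance_variant/abundance_wt, so its logarithm does not reduce to a simple variant-versus-wild-type difference.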

      (2.3) Consider renaming "rASA" to the more commonly used "RSA" for relative solvent accessibility.

      We have decided to keep using “rASA” throughout the manuscript.

      (2.4) The weighted contact number function used differs from the established WCN measure (Σ1/rij²) introduced by Lin et al. (2008, Proteins). This should be acknowledged and the choice of alternative weighting scheme justified.

      As we have also responded to the first minor point of reviewer 1, we have previously found WCN, as it is defined in our manuscript, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also previously found this type of WCN to correlate well with variant abundance of individual proteins, as measured with VAMP-seq or protein fragment complementation assays (Grønbæk-Thygesen et al., 2024; Clausen et al., 2024; Gersing et al., 2024). We acknowledge that residue contact numbers or weighted contact numbers could also be expressed in other ways and that alternative contact number definitions would likely also produce values that correlate well with VAMP-seq data. Since the WCN, as defined in our manuscript, already correlates relatively well with abundance scores, we have not explored whether alternative definitions produce better correlations.  
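
      For reference, the inverse-square definition cited by the reviewer (Σ1/rij²) can be computed as in the sketch below. It illustrates only that cited measure, not the weighting scheme used in the manuscript, which is not restated in this response; the choice of one coordinate per residue (for example the C-alpha atom) and the function name are assumptions.

```python
def weighted_contact_numbers(coords):
    """Inverse-square weighted contact number for each residue.

    coords -- list of (x, y, z) tuples, one representative atom per residue
              (assumed here; which atom to use is a modelling choice)
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(coords):
            if i == j:
                continue  # a residue does not contribute to its own sum
            dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += 1.0 / dist_sq
        wcn.append(total)
    return wcn
```

      Residues in the densely packed interior accumulate many short distances and therefore large values, which is why contact-number measures of this kind tend to track burial.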

      (2.5) Replace the phrase "in the above" with specific references to sections or simply "above" where appropriate. Also, consider replacing many instances of "moreover" with simpler alternatives such as "also" or "in addition" to improve readability.

      We have changed several sentences according to this suggestion and hope that we have improved the readability of our manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) It should be explicitly confirmed earlier that complex structures are used for NUDT15 and ASPA when assessing rASA/WCN. Additionally, it would be interesting to see the effect that deriving the matrices using NUDT15 and ASPA monomers would have.

      We have commented on the use of NUDT15 and ASPA homodimer structures earlier in the revised manuscript (specifically in the subsection “Abundance scores correlate with the degree of residue solvent-exposure”).

      When residues are classified using monomer rather than dimer structures of NUDT15 and ASPA, there is a small effect on the resulting “buried” and “exposed” substitution matrices. Entries in this set of substitution matrices calculated using either monomer or dimer structures typically differ by less than 0.05, and only a single entry differs by more than 0.1. As expected, the “exposed” matrix tends to contain slightly larger numbers when derived from dimer structures than when derived from monomer structures, meaning that when the interface residues are included in the exposed residue category, the average abundance scores of the “exposed” matrix are lowered. For buried residues, the picture is more mixed, although the overall tendency is that the interface residues make the “buried” matrix contain smaller average abundance scores for dimer compared to monomer structures. These results generally support the use of dimer structures for the residue classification.

      Here we show the differences between the substitution matrices calculated with dimer or monomer structures of NUDT15 and ASPA, using data for all six proteins in our combined VAMP-seq dataset (average_abundance_score_difference = average_abundance_score_dimers – average_abundance_score_monomers):

      Author response image 3.

      We have not explored these alternative matrices further.

      (2) While the supplemental analyses are rigorous, the abundance of various metrics being presented can be confusing, especially when they seem to differ in their result. For instance, the discussion of Figure S17 (paragraph starting 428) contains mentions of mean differences but then switches to correlations, while both are presented for all panels. The claim "The datasets thus mainly differ due to differences in substitution effects in buried environments." is well supported by the observed mean differences, but for Pearson's correlations the average panel A, B values of buried 0.421 vs exposed 0.427 are hardly different. Which of the metrics is more meaningful, and are both needed?

      We agree with the reviewer that the claim that “The datasets thus mainly differ due to differences in substitution effects in buried environments” is not well-supported by the r between the substitution matrices, and we have removed this claim from the text.

      Since some datasets share VAMP-seq score distribution features, while others do not, the absolute difference between scores or matrices may be relevant to check for some dataset pairs, while the r may be more relevant to check for other dataset pairs. Hence, we have included both metrics in Fig S17 (Fig S11 in the revised manuscript).

      (3) Lines 337-340 - does not feel like S7 is the topic, perhaps the authors meant Figure 2A, B? In general, the supplemental figure references are out of order and panel combinations are sometimes confusing.

      We have corrected the figure references and rearranged the supplemental figures so that they now occur in the correct order. We have looked through the panel combinations with clarity in mind, and hope that the current set of main and supplementary figures balances overview and detail.

      (4) Line 363 "are also are also".

      We have corrected this typo.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study analyzes the gastric fluid DNA content identified as a potential biomarker for human gastric cancer. However, the study lacks overall logicality, and several key issues require improvement and clarification. In the opinion of this reviewer, some major revisions are needed:

      (1) This manuscript lacks a comparison of gastric cancer patients' stages with PN and N+PD patients, especially T0-T2 patients.

      We are grateful for this astute remark. A comparison of gfDNA concentration among the diagnostic groups indicates a trend of increasing values as the diagnosis progresses toward malignancy. The observed values for the diagnostic groups are as follows:

      Author response table 1.

      The chart below presents the statistical analyses of the same diagnostic/tumor-stage groups (One-Way ANOVA followed by Tukey’s multiple comparison tests). It shows that gfDNA concentrations gradually increase with malignant progression. We observed that the initial tumor stages (T0 to T2) exhibit intermediate gfDNA levels, which are significantly lower than in advanced disease (p = 0.0036), but not statistically different from non-neoplastic disease (p = 0.74).

      Author response image 1.

      (2) The comparison between gastric cancer stages seems only to reveal the difference between T3 patients and early-stage gastric cancer patients, which raises doubts about the authenticity of the previous differences between gastric cancer patients and normal patients, whether it is only due to the higher number of T3 patients.

      We appreciate the attention to detail regarding the numbers analyzed in the manuscript. Importantly, the results are meaningful because the number of subjects in each group is comparable (T0-T2, N = 65; T3, N = 91; T4, N = 63). The mean gastric fluid gfDNA values (ng/µL) increase with disease stage (T0-T2: 15.12; T3-T4: 30.75), and both are higher than the mean gfDNA values observed in non-neoplastic disease (10.81 ng/µL for N+PD and 10.10 ng/µL for PN). These subject numbers in each diagnostic group accurately reflect real-world data from a tertiary cancer center.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      Histopathological analyses were performed throughout the study not only for the initial diagnosis of tissue biopsies, but also for the classification of Lauren’s subtypes, tumor staging, and the assessment of the presence and extent of immune cell infiltrates. Regarding the time of disease onset, this variable is inherently unknown--by definition--at the time of a diagnostic EGD. While the prognosis definition is indeed straightforward, we believe that a simple, cost-effective, and practical approach is advantageous for patients across diverse clinical settings and is more likely to be effectively integrated into routine EGD practice.

      (4) The comparison between gfDNA and conventional pathological examination methods should be mentioned, reflecting advantages such as accuracy and patient comfort.

      We wish to reinforce that EGD, along with conventional histopathology, remains the gold standard for gastric cancer evaluation. EGD under sedation is routinely performed for diagnosis, and the collection of gastric fluids for gfDNA evaluation does not affect patient comfort. Thus, while gfDNA analysis was evidently not intended as a diagnostic EGD and biopsy replacement, it may provide added prognostic value to this exam.

      (5) There are many questions in the figures and tables. Please match the Title, Figure legends, Footnote, Alphabetic order, etc.

      We are grateful for these comments and apologize for the clerical oversight. All figures, tables, titles and figure legends have now been double-checked.

      (6) The overall logicality of the manuscript is not rigorous enough, with few discussion factors, and cannot represent the conclusions drawn.

      We assume that the unusual wording remark regarding “overall logicality” pertains to the rationale and/or reasoning of this investigational study. Our working hypothesis was that during neoplastic disease progression, tumor cells continuously proliferate and, depending on various factors, attract immune cell infiltrates. Consequently, both tumor cells and immune cells (as well as tumor-derived DNA) are released into the fluids surrounding the tumor at its various locations, including blood, urine, saliva, gastric fluids, and others. Thus, increases in DNA levels within some of these fluids have been documented and are clinically meaningful. The concurrent observation of elevated gastric fluid gfDNA levels and immune cell infiltration supports the hypothesis that increased gfDNA—which may originate not only from tumor cells but also from immune cells—could be associated with better prognosis, as suggested by this study of a large real-world patient cohort.

      In summary, we thank Reviewer #1 for his time and effort in a constructive critique of our work.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Reviewer #2 has conceptually grasped the overall rationale of the study quite well, and we are grateful for their assessment and comprehensive summary of our findings.

      Weaknesses:

      (1) The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings.

      We agree that this would be the case if the gfDNA was derived solely from tumor cells. However, the findings presented here suggest that a fraction of this DNA may indeed be derived from infiltrating immune cells. The precise origin of this increased gfDNA remains to be determined in follow-up studies, which we plan to carry out by applying DNA- and RNA-sequencing methodologies and deconvolution analyses.

      (2) The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results.

      Reviewer #2 is correct that this investigational study was not designed to assess the diagnostic potential of gfDNA. Instead, its primary contribution is to provide useful prognostic information. In this regard, we have not yet explored combining gfDNA with other clinically well-established diagnostic biomarkers. We do acknowledge this current limitation as a logical follow-up that must be investigated in the near future.

      Moreover, we collected a substantial number of pre-analytical variables within the limitations of a study involving over 1,000 subjects. Longitudinal samples and data were not analyzed here, as our aim was to evaluate prognostic value at diagnosis. Although the groups are imbalanced, this accurately reflects the real-world population of a large endoscopy center within a dedicated cancer facility. Subjects were invited to participate and enter the study before sedation for the diagnostic EGD procedure; thus, samples were collected prospectively from all consenting individuals.

      Finally, to maintain a large, unbiased cohort, we did not attempt to balance the groups, allowing analysis of samples and data from all patients with compatible diagnoses (please see Results: Patient groups and diagnoses).

      (3) Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.

      We are grateful for this useful suggestion. In the current version, each ROC curve (Supplementary Figures 1A and 1B) now includes the top 10 gfDNA thresholds, along with their corresponding sensitivity and specificity values (please see Suppl. Table 1). The thresholds are ordered from best to worst based on the classic Youden’s J statistic, as follows:

      Youden Index = specificity + sensitivity – 1 [Youden WJ. Index for rating diagnostic tests. Cancer 3:32-35, 1950. PMID: 15405679]. We have made an effort to provide all the key methodological details requested, but we would be glad to add further information upon specific request.
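
      The ranking of thresholds by Youden's J described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for Suppl. Table 1: the function name is hypothetical, every observed concentration is treated as a candidate threshold, and a sample is called positive when its gfDNA value is at or above the threshold.

```python
def rank_thresholds(values, labels, top=10):
    """Rank candidate cut-offs by Youden's J = sensitivity + specificity - 1.

    values -- measured concentrations (e.g. gfDNA in ng/uL)
    labels -- 1 for cases, 0 for controls, aligned with values
    Returns up to `top` tuples (J, threshold, sensitivity, specificity),
    best first.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cases and controls are required")
    results = []
    for threshold in sorted(set(values)):
        # Assumed decision rule: value >= threshold is called positive.
        true_pos = sum(1 for v, l in zip(values, labels) if l == 1 and v >= threshold)
        true_neg = sum(1 for v, l in zip(values, labels) if l == 0 and v < threshold)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        results.append((sensitivity + specificity - 1, threshold, sensitivity, specificity))
    results.sort(key=lambda row: row[0], reverse=True)
    return results[:top]
```

      Reporting sensitivity and specificity next to each J value, as the response describes, lets a reader choose a cut-off that favours one over the other instead of relying on the single maximum.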

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an excellent study by a superb investigator who discovered and is championing the field of migrasomes. This study contains a hidden "gem" - the induction of migrasomes by hypotonicity and how that happens. In summary, an outstanding fundamental phenomenon (migrasomes) en route to becoming translationally highly significant.

      Strengths:

      Innovative approach at several levels. Migrasomes - discovered by Dr Yu's group - are an outstanding biological phenomenon of fundamental interest and now of potentially practical value.

      Weaknesses:

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      We sincerely thank the reviewer for the encouraging and insightful comments. We fully agree that the fundamental aspects of migrasome biology are of great importance and deserve deeper exploration.

      In line with the reviewer’s suggestion, we have expanded our discussion on the basic biology of engineered migrasomes (eMigs). A recent study by the Okochi group at the Tokyo Institute of Technology demonstrated that hypoosmotic stress induces the formation of migrasome-like vesicles, involving cytoplasmic influx and requiring cholesterol for their formation (DOI: 10.1002/1873-3468.14816, February 2024). Building on this, our study provides a detailed characterization of hypoosmotic stress-induced eMig formation, and further compares the biophysical properties of natural migrasomes and eMigs. Notably, the inherent stability of eMigs makes them particularly promising as a vaccine platform.

      Finally, we would like to note that our laboratory continues to investigate multiple aspects of migrasome biology. In collaboration with our colleagues, we recently completed a study elucidating the mechanical forces involved in migrasome formation (DOI: 10.1016/j.bpj.2024.12.029), which further complements the findings presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors' report describes a novel vaccine platform derived from a newly discovered organelle called a migrasome. First, the authors address a technical hurdle in using migrasomes as a vaccine platform. Natural migrasome formation occurs at low levels and is labor intensive; however, by understanding the molecular underpinning of migrasome formation, the authors have designed a method to make engineered migrasomes from cultured cells at higher yields utilizing a robust process. These engineered migrasomes behave like natural migrasomes. Next, the authors immunized mice with migrasomes that either expressed a model peptide or the SARS-CoV-2 spike protein. Antibodies against the spike protein were raised that could be boosted by a 2nd vaccination and these antibodies were functional as assessed by an in vitro pseudoviral assay. This new vaccine platform has the potential to overcome obstacles such as cold chain issues for vaccines like messenger RNA that require very stringent storage conditions.

      Strengths:

      The authors present very robust studies detailing the biology behind migrasome formation and this fundamental understanding was used to form engineered migrasomes, which makes it possible to utilize migrasomes as a vaccine platform. The characterization of engineered migrasomes is thorough and establishes comparability with naturally occurring migrasomes. The biophysical characterization of the migrasomes is well done including thermal stability and characterization of the particle size (important characterizations for a good vaccine).

      Weaknesses:

      With a new vaccine platform technology, it would be nice to compare them head-to-head against a proven technology. The authors would improve the manuscript if they made some comparisons to other vaccine platforms such as a SARS-CoV-2 mRNA vaccine or even an adjuvanted recombinant spike protein. This would demonstrate a migrasome-based vaccine could elicit responses comparable to a proven vaccine technology.

      We thank the reviewer for the thoughtful evaluation and constructive suggestions, which have helped us strengthen the manuscript. 

      Comparison with proven vaccine technologies:

      In response to the reviewer’s comment, we now include a direct comparison of the antibody responses elicited by eMig-Spike and a conventional recombinant S1 protein vaccine formulated with Alum. As shown in the revised manuscript (Author response image 1), the levels of S1-specific IgG induced by the eMig-based platform were comparable to those induced by the S1+Alum formulation. This comparison supports the potential of eMigs as a competitive alternative to established vaccine platforms. 

      Author response image 1.

      eMigrasome-based vaccination showed similar efficacy compared with adjuvanted recombinant spike protein. The amount of S1-specific IgG in mouse serum was quantified by ELISA on day 14 after immunization. Mice were either intraperitoneally (i.p.) immunized with recombinant Alum/S1 or intravenously (i.v.) immunized with eM-NC, eM-S or recombinant S1. The administered doses were 20 µg/mouse for eMigrasomes, 10 µg/mouse (i.v.) or 50 µg/mouse (i.p.) for recombinant S1 and 50 µl/mouse for Aluminium adjuvant.

      Assessment of antigen integrity on migrasomes:

      To address the reviewer’s suggestion regarding antigen integrity, we performed immunoblotting using antibodies against both S1 and mCherry. Two distinct bands were observed: one at the expected molecular weight of the S-mCherry fusion protein, and a higher molecular weight band that may represent oligomerized or higher-order forms of the Spike protein (Figure 5b in the revised manuscript).

      Furthermore, we performed confocal microscopy using a monoclonal antibody against Spike (anti-S). Co-localization analysis revealed strong overlap between the mCherry fluorescence and anti-Spike staining, confirming the proper presentation and surface localization of intact S-mCherry fusion protein on eMigs (Figure 5c in the revised manuscript). These results confirm the structural integrity and antigenic fidelity of the Spike protein expressed on eMigs.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      I know that the reviewers always ask for more, and this is not the case here. Can the abstract and title be changed to emphasize the science behind migrasome formation, and possibly add a few more fundamental aspects on how hypotonic shock induces migrasomes?

      Alternatively, if the authors desire to maintain the emphasis on vaccines, can immunological mechanisms be somewhat expanded in order to - at least to some extent - explain why migrasomes are a better vaccine vehicle?

      One way or another, this reviewer is highly supportive of this study and it is really up to the authors and the editor to decide whether my comments are of use or not.

      My recommendation is to go ahead with publishing after some adjustments as per above.

      We’d like to thank the reviewer for the suggestion. We have changed the title of the manuscript and modified the abstract, emphasizing the fundamental science behind the development of eMigrasomes. To gain some immunological information on eMig-elicited antibody responses, we characterized the type of IgG induced by eM-OVA in mice, and compared it to that induced by Alum/OVA. The IgG response to Alum/OVA was dominated by IgG1. In contrast, eM-OVA induced an even distribution of IgG subtypes, including IgG1, IgG2b, IgG2c, and IgG3 (Figure 4i in the revised manuscript). The ratio between IgG1 and IgG2a/c indicates a Th1 or Th2 type humoral immune response. Thus, eM-OVA immunization induces a balance of Th1/Th2 immune responses.

      Reviewer #2 (Recommendations For The Authors):

      The study is a very nice exploration of a new vaccine platform. This reviewer believes that a more head-to-head comparison to the current SARS-CoV-2 vaccine platform would improve the manuscript. This comparison is done with OVA antigen, but this model antigen is not as exciting as a functional head-to-head with a SARS-CoV-2 vaccine.

      I think that two other discussion points should be included in the manuscript. First, was the host-cell protein evaluated? If not, I would include that point on how issues of host cell contamination of the migrasome could play a role in the responses and safety of a vaccine. Second, I would discuss antigen incorporation and localization into the platform. For example, the full-length spike being expressed has a native signal peptide and transmembrane domain. The authors point out that a transmembrane domain can be added to display an antigen that does not have one natively expressed, however, without a signal peptide this would not be secreted and localized properly. I would suggest adding a discussion of how a non-native signal peptide would be necessary in addition to a transmembrane domain.

      We thank the reviewer for these thoughtful suggestions and fully agree that the points raised are important for the translational development of eMig-based vaccines.

      (1) Host cell proteins and potential immunogenicity:

      We appreciate the reviewer’s suggestion to consider host cell protein contamination. Considering the potential clinical application of eMigrasomes in the future, we will use human cells with low immunogenicity such as HEK-293 or embryonic stem cells (ESCs) to generate eMigrasomes. We will also follow a QC procedure that meets the standard of validated EV-based vaccination techniques.

      (2) Antigen incorporation and localization—signal peptide and transmembrane domain:

      We also agree with the reviewer’s point that proper surface display of antigens on eMigs requires both a transmembrane domain and a signal peptide for correct trafficking and membrane anchoring. For instance, in the case of full-length Spike protein, the native signal peptide and transmembrane domain ensure proper localization to the plasma membrane and subsequent incorporation into eMigs. In the case of OVA, a secretory protein that contains a native signal peptide yet lacks a transmembrane domain, an engineered transmembrane domain is required. For antigens that do not naturally contain these features, both a non-native signal peptide and an artificial transmembrane domain are necessary. We have clarified this point in the revised discussion and explicitly noted the requirement for a signal peptide when engineering antigens for surface display on migrasomes.

    1. Author response:

      The following is the authors’ response to the original reviews

      We again thank the reviewers for their comments and recommendations. In response to the reviewers’ suggestions, we have performed several additional experiments, added additional discussion, and updated our conclusions to reflect the additional work. Specifically, we have performed additional analyses in female WT and Marco-deficient animals, demonstrating that the Marco-associated phenotypes observed in male mice (reduced adrenal weight, increased lung Ace mRNA and protein expression, unchanged expression of adrenal corticosteroid biosynthetic enzymes) are not present in female mice. We also report new data on the physiological consequences of increased aldosterone levels observed in male mice, namely plasma sodium and potassium titres, and blood pressure alterations in WT vs Marco-deficient male mice. In an attempt to address the reviewers’ comments relating to our proposed mechanism on the regulation of lung Ace expression, we additionally performed a co-culture experiment using an alveolar macrophage cell line and an endothelial cell line. In light of the additional evidence presented herein, we have updated our conclusions from this study and changed the title of our work to acknowledge that the mechanism underlying the reported phenotype remains incompletely understood. Specific responses to reviewers can be seen below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels.

      The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung.

      Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess aldosterone observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored.

      The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear.

      The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      (1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      (2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary the experiments should be repeated with an adequately powered cohort.

      Using a power calculator (www.gigacalculator.com) it was determined that our sample size of 13 was one less than sufficient to detect a similar % difference in corticosterone as was detected in aldosterone. We regret that we were unable to perform additional measurements as the reviewer suggested in the available timeframe.

      (2) All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. Also would be useful if the single cell data could be examined by sex--should be possible even post hoc using Xist etc.

      Given the limitations outlined in our previous response to reviewers it was not possible to repeat every experiment from the original manuscript. We were able to measure the expression of lung Ace mRNA, ACE protein, adrenal weights, adrenal expression of steroid biosynthetic enzymes, presence of myeloid cells, and levels of serum electrolytes in female animals. These are presented in figures 1G, 3B, 4A, 4E, 4F, 4I, and 4J. We have elected to not present single cell seq data according to sex as it did not indicate substantial differences between males and females in Marco or Ace expression and so does not substantively change our approach.

      (3) IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We used negative controls to guide our settings during acquisition of immunofluorescent images. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung in addition to the protein level. This data was presented in the original manuscript and is further bolstered by our additional presentation of expression data for Ace mRNA and protein in female animals in this revised manuscript.

      (4) Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      We don’t have ACE-deficient mice so cannot do KO validation of the antibody. We did perform secondary stain controls which confirmed the signal observed is primary antibody-derived. Moreover, we specifically chose an anti-ACE antibody (Invitrogen catalogue # MA5-32741) that has undergone advanced verification with the manufacturer. We additionally tested the antibody in the brain and liver and observed no significant levels of staining.

      Author response image 1.

      (5) The link between alveolar macrophage Marco and ACE is poorly explored.

      We carried out co-culture experiments with alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the discussion.

      (6) Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      This is outside the scope of this project, though we would consider exploring such experiments in future studies.

      (7) Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.
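
      For readers who want to see what the power analysis requested in point (1) involves, a minimal sketch follows. It uses the standard normal approximation for a two-sided, two-sample comparison of means, not the online calculator cited in the response. The function name `samples_per_group` and the effect sizes passed to it are illustrative assumptions and are not taken from the study's data.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group to detect a standardized effect.

    Normal approximation for a two-sided, two-sample comparison of means:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the difference in means divided by the pooled standard deviation.
    An exact t-test calculation typically asks for one or two more animals.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes, chosen only to show the scaling.
for d in (0.5, 1.0):
    print(f"d = {d}: {samples_per_group(d)} per group")
```

      Under this approximation the required group size scales with the inverse square of the effect size, so halving the detectable difference roughly quadruples the number of animals. This is why a noisier read-out such as corticosterone can need a larger cohort than aldosterone for the same percentage difference.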

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      (1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      (2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We carried out co-culture experiments with alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the implications are explored in the discussion.

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.

      (4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern we carried out a co-culture experiment as described above.

    1. eLife Assessment

      This study reports insights into how the caspase Dcp-1, best known for cell death, can also promote tissue growth in Drosophila, extending the authors' earlier work by identifying regulatory factors that shape this non-lethal activity. The valuable findings identify new Dcp-1-interacting proteins Sirt1, Fkbp59, Debcl, Buffy, Atg2, and Atg8a, and help broaden understanding of how growth and death pathways intersect. The evidence is solid, but some conclusions would be strengthened by additional studies, particularly regarding the nature of the cell death observed and the involvement of autophagy.

    2. Reviewer #1 (Public review):

      Summary:

      The authors clearly demonstrate that overexpressed Dcp-1, but not Drice, is activated without canonical apoptosome components. Using TurboID-based proximity labeling, they revealed distinct proximal proteomes, among which Sirtuin 1, an Atg8a deacetylase that promotes autophagy, was specifically required for Dcp-1 activation. Additionally, they show that autophagy-related genes, including Bcl-2 family members Debcl and Buffy, are required for Dcp-1 activation.

      Using structure-based prediction using AlphaFold3, they identified that Bruce, an autophagy-regulated inhibitor of apoptosis, acts as a Dcp-1-specific regulator acting outside the apoptosome-mediated pathway. Finally, they show that Bruce suppresses wing tissue growth. These findings indicate that non-lethal Dcp-1 activity is governed by the autophagy-Bruce axis, enabling distinct non-lethal functions independent of cell death.

      Strengths:

      This is an excellent paper with very good structure, excellent quality data and analysis.

      Weaknesses:

      This reviewer did not identify any weaknesses or recommendations for revision.

    3. Reviewer #2 (Public review):

      Summary:

      The Drosophila executioner caspase Dcp-1 has established roles in cell death, autophagy, and imaginal disc growth. This study reports previously unrecognized factors that work together with Dcp-1. Specifically, the authors performed a TurboID-based proximity labeling experiment to identify factors associated with Dcp-1 and Drice. Dcp-1-specific interactors were further examined for their genetic interaction. The authors report autophagy-related genes, including Debcl and Buffy, to be required for Dcp-1 activation. In addition, the authors present evidence of an interaction between Bruce and Dcp-1. Bruce expression blocks the Dcp-1 overexpression phenotype. Inhibition of effector caspases or overexpression of Bruce commonly reduced wing growth, suggesting a relationship between the two proteins.

      Strengths:

      On the positive side, the study identifies new Dcp-1-interacting proteins and provides a functional link between Dcp-1 and Sirt1, Fkbp59, Debcl, Buffy, Atg2, and Atg8a.

      Weaknesses:

      The data supporting the Dcp-1/Bruce interaction are not strong, even though the title of this manuscript highlights Bruce. For example, the authors' turboID data does not support Dcp-1/Bruce interaction. The case for the interaction is based on a single experiment that overexpresses a truncated Bruce transgene in S2 cells.

    4. Reviewer #3 (Public review):

      Summary:

      The present paper by Shinoda et al. from the Miura group builds upon findings reported in an earlier study by the same team (Shinoda et al., PNAS, 2019), which identified a non-apoptotic role for the Drosophila executioner caspase Dcp-1 in promoting wing tissue growth. That earlier work attributed this function primarily to Dcp-1 and to Decay, a caspase structurally related to executioner caspases, but not to DrICE, the principal apoptotic executioner caspase. The authors further proposed that this non-apoptotic caspase activity operates independently of the initiator caspase Dronc.

      In the current study, the authors both corroborate aspects of their previous findings and extend the investigation to mechanisms regulating Dcp-1 in this context. They identify roles for the giant IAP Bruce, two BCL-2 family members, and autophagy-related components in modulating non-apoptotic Dcp-1 activity. Moreover, they show that Bruce binds to a BIR-like peptide exposed upon Dcp-1 cleavage, but not to DrICE. The study further suggests that low levels of Dcp-1 activity promote wing tissue growth, whereas excessive activity induces cell death, as evidenced by impaired wing development following Dcp-1 overexpression. Overall, the manuscript provides several intriguing insights into the non-apoptotic regulation of the comparatively weak apoptotic executioner caspase Dcp-1 and complements the group's earlier work. However, several concerns remain regarding certain interpretations of the data and the experimental rigour of some of the results.

      Strengths:

      A major strength of the work is its systematic genetic and biochemical approaches, which combine tissue-specific manipulation with protein interaction mapping to explore how Dcp-1 is regulated. The identification of several regulatory factors, including an inhibitor of cell death protein and components linked to autophagy, provides a coherent framework for understanding how Dcp-1 activity might be tuned.

      Weaknesses:

      The evidence supporting some key claims remains incomplete. In particular, the form of cell death induced when Dcp-1 is overexpressed is not clearly established, and additional tests would be needed to distinguish between the different cell death types.

      Likely impact:

      The study contributes to a growing body of work showing that proteins traditionally associated with cell death can have broader roles in tissue development. This conceptual advance is likely to be of interest to researchers studying growth control and tissue maintenance.

      Specific points:

      (1) Nature of the wing ablation phenotype

      A central concern is whether the wing ablation phenotype observed upon Dcp-1 overexpression truly reflects apoptotic cell death. The authors show in Figure 1c that nuclei in cells overexpressing Dcp-1, but not DrICE, zymogens are highly condensed, which is suggestive of apoptosis. However, it is equally plausible that this phenotype reflects a form of non-apoptotic, Dcp-1-dependent cell death (e.g. autophagy-dependent cell death). This distinction could be readily addressed using TUNEL labelling and direct caspase activity assays. The latter would be particularly informative, as it remains unclear whether zymogen Dcp-1 is capable of cleaving standard effector caspase reporters in vivo. Does the anti-cleaved Dcp-1 antibody detect Dcp-1 activation following overexpression of the Dcp-1 zymogen?

      (2) Role of Decay

      In their earlier study, the authors identified Decay as another caspase influencing wing growth, albeit more modestly than Dcp-1. It is therefore unclear why this line of investigation was not pursued further in the current work. This omission is notable, as Decay is not implicated in apoptosis and, to date, no substantial physiological function has been assigned to this caspase in any system. At a minimum, this point should be discussed explicitly.

      (3) Figure 2: Proximity labelling analysis

      The authors use TurboID-mediated proximity labelling to reveal distinct Dcp-1- and DrICE-associated proteomes across tissues, with a particular focus on the wing disc. They further demonstrate that RNAi-mediated knockdown of the Dcp-1-associated proteins Sirt1 and Fkbp59 suppresses the wing ablation phenotype induced by Dcp-1 overexpression, suggesting that these factors are required for Dcp-1 activity. However, it should be clarified whether Bruce was identified as a Dcp-1 interactor in the proximity labelling dataset, given its proposed central regulatory role. In addition, further discussion of Fkbp59, its known functions and how it might mechanistically influence Dcp-1 activity would be valuable.

      (4) Figure 3: Autophagy-related factors

      Given that Sirt1 is known to promote autophagy, the authors next examine autophagy-related proteins and identify roles for Atg2, Atg8a, Debcl, and Buffy in Dcp-1 activation. Notably, these proteins do not promote cell death in the Hid-induced canonical apoptotic pathway. However, it is important to determine whether knockdown of Debcl, Buffy, Atg2, or Atg8a alone affects wing development in the absence of Dcp-1 overexpression, to exclude the possibility that these perturbations independently impair wing formation.

      (5) Evidence for canonical autophagy

      The involvement of autophagy would be more convincingly demonstrated by testing additional core autophagy genes, such as Atg7, Atg5, and Atg12, as well as performing a combined knockdown of Atg8a and Atg8b. Moreover, direct assessment of autophagy at the cellular level using established genetic reporters would substantially strengthen the conclusions.

      (6) Figures 4-5: Functional consequences

      It would be informative to determine whether Synr, Debcl, or Buffy influence wing size on their own and whether their overexpression enhances wing growth.

      (7) Terminology and interpretation of cell death

      Taken together, the results suggest that Dcp-1 zymogen overexpression induces a form of non-apoptotic cell death, potentially autophagy-dependent or related. The reviewer does not understand the authors' insistence on referring to this process as apoptosis. The authors should be more cautious in their terminology: there is no canonical versus non-canonical apoptosis; there is simply apoptosis. Without stronger evidence, these effects should not be described as apoptotic cell death.

    1. eLife Assessment

      This study presents a valuable advance by enabling functional mapping of Ca²⁺ responses in live human pancreatic tissue slices, providing new opportunities to study islet heterogeneity and diabetes-related dysfunction in an intact tissue context. The evidence supporting the main conclusions is solid, based on reproducible methodology and functional validation across multiple human donor samples. Key revisions needed include clearer quantification of transduction efficiency and tissue viability, and improved clarification of how CaMPARI2 signals should be interpreted.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to overcome a major technical limitation in pancreatic slice research - the inefficient viral transduction of dense, enzyme-active human pancreas tissue - while maintaining tissue integrity and physiological responsiveness. They developed a modified culture and infection protocol that incorporates gentle orbital agitation, removal of protease inhibitors, and physiological temperature during adenoviral transduction. This method increased transduction efficiency by approximately threefold without impairing insulin secretion or calcium signaling responses.

      Strengths:

      The study's major strengths are its clear methodological innovation, experiment optimization, and multiparametric validation. The authors provide compelling evidence that their approach enhances the expression of genetically encoded calcium indicators (GCaMP6m) and integrators (CaMPARI2), preserving both endocrine and exocrine cell functionality. The demonstration of targeted biosensor expression in β-cells and multiplexed imaging of redox and calcium dynamics highlights the versatility of the system. The CaMPARI2-based approach is particularly impactful, as it decouples maximum calcium response assessment from real-time imaging, thereby increasing throughput and reducing bias. The authors successfully apply the technique to samples from non-diabetic, T1D, and T2D donors, revealing disease-relevant alterations in β-cell calcium responses consistent with known physiological dysfunctions. The analysis of islet size versus calcium response further underscores the utility of this platform for probing structure-function relationships in situ.

      Weaknesses:

      The primary limitations are a lack of live/dead assessment to differentiate viability-related effects from methodological improvements, a lack of quantification of the transduction efficiency (while relative efficiency is clearly increased, the absolute efficiency is not shown), and a lack of IF confirmation of the cell-specific transduction efficiency. These limitations, however, do not detract from the overall strength of the technical advance.

      Overall, this work offers a convincing and practical advance for the diabetes and islet biology community. It substantially improves the toolkit available for live human pancreas studies and will likely catalyze further mechanistic investigations of islet heterogeneity, disease progression, and therapeutic response.

    3. Reviewer #2 (Public review):

      (1) The photoconversion protocol requires a more detailed and quantitative discussion. The current description ("5 s pulses for 5 min, leading to 2.5 min of total light delivery") is too brief to evaluate whether the chosen illumination parameters maintain the CaMPARI2 signal within its linear dynamic range. Because CaMPARI2 photoconversion reflects the time integral of 405 nm photoconverting light exposure in the presence of intracellular [Ca²⁺], the red/green fluorescence ratio is directly proportional to cumulative illumination time until saturation occurs. Previous characterization (PMID: 30361563) shows that photoconversion is approximately linear over the first 0-80 s of 405 nm exposure, after which red fluorescence plateaus. The total exposure used here (=150 s) may therefore exceed the linear regime, potentially obscuring differences between cells with moderate versus strong Ca²⁺ activity. The authors should (i) justify the selected illumination parameters, (ii) provide evidence that the chosen conditions remain within the linear response range for the specific optical setup, (iii) discuss how overexposure might affect quantitative interpretation of red/green ratios and comparisons between experimental groups. Inclusion of calibration data would substantially strengthen the methodological rigor and reproducibility of the study.

      (2) For Figure 8a (middle panels), the data points for 16G and KCl overlap, raising the possibility that the signal may already be saturated at 16G. The authors should comment on the potential for CaMPARI2 saturation at 16G, and clarify whether this affects the interpretation of the KCl results: "At maximal stimulation by KCl, there was no size-function correlation (R = 0.15, p = 0.14)."

      (3) The term "calcium activity" is used throughout the manuscript but remains vague. Pancreatic islets typically display a biphasic Ca²⁺ response to high glucose (an initial sustained peak followed by repetitive oscillations), and these phases differ in both kinetics and physiological meaning. Ca²⁺ responses are usually quantified using parameters such as rise time, amplitude, and duration for the initial peak, and amplitude, frequency, burst duration, and duty cycle for the oscillatory phase. The authors should clarify how "calcium activity" is defined in their analyses and discuss the appropriateness of directly comparing Ca²⁺ signals with distinct temporal patterns.

      (4) Because the CaMPARI2 red/green ratio reflects the time-integral of 405 nm photoconverting light exposure in the presence of Ca²⁺, two Ca²⁺ responses with the same duty cycle but different amplitudes could, in principle, yield the same red/green ratios. This raises an important question regarding how well the CaMPARI2 signal distinguishes differences in Ca²⁺ amplitude versus time spent above threshold. The authors should directly relate single-cell Ca²⁺ traces to corresponding red/green ratios to demonstrate the extent to which CaMPARI2 photoconversion truly reflects "Ca²⁺ activity." Such validation would clarify whether the metric is sensitive to variations in oscillation amplitude, duty cycle, or both, and would strengthen the interpretation of CaMPARI2-based functional comparisons.
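
      The concerns in points (1) and (4) can be made concrete with a toy model. This is a sketch under stated assumptions, not a description of real CaMPARI2 kinetics. It assumes photoconversion accrues only while the 405 nm light is on and Ca²⁺ is above a fixed threshold, and that the read-out is linear up to 80 s of effective exposure and flat afterwards (the linear range quoted in point (1)). The 5 s gap between pulses is inferred from "5 s pulses for 5 min, leading to 2.5 min of total light delivery". All function names, the threshold, and the square-wave traces are hypothetical.

```python
def light_schedule(pulse_s: int, gap_s: int, duration_s: int) -> list[bool]:
    """One boolean per second: True while the 405 nm light is on."""
    period = pulse_s + gap_s
    return [(t % period) < pulse_s for t in range(duration_s)]


def square_wave(amplitude: float, high_s: int, period_s: int, duration_s: int) -> list[float]:
    """Idealized oscillating Ca2+ trace sampled once per second (arbitrary units)."""
    return [amplitude if (t % period_s) < high_s else 0.0 for t in range(duration_s)]


def effective_exposure_s(ca_trace: list[float], light_on: list[bool], threshold: float) -> int:
    """Seconds during which the light is on AND Ca2+ exceeds the threshold."""
    return sum(1 for ca, on in zip(ca_trace, light_on) if on and ca > threshold)


def toy_red_green(exposure_s: float, linear_limit_s: float = 80.0) -> float:
    """Assumed read-out: linear in exposure, then saturated at 1.0."""
    return min(exposure_s, linear_limit_s) / linear_limit_s


light = light_schedule(pulse_s=5, gap_s=5, duration_s=300)
print("total light delivered (s):", sum(light))

# Same 50% duty cycle, different amplitudes, both above the assumed threshold.
low_amp = square_wave(amplitude=2.0, high_s=30, period_s=60, duration_s=300)
high_amp = square_wave(amplitude=5.0, high_s=30, period_s=60, duration_s=300)
# A sustained plateau, standing in for a strongly responding cell.
sustained = [5.0] * 300

for name, trace in (("low amplitude", low_amp), ("high amplitude", high_amp), ("sustained", sustained)):
    exposure = effective_exposure_s(trace, light, threshold=1.0)
    print(f"{name}: exposure {exposure} s, toy ratio {toy_red_green(exposure)}")
```

      In this toy setting the two oscillating traces give identical read-outs despite a 2.5-fold difference in amplitude, and the sustained trace receives twice their effective exposure yet reads only slightly higher because the signal has saturated. Whether real slices behave this way depends on the actual Ca²⁺ affinity and photoconversion kinetics, which is what the requested calibration data would establish.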

    4. Reviewer #3 (Public review):

      Summary:

      Lazimi and coworkers present an updated experimental protocol by which viral vectors can be used with live pancreas slices in order to efficiently transduce fluorescent protein biosensors. This is of high importance, given that live human pancreas slices provide a means to study islet function while maintaining the architecture of the local environment. Thus, efficiently delivering a wide range of fluorescent protein biosensors provides expanded capabilities to study the human islet and its dysfunction in type 1 and type 2 diabetes. The authors demonstrate the improved transduction provided by their revised protocol, which includes orbital culture, while retaining or, in some cases, improving cell viability, hormone release, and Ca²⁺ responses. Further, the authors demonstrate how a 'Ca²⁺ integrator', CaMPARI2, can be used to profile the Ca²⁺ response of large numbers of cells and islets, to capture the variability in islet responses in healthy and diabetic cases.

      Strengths:

      The data presented are generally robust, and the methods are well described, such that this protocol could be repeated by other investigators. All findings are representative of multiple donors. Importantly, the data is highly novel.

      Weaknesses:

      Weaknesses in the manuscript mainly include a lack of technical details by which data is presented or analyzed, as well as caveats by which certain data related to islet size are interpreted.

    1. eLife Assessment

      This paper addresses valuable questions about the evolution of recombination landscape under domestication by examining recombination maps in domesticated chickens and their wild ancestor. However, despite employing a state-of-the-art deep learning method for recombination map inference, the lack of systematic benchmarking and presence of some unexpected patterns raise concerns about the reliability of the inferred maps, thus providing incomplete support for rapid evolution of recombination landscapes. Additionally, due to methodological limitations in testing for intra-genome correlations between evolutionary processes, the current evidence is inadequate to support the associations of recombination with selection and/or introgression in domesticated chickens.

    2. Reviewer #1 (Public review):

      Liu, Li, Ge, and colleagues use whole genome sequence data to estimate the recombination landscape of domesticated chickens and their wild ancestor, Red Junglefowl. They compare landscapes estimated using the deep learning method RelERNN (Adrion et al. 2020) to understand the consequences of domestication for the evolution of recombination. The authors build on previous work in tomato, maize, and other domesticated species to examine how recombination rate and patterning evolve under the demography and selection pressures of domestication. They do so by comparing estimates of local recombination rates across chromosomes and populations, asking if/how well certain sequence and chromatin-based predictors predict recombination rate, and testing for an association between recombination rate and the proportion of introgressed ancestry from Red Junglefowl.

      This study provides evidence for the hypothesis that recombination evolves rapidly in domesticated lineages -- so much so that we see little hotspot sharing between breeds in the present-day! Strengths of the paper include the collection/analysis of data from several domesticated sub-populations and efforts to control for demography and structure in the inference of recombination landscapes (given the challenges of some methods under non-equilibrium demography: https://academic.oup.com/mbe/article/35/2/335/4555533). It is also reassuring to see patterns that have been thoroughly established (e.g., the negative relationship between recombination rate and chromosome size) validated.

      However, I have concerns about the data and methodology.

      (1) My main concern is that the demographic and recombination rate estimates inferred using ~20 whole genomes are likely quite variable and, without quantification of the uncertainty or systematic assessment of the possible biases in the methodology, it is difficult to have confidence in analyses which make use of the RelERNN landscapes.

      (a) Similar studies in rye (https://academic.oup.com/mbe/article/39/6/msac131/6605708) and tomato (https://academic.oup.com/mbe/article/39/1/msab287/6379725) used data from far more individuals (916 individuals split up into populations of size 50 for rye, >75 samples for tomato) to infer recombination maps and conduct downstream analyses. Studies in human genetics make use of an even greater number! The evidence (Lines 189-196 of the main text) that the sample size is sufficient to capture fine-scale variation in recombination is weak. In particular, correlations between the true and estimated recombination rate are based on *equilibrium* demography at sample sizes of 5, 10, and 20, yet are used to draw the inference "20 samples per population are sufficient to reconstruct their recombination landscapes" under the *non-equilibrium* demography (inferred using SMC+).

      (b) RelERNN learns the recombination landscape by using several signatures (the decay of linkage disequilibrium and, as described in https://academic.oup.com/genetics/advance-article-abstract/doi/10.1093/genetics/iyaf108/8157390, choppiness of the allele frequency spectrum) left in present-day genomes. Both signatures depend strongly on local SNP density. It does not seem the effect of SNP density on the inferred recombination rate is examined, despite the potential for correlated noise in inferred recombination rate (in SNP-sparse regions of the genome) to confound downstream inference.

      (c) It is unclear if the demographic histories for chickens (Figure S6) broadly match what has been previously estimated from whole-genome data, or if a large class of demographic models is compatible with the data (i.e., confidence intervals for the demographic histories are quite large). In Figure S6, the bottlenecks are somewhat weak and affect only a couple of the groups, despite the history of domestication and the expectation that effective sizes vary more widely. The groups affected (LX and WL) are those that have the weakest correlations between recombination rate under the equilibrium and non-equilibrium demographic models.

      (2) The authors test for the effects of chromatin modifications, GC content, etc., using correlations between local recombination rate and each feature individually. However, joint inference of the effects under a GLM (the distribution of recombination rates is probably better described by, e.g., a Gamma distribution) would permit more straightforward causal inference, given, e.g., the potential effects of chromatin marks on deleterious mutation accumulation. I recognize this likely would not change the direction or significance of the effects in question, but it is worth noting because readers may want to learn something from the effect sizes, and the nature of causes and effects is difficult to disentangle without a multivariate approach.

      Overall:

      Previous work on recombination landscape evolution in birds (namely, the zebra finch and long-tailed finch; Singhal & Leffler 2015) has shown that many hotspots, i.e., small stretches of the genome that experience rates of crossing over that are much higher than the genome-wide average, are conserved over tens of millions of years of evolution. Work in tomato, maize, rye, and other flowering plants with histories of domestication has shown that hotspots can be dynamic. The results of Liu, Li, Ge, and colleagues complement those analyses and will, therefore, be of interest to those working on the evolution of recombination. Additionally, the finding that minor parent ancestry is negatively associated with recombination is an interesting exception to an otherwise general rule in evolutionary biology. Finally, it is quite exciting to see recombination maps inferred using RelERNN, and in a demography-aware fashion!

      That all said, it is difficult to have certainty in the results due to the relatively limited sample size for each of the populations, the lack of control for SNP density, the uncertainty in both recombination maps and demographic histories, and the lack of a joint modelling framework to carefully tease apart effects that are reported in isolation.

    3. Reviewer #2 (Public review):

      Summary:

      Liu et al. use whole genome sequencing data from several strains of chicken as well as a subspecies of the chicken wild ancestor to study the impact of domestication on the recombination landscape. They analyze these data using several machine-learning/AI based methods, using simulation to partially inform their analysis. The authors claim to find substantial deviations in the fine-scale recombination landscape between breeds, and surprising patterns between recombination and introgression/selection. However, there are substantial inconsistencies between the authors' findings and the current understanding in the field, supported by indirect evidence that is hard to interpret at best.

      Strengths:

      The data produced by the authors of this and a previous paper is well-suited to answer the questions that they pose. The authors use simulations to support some decisions made in analyzing this data, which partially alleviates some potential questions, and could be extended to address additional concerns. Should further analysis support the claims currently made regarding hotspot turnover and introgression frequency vs. recombination rate, these findings would indeed be striking observations at odds with current understanding in the field.

      Weaknesses:

      I have several major concerns regarding the ability of the analyses to support the claims in this paper, summarized below.

      Substantial deviations from field-standard benchmarks in the estimated recombination landscape appear to have been disregarded, particularly with regard to the WL breed.

      o For example, the number of detected hotspots per subspecies ranges from roughly 500 to over 100,000 based on Figure 2A. While the mean is indeed comparable to estimates from other species (lines 315-317), this characterization masks that each recombination map has far too few or too many hotspots to be biologically accurate (at least without substantial corroboration from more direct analyses). As such, statements about hotspot overlap between breeds and hotspot conservation cannot be taken at face value. Authors might consider using alternative methods to detect hotspots, assessing their power to detect hotspots in each breed, and evaluating hotspot overlap between breeds with respect to random expectation.

      o Furthermore, the authors consider the recombination landscape at promoters (Figure S10) and H3K4me3 sites (Figure 2C) and find that levels are slightly elevated, but the magnitude of the elevation (negligible to ~1.5x) is substantially lower than that of any other species studied to date without PRDM9. The magnitude of elevation for both comparisons is especially small for WL, which suggests that the recombination estimates for this breed are particularly noisy, and yet this breed is the focus of the introgression analysis.

      Introgression and strong selection can both be thought of as changing the local Ne along the genome. Estimating recombination from patterns of LD most directly estimates rho (the population recombination rate, 4*Ne*r), and disentangling local changes in Ne from local changes in r is non-trivial. Furthermore, selective sweeps, particularly easy-to-detect hard sweeps, are often characterized by having very little genetic variation. Estimating recombination rate from patterns of LD in regions with very little variation seems particularly challenging, and could bias results such as in Figure S15. The authors do not discuss the implications of these challenges for their analyses, which seems particularly relevant for their analyses of introgression and selection with recombination, as well as comparisons between WL (which the authors report to have undergone more selection and introgression) with other breeds. Authors should quantify their ability/power to detect recombination rates and hotspots under these conditions using simulation - some of these simulations are already mentioned in the paper, but are not analyzed in this way. Also useful would be quantifying the impact of simulated bottlenecks on estimates of recombination rate.
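      The confounding described above can be made concrete with a minimal sketch. It assumes only the relation rho = 4*Ne*r stated in the review; the function name and all numerical values are hypothetical and are not taken from the manuscript.

```python
import math


def population_recombination_rate(ne: float, r: float) -> float:
    """Return rho = 4 * Ne * r, the quantity that LD-based methods estimate."""
    return 4.0 * ne * r


# Hypothetical values chosen only for illustration.
baseline = population_recombination_rate(ne=10_000, r=1e-8)

# Case 1: a sweep or introgression halves the local Ne; r is unchanged.
reduced_ne = population_recombination_rate(ne=5_000, r=1e-8)

# Case 2: Ne is unchanged, but the per-generation rate r is truly halved.
reduced_r = population_recombination_rate(ne=10_000, r=5e-9)

# Both cases give the same rho, so rho alone cannot tell them apart.
indistinguishable = math.isclose(reduced_ne, reduced_r)
print(baseline, reduced_ne, reduced_r, indistinguishable)
```

      In this toy case a region with halved local Ne and a region with a genuinely halved r yield the same rho, which is why simulations under selection and introgression would be needed to separate the two.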

      In many analyses (e.g. hotspot and coldspot overlap, histone mark analysis), authors appear to use 1000 randomly selected regions of the same length as a control. If this characterization is accurate, authors should match the number of control regions to the number of features that they're comparing to. A more careful analysis might also select random regions from the same chromosome, match for GC content where appropriate, etc.
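      One way to build such a matched control is sketched below. This is a hypothetical illustration, not the authors' pipeline: it draws one random region per feature, of the same length and on the same chromosome, and compares the observed overlap count with the resulting null distribution. All function names and coordinates are invented, and matching for GC content is omitted for brevity.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlaps(features, targets):
    """Number of features that overlap at least one target interval."""
    return sum(any(overlaps(f, t) for t in targets) for f in features)


def matched_random_regions(features, chrom_length, rng):
    """Draw one random region per feature, same length, on the same chromosome."""
    regions = []
    for start, end in features:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        regions.append((new_start, new_start + length))
    return regions


def empirical_p_value(features, targets, chrom_length, n_perm=1000, seed=0):
    """Observed overlap count and a one-sided empirical p-value against random placement."""
    rng = random.Random(seed)
    observed = count_overlaps(features, targets)
    at_least = sum(
        count_overlaps(matched_random_regions(features, chrom_length, rng), targets)
        >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy coordinates on a single hypothetical 100 kb chromosome.
breed_a_hotspots = [(100, 110), (500, 510), (900, 910)]
breed_b_hotspots = [(105, 115), (505, 515), (2000, 2010)]
observed, p_value = empirical_p_value(
    breed_a_hotspots, breed_b_hotspots, chrom_length=100_000
)
print(observed, p_value)
```

      Because the number and lengths of control regions track the features being tested, the null distribution stays comparable across breeds with very different hotspot counts.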

      Authors provide very little detail about the number/locations of coldspots or selective sweeps: how many were detected in each subspecies? Does the fraction of hotspots and coldspots which overlap selective sweeps vary between species? It is unclear whether the numbers in the text (lines 356-364) represent a single breed or an analysis across breeds.

    1. eLife Assessment

      Koch et al. describe a valuable novel methodology, SynSAC, to synchronise cells to analyse meiosis I or meiosis II or mitotic metaphase in budding yeast. The authors present convincing data to validate abscisic acid-induced dimerisation to induce a synthetic spindle assembly checkpoint (SAC) arrest that will be of particular importance to analyse meiosis II. The authors use their approach to determine the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II that will be of interest to the broader meiosis research community.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data, indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system, but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      Significance:

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner. Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not place more emphasis on this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports the idea that, as previously demonstrated in human cells, Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data-independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association with kinetochores.
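      The normalization suggested in point (4) could take a form like the following minimal sketch. It is hypothetical and not the authors' analysis: protein names, intensities, and the pseudocount are invented, and a real DIA workflow would also need replicate handling and statistical testing.

```python
import math


def log2_enrichment(tagged, untagged, pseudocount=1.0):
    """Per-protein log2(tagged / untagged) for one stage.

    A pseudocount avoids division by zero for proteins that are
    absent from the untagged control.
    """
    return {
        protein: math.log2(
            (intensity + pseudocount) / (untagged.get(protein, 0.0) + pseudocount)
        )
        for protein, intensity in tagged.items()
    }


# Invented intensities for a single hypothetical stage.
tagged_ip = {"kinetochore_protein": 800.0, "background_protein": 1000.0}
untagged_ip = {"background_protein": 1000.0}

enrichment = log2_enrichment(tagged_ip, untagged_ip)
print(enrichment)
```

      In this toy case the abundant background protein scores zero because it is recovered equally without the tag, whereas the kinetochore protein scores high. Comparing such stage-wise ratios, rather than raw tagged intensities, would reduce the influence of global expression changes between stages.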

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      Significance:

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

    5. Author response:

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore changes during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we will now implement:

      (1) Improvements to the discussion throughout the manuscript. The reviewers recommended that we focus our discussion on the novel findings of the manuscript and drew out some key points of interest that deserve more attention. We fully agree with this and we will address this in a revised version.

      (2) We will add a new supplemental figure to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We are currently performing an additional control experiment to address the minor point 1 from reviewer #3. Our experiment to confirm that SynSAC relies on endogenous checkpoint proteins was missing the cell cycle profile of cells where SynSAC was not induced for comparison. We will add this control to our full revision.

      (4) In our full revision we will also include representative images of spindle morphology, as requested by Reviewer #1, point 2.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data, indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewer’s support of our study.

      (1) In wild-type, Pds1 levels are high during M1 and A1, but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in the presence of high Pds1? There is even a low but significant level in MII.

      The higher levels of Pds1 in meiosis I compared to meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 prior to separase re-activation.

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “Addition of ABA at the time of prophase release resulted in Pds1 (securin) stabilisation throughout the time course, consistent with delays in both metaphase I and II”.

      We do not believe that either anaphase I or II occurs in the presence of high Pds1. Western blotting represents the amount of Pds1 in the population of cells at a given time point. The time between meiosis I and II is very short even when treated with ABA. For example, in Figure 2B, spindle morphology counts show that the anaphase I peak is around 40% at its maximum (105 min) and around 40% of cells are in either metaphase I or metaphase II, and will be Pds1 positive. In contrast, due to the better efficiency of meiosis II, anaphase II hardly occurs at all in these conditions, since anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down, without undergoing anaphase extension. Taking Pds1 levels from the western blot and the spindle data together leads to the conclusion that at the end of the time-course, these cells are biochemically in metaphase II, but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3–5]. We will explain this point in a revised version while referring to representative images that provide evidence for this, as also requested by the reviewer below.

      (2) The figures with data characterizing the system are mostly graphs showing time course of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (i.e., in the figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      This is an excellent suggestion and will also help clarify the point above. We will provide images of cells at the different stages. For each timepoint, 100 cells were scored. We have already included this information in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that SynSAC does not come from kinetochores is a major advantage as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation, but this is beyond the scope of this work. In terms of the state at which we are arresting cells, this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators but not checkpoint proteins in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We appreciate the reviewer’s insight in highlighting these interesting discussion points, which we will include in a revised version.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data, indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      This is a good suggestion; we will do this in our full revision.

      (2) In line 197, the authors state: “...SynSAC induced a more pronounced delay in metaphase II than in metaphase I”. However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"... this seems to be a mix-up.

      Thank you for pointing this out. This is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Indeed, data from mouse oocytes indicate that shugoshin-dependent cohesin deprotection is already absent in meiosis II [6], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We will include this point in the discussion.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes the enrichment and extended time for sample collection are sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition, since the very robust arrest achieved with the 4A-RASA mutant made this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not place more emphasis on this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment which is a large amount of work using significant resources. However, we will highlight the potential of the 4A-RASA mutant more prominently in our full revision.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences in the wiring of the spindle checkpoint in meiosis and mitosis, as well as potential for future studies; however, currently we can only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (Figure 4A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing and points to a role of Aurora B-mediated phosphorylation, though previous work has not supported such a role [7].

      We will include a discussion of these important points in a revised version.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged-samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that at first glance, normalising to no tag appears to be the most appropriate normalisation, in practice there is very low background signal in the no tag sample which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no tag) signal is indeed very low, we will provide a new supplemental figure showing volcano plots comparing kinetochore purifications at each stage with their corresponding no tag control.

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis[8]. In this study, we found that relatively few proteins significantly change in abundance.

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and, for this, we believe that a direct comparison to Dsn1, a central kinetochore protein which we immunoprecipitated, is the most appropriate normalisation.
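
      The normalisation argument above can be illustrated with a minimal sketch. All intensities below are hypothetical and chosen only to show the arithmetic; they are not values from the study, and the variable names are illustrative assumptions. The sketch shows how a fluctuation of a few units in a near-zero no tag background can produce a large apparent fold change, whereas scaling to an abundant reference such as the immunoprecipitated bait is far less sensitive to small fluctuations.

```python
# Hypothetical intensities for one kinetochore protein (illustrative only).
tagged = {"metaphase_I": 1000.0, "metaphase_II": 1000.0}   # identical signal in both stages
no_tag = {"metaphase_I": 2.0, "metaphase_II": 4.0}         # near-zero background, differs by 2 units
dsn1 = {"metaphase_I": 5000.0, "metaphase_II": 5200.0}     # abundant bait, differs by 200 units

def normalise(signal, reference):
    """Divide each stage's signal by the matching reference value."""
    return {stage: signal[stage] / reference[stage] for stage in signal}

vs_no_tag = normalise(tagged, no_tag)
vs_dsn1 = normalise(tagged, dsn1)

# Apparent metaphase I / metaphase II difference under each normalisation.
fold_no_tag = vs_no_tag["metaphase_I"] / vs_no_tag["metaphase_II"]
fold_dsn1 = vs_dsn1["metaphase_I"] / vs_dsn1["metaphase_II"]

print(f"normalised to no tag: {fold_no_tag:.2f}-fold")
print(f"normalised to Dsn1:   {fold_dsn1:.2f}-fold")
```

      In this toy case the protein signal is identical in both stages, yet dividing by the no tag background yields a twofold apparent difference that comes entirely from the background fluctuation.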

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we will re-frame the discussion to focus on the novel findings, as also raised by the other reviewers.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We will generate the data to include a checkpoint mutant +/- ABA for direct comparison. We will take steps to improve the clarity of presentation of the meiotic timecourse graphs, though our experience is that uncluttered graphs make it easier to compare trends.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.
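
      As a rough illustration of this sensitivity argument, the sketch below compares the chance of detecting a mis-segregation event when one chromosome is followed with the chance that any of the 16 chromosomes mis-segregates. It assumes, purely for illustration, that each chromosome mis-segregates independently with the same probability per meiosis; the value of 1% is hypothetical and the function name is not from the study.

```python
def p_any_missegregation(p_per_chromosome: float, n_chromosomes: int = 16) -> float:
    """Probability that at least one of n chromosomes mis-segregates,
    assuming independent events with equal per-chromosome probability."""
    return 1.0 - (1.0 - p_per_chromosome) ** n_chromosomes

p_single = 0.01                           # hypothetical per-chromosome rate
p_any = p_any_missegregation(p_single)    # what a whole-genome readout could report on

print(f"single labelled chromosome: {p_single:.3f}")
print(f"any of 16 chromosomes:      {p_any:.3f}")
```

      Under these assumptions, a readout that responds to any chromosome is roughly an order of magnitude more likely to register an event than a single labelled chromosome.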

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We agree that this is surprising, and we will point this out in the revised discussion. We speculate that the challenge in biorienting homologs, which are held together by chiasmata rather than back-to-back kinetochores, results in a greater requirement for error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Knetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have only corrected minor typos as detailed above.

      Description of analyses that authors prefer not to carry out

      The revisions we plan are detailed above. There are just two revisions we believe are either unnecessary or beyond the scope, both minor concerns of Reviewer #3. For clarity we have reproduced them, along with our justification below. In the latter case, the reviewer also acknowledged that further work in this direction is beyond the scope of the current study.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Developmental Cell 4, 711–726. https://doi.org/10.1016/S1534-5807(03)00130-8.

      (4) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (5) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (6) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (7) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (8) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. eLife Assessment

      This work offers important insights into the protein CHD4's function in chromatin remodeling and gene regulation in embryonic stem cells, supported by extensive biochemical, genomic, and imaging data. The use of an inducible degron system allows precise functional analysis, and the datasets generated represent a key resource for the field. The revised study offers compelling evidence and makes a significant contribution to understanding CHD4's role in epigenetic regulation. This work will be of interest to the epigenetics and stem cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an elegant investigation to clarify the roles of CHD4 in chromatin accessibility and transcription regulation. In addition to the common mechanisms of action through nucleosome repositioning and opening of transcriptionally active regions, the authors considered here a new angle of CHD4 action through modulating the off rate of transcription factor binding. Their suggested scenario is that the action of CHD4 is context-dependent and is different for highly-active regions vs low-accessibility regions.

      Strengths:

      This is a very well-written paper that will be of interest to researchers working in this field. The authors carried out a large body of work, with different types of NGS experiments and the corresponding computational analyses. The combination of biophysical measurements of the off-rate of protein-DNA binding with NGS experiments is particularly commendable.

      Comments on revised version:

      The authors have addressed all my points.

    3. Reviewer #2 (Public review):

      This study leverages acute protein degradation of CHD4 to define its role in chromatin and gene regulation. Previous studies have relied on KO and/or RNA interference of this essential protein and as such are hampered by adaptation, cell population heterogeneity, cell proliferation and indirect effects. The authors have established an AID2-based method to rapidly deplete the dMi-2 remodeller to circumvent these problems. CHD4 is gone within an hour, well before any effects on cell cycle or cell viability can manifest. This represents an important technical advance that, for the first time, allows a comprehensive analysis of the immediate and direct effect of CHD4 loss of function on chromatin structure and gene regulation.

      Rapid CHD4 degradation is combined with ATAC-seq, CUT&RUN, (nascent) RNA-seq and single molecule microscopy to comprehensively characterise the impact on chromatin accessibility, histone modification, transcription and transcription factor (NANOG, SOX2, KLF4) binding in mouse ES cells.

      The data support the previously developed model that high levels of CHD4/NuRD maintain a degree of nucleosome density to limit TF binding at open regulatory regions (e.g. enhancers). The authors propose that CHD4 activity at these sites is an important prerequisite for enhancers to respond to novel signals that require an expanded or new set of TFs to bind.

      What I find even more exciting and entirely novel is the finding that CHD4 removes TFs from regions of limited accessibility to repress cryptic enhancers and to suppress spurious transcription. These regions are characterised by low CHD4 binding and have so far never been thoroughly analysed. The authors correctly point out that the general assumption that chromatin regulators act on regions where they seem to be concentrated (i.e. have high ChIP-seq signals) runs the risk of overlooking important functions elsewhere. This insight is highly relevant beyond the CHD4 field and will prompt other chromatin researchers to look into low level binding sites of chromatin regulators.

      The biochemical and genomic data presented in this study are of high quality (I cannot judge the single molecule microscopy experiments due to my lack of expertise). This is an important and timely study that is of great interest to the chromatin field.

      Comments on revised version:

      All my comments below have been addressed in the revised version of the manuscript.

      The revised manuscript provides a significant advance of our understanding of how the nucleosome remodeler CHD4 exerts its function. In particular, the findings suggest an intriguing role of CHD4 in TF removal at genomic regions where only low levels of CHD4 can be detected. In the future, it will be interesting to see if this activity is shared by other ATP-dependent nucleosome remodelers.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, an inducible degron approach is taken to investigate the function of the CHD4 chromatin remodelling complex. The cell lines and approaches used are well thought out and the data appear to be of high quality. They show that loss of CHD4 results in rapid changes to chromatin accessibility at thousands of sites. At the majority of locations where changes are detected, chromatin accessibility is decreased and these sites are strongly bound by CHD4 prior to activation of the degron and so likely represent primary sites of action. Somewhat surprisingly, while chromatin accessibility is reduced at these sites, transcription factor occupancy is little changed. Following CHD4 degradation, occupancy of the key pluripotency transcription factors NANOG and SOX2 increases at many locations genome-wide, and at many of these sites chromatin accessibility increases. These represent important new insights into the function of CHD4 complexes.

      Strengths:

      The experimental approach is well suited to providing insight into a complex regulator such as CHD4. The data generated to characterise how cells respond to loss of CHD4 is of high quality. The study reveals major changes in transcription factor occupancy following CHD4 depletion.

      Weaknesses:

      The main weakness can be summarised as relating to the fact that the authors favour the interpretation that all rapid changes following CHD4 degradation occur as a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially very low (e.g., sites where accessibility is gained, in comparison to sites where chromatin accessibility is lost). The revised discussion acknowledges that rapid indirect effects cannot be excluded.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion.

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is the sum of multiple rates of decay: the true dissociation rate (k<sub>off</sub>), signal loss caused by photobleaching (k<sub>pb</sub>), and signal loss caused by defocusing/tracking error (k<sub>tl</sub>).

      k<sub>off</sub><sup>app</sup> = k<sub>off</sub> + k<sub>pb</sub> + k<sub>tl</sub>

      We are making conclusions about relative changes in k<sub>off</sub><sup>app</sup> upon CHD4 depletion, not about the absolute magnitude of true k<sub>off</sub> or TF residence times. Our conclusions extend to true k<sub>off</sub> on the assumption that k<sub>pb</sub> and k<sub>tl</sub> are equal across all samples imaged due to identical experimental conditions and analysis. k<sub>pb</sub> and k<sub>tl</sub> vary hugely across experimental set-ups, especially with different laser powers, so other k<sub>off</sub> or k<sub>off</sub><sup>app</sup> values reported in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of k<sub>pb</sub> (and k<sub>tl</sub>) would be required to make any statements about absolute values of k<sub>off</sub>.
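
      The relationship above can be sketched numerically. The rates below are hypothetical (they are not measurements from this study) and the function name is an illustrative assumption. The sketch shows that, if the photobleaching and tracking-loss terms are the same in both samples, a difference in the apparent rate equals the difference in the true rate, while the ratio of apparent rates understates the ratio of true rates.

```python
# Hypothetical rates in s^-1 (illustrative only, not measured values).
K_PB = 0.05   # photobleaching, assumed identical across samples
K_TL = 0.02   # defocusing/tracking loss, assumed identical across samples

def apparent_off_rate(k_off: float, k_pb: float = K_PB, k_tl: float = K_TL) -> float:
    """k_off_app = k_off + k_pb + k_tl (sum of independent decay rates)."""
    return k_off + k_pb + k_tl

k_off_control = 0.10    # hypothetical true off-rate before depletion
k_off_depleted = 0.06   # hypothetical true off-rate after depletion

app_control = apparent_off_rate(k_off_control)
app_depleted = apparent_off_rate(k_off_depleted)

# Absolute differences are preserved because the constant terms cancel.
delta_app = app_control - app_depleted
delta_true = k_off_control - k_off_depleted

# Ratios are compressed towards 1 by the constant terms.
ratio_app = app_depleted / app_control
ratio_true = k_off_depleted / k_off_control

print(f"difference: apparent {delta_app:.3f}, true {delta_true:.3f}")
print(f"ratio:      apparent {ratio_app:.3f}, true {ratio_true:.3f}")
```

      The direction of the change is therefore the same for apparent and true rates under the stated assumption, but absolute values and fold changes of the true rate cannot be read off the apparent rate without knowing the other two terms.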

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation. 

      The heatmap displays z-scores, meaning expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline, it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it doesn’t directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.  
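
      A minimal sketch of this distinction, using hypothetical expression values for a single gene (the numbers and time points are illustrative only, and the use of the sample standard deviation is an assumption): the z-score at time zero is negative because the gene is below its own mean, whereas the log2 fold change relative to time zero is zero by construction.

```python
import math
import statistics

def zscore_row(values):
    """Centre and scale one gene's expression across the whole time course."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(v - mean) / sd for v in values]

def log2_fold_change(values):
    """log2 fold change of each time point relative to the first one."""
    return [math.log2(v / values[0]) for v in values]

# Hypothetical expression of one upregulated gene over five time points.
expression = [10.0, 12.0, 20.0, 30.0, 28.0]

z = zscore_row(expression)
lfc = log2_fold_change(expression)

for value, z_value, lfc_value in zip(expression, z, lfc):
    print(f"expr={value:5.1f}  z={z_value:+.2f}  log2FC={lfc_value:+.2f}")
```

      In this toy case the gene would appear blue at the first time point and red at the later ones on a z-score heatmap, even though time zero is the reference for any fold-change calculation.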

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to more accurately reflect what is shown in the screenshot. We have also replaced “WT” with “0” as this more accurately reflects the status of these cells. 

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself. 

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low. 

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised we have mitigated the points raised by the reviewers. 

      Reviewer #2 (Recommendations for the authors): 

      p9, top: The sentence starting with "Genes increasing in expression after four hours...." is very difficult to understand and should be rephrased or broken up. 

      We agree. This has been completely re-written. 

      Reviewer #3 (Recommendations for the authors): 

      Sites of increased chromatin accessibility emerge more slowly than sites of lost chromatin accessibility. In Figure 1D, there is a small increase in accessibility at 30 min, but a more noticeable decrease at 30 min. The sites of increased accessibility also have lower absolute accessibility than observed at locations where accessibility is lost. This raises the possibility that the sites of increased accessibility represent rapid but indirect changes occurring following loss of CHD4. Consistent with this, enrichment for CHD4 and MBD3 by CUT&Tag is far higher at sites of decreased accessibility. The low level of CHD4 occupancy observed at sites where accessibility increases may not be relevant to the reason these sites are affected. Such small enrichments can be observed when aligning to other genomic features. The authors interpret their findings as indicating that low occupancy of CHD4 exerts a long-lasting repressive effect at these locations. This is one possible explanation; however, an alternative is that these effects are indirect, perhaps driven by the very large increase in TF binding that is observed following CHD4 degradation and which appears to occur at many locations regardless of whether CHD4 is present.

      The reviewer is right to point out that we don’t know what is direct and what is indirect. All we know is that changes happen very rapidly upon CHD4 depletion. The changes in standard ATAC-seq signal appear greater at the sites showing decreased accessibility than at those increasing; however, the starting points are very different: a small increase from very low accessibility will likely be a higher fold change than a more visible decrease from very high accessibility (Fig. 1D). In contrast, Figure 6 shows a more visible increase in Tn5 integrations at sites increasing in accessibility at 30 minutes than the change in sites decreasing in accessibility at 30 minutes. We therefore disagree that the sites increasing in accessibility are more likely to be indirect targets. In further support of this, there is a rapid increase in MNase resistance at these sites upon MBD3 reintroduction (Fig. 6I), possibly indicating a direct impact of NuRD on these sites.
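
      The fold-change argument in this response can be illustrated with a short sketch. The signal values below are hypothetical and chosen only to show the arithmetic; they are not taken from the ATAC-seq data, and the function names are illustrative.

```python
import math


def fold_change(before: float, after: float) -> float:
    """Ratio of the signal after depletion to the signal before."""
    if before <= 0:
        raise ValueError("baseline signal must be positive")
    return after / before


def log2_fold_change(before: float, after: float) -> float:
    """Symmetric (log2) version of the fold change."""
    return math.log2(fold_change(before, after))


# Hypothetical normalised accessibility signals (arbitrary units).
sites = {
    "gained (low baseline)": (0.5, 1.5),
    "lost (high baseline)": (20.0, 14.0),
}

for label, (before, after) in sites.items():
    print(
        f"{label}: absolute change = {after - before:+.1f}, "
        f"fold change = {fold_change(before, after):.2f}, "
        f"log2 fold change = {log2_fold_change(before, after):+.2f}"
    )
```

      With these assumed numbers, the site starting from a low baseline has the smaller absolute change but the larger fold change, which is the distinction drawn between the two classes of sites.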

      Substantial changes in Nanog and SOX2 binding are observed across the time course. These changes are very large, with 43k or 78k additional sites detected. How is this possible? Does the amount of these TFs present in cells change? The argument that transient occupancy of CHD4 acts to prevent TFs binding to what is likely to be many hundreds of thousands of sites (if the data for Nanog and SOX2 are representative of other transcription factors such as KLF4) seems unlikely.

      The large number of different sites identified gaining TF binding is likely to be a reflection of the number of cells being analysed: within the 10⁵–10⁶ cells used for a Cut&Run experiment we detect many sites gaining TF binding. In individual cells we agree it would be unlikely for that many sites to become bound at the same time. We detect no changes in the amounts of Nanog or Sox2 in our cells across the 4-hour CHD4 depletion time course. However, we maintain that low-frequency interactions of CHD4 with a site can counteract low-frequency TF binding and prevent it from stimulating opening of a cryptic enhancer.

      While increased TF binding is observed at sites of gained accessibility, the changes in TF occupancy at the lost sites do not progress continuously across the time course. In addition, the changes in occupancy are small in comparison to those observed at the gained sites. The text comments on an increase in SOX2 and Nanog occupancy at 30 min, but there is either no change or a loss by 4 hours. It's difficult to know what to conclude from this. 

      At sites losing accessibility the enrichment of both Nanog and Sox2 increases at 30 minutes. We suspect this is due to the loss of CHD4’s TF-removal activity. Thereafter the two TFs show different trends: Nanog enrichment then decreases again, probably due to the decrease in accessibility at these sites. Sox2, by contrast, does not change very much, possibly due to its higher pioneering ability. It is true that the amounts of change are very small here; however, Cut&Run was performed in triplicate and the summary graphs are plotted with standard error of the mean (which is often too small to see), demonstrating that the detected changes are highly significant. (We neglected to refer to the SEM in our figure legends: this has now been corrected.) At sites where CHD4 maintains chromatin compaction, the amount of transcription factor binding goes from zero or nearly zero to some finite number, hence the fold change is very large. In contrast, the changes at sites losing accessibility start from high enrichment, so fold changes are much smaller.
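
      For reference, the standard error of the mean for a set of replicates can be computed as in the minimal sketch below. The three enrichment values are hypothetical placeholders, not measurements from the Cut&Run data, and the function name is illustrative.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical enrichment values from three replicates (arbitrary units).
replicates = [1.02, 1.05, 0.99]
mean = statistics.mean(replicates)
sem = standard_error_of_mean(replicates)
print(f"mean = {mean:.3f}, SEM = {sem:.3f}")
```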

      Changes in the diffusive motion of tagged TFs are measured. The data are presented as an average of measurements of individual TFs. What might be anticipated is that subpopulations of TFs would exhibit distinct behaviours. At many locations, occupancy of these TFs is presumably unchanged. At 1 hour, many new sites are occupied, and this would represent a subpopulation with high residence. A small population of TFs would be subject to distinct effects at the sites where accessibility reduces at the one-hour time point. The analysis presented fails to distinguish populations of TFs exhibiting altered mobility consistent with the proportion of the TFs showing altered binding.

      We agree that there are likely subpopulations of TFs exhibiting distinct binding behaviours, and our modality of imaging captures this, but to distinguish subpopulations within this would require a lot more data.

      However, there is no reason to believe that the TF binding at the new sites being occupied at 1 hr would have a difference in residence time to those sites already stably bound by TFs in the wildtype, i.e. that they would exhibit a different limitation to their residence time once bound compared to those sites. We do capture more stably bound trajectories per cell, but that’s not what we’re reporting on - it’s the dissociation rate of those that have already bound in a stable manner at sites where TF occupancy is detected also by ChIP.

      The analysis of transcription shown in Figure 2 indicates that high-quality data has been obtained, showing progressive changes to transcription. The linkage of the differentially expressed genes to chromatin changes shown in Figure 3 is difficult to interpret. The curves showing the distance distribution for increased or decreased DARs are quite similar for up- and down-regulated genes. The frequency density for gained sites is slightly higher, but not as much higher as would be expected, given these sites are c. 6-fold more abundant than the sites with lost accessibility. The data presented do not provide a compelling link between the CHD4-induced chromatin changes and changes to transcription; the authors should consider revising to accommodate this. It is possible that much of the transcriptional response even at early time points is indirect. This is not unprecedented. For example, degradation of SOX2, a transcriptional activator, results in both repression and activation of similar numbers of genes https://pmc.ncbi.nlm.nih.gov/articles/PMC10577566/

      We agree that these figures do not provide a compelling link between the observed chromatin changes and gene expression changes. That 50K increased sites are, on average, located farther away from misregulated genes than are the 8K decreasing sites highlights that this is rarely going to be a case of direct derepression of a silenced gene, but rather distal sites could act as enhancers to spuriously activate transcription. This would certainly be a rare event, but could explain the low-level transcriptional noise seen in NuRD mutants. We have edited the wording to make this clearer.

      The model presented in Figure 7 includes distinct roles at sites that become more or less accessible following inactivation of CHD4. This is perplexing as it implies that the same enzymes perform opposing functions at some of the different sites where they are bound. 

      Our point is that it does the same thing at both kinds of sites, but the nature of the sites means that the consequences of CHD4 activity will be different. We have tried to make this clear in the text. 

      At active sites, it is clear that CHD4 is bound prior to activation of the degron and that chromatin accessibility is reduced following depletion. Changes in TF occupancy are complex, perhaps reflecting slow diffusion from less accessible chromatin and a global increase in the abundance of some pluripotency transcription factors such as SOX2 and Nanog that are competent for DNA binding. The link between sites of reduced accessibility and transcription is less clear. 

      At the inactive sites, the increase in accessibility could be driven by transcription factor binding. There is very little CHD4 present at these sites prior to activation of the degron, and TF binding may induce chromatin opening, which could be considered a rapid but indirect effect of the CHD4 degron. The link to transcription is not clear from the data presented, but it would be anticipated that in some cases it would drive activation. 

      We acknowledge these points and have indicated this possibility in the Results and the Discussion.

      No analysis is performed to identify binding sequences enriched at the locations of decreased accessibility. This could potentially define transcription factors involved in CHD4 recruitment or that cause CHD4 to function differently in different contexts.

      HOMER analyses failed to provide any unique insights. The sites going down are highly accessible in ES cells: they have TF binding sites that one would expect in ES cells. The increasing sites show an enrichment for G-rich sequences, which reflects the binding preference of CHD4.

    1. eLife Assessment

      This valuable study presents Altair-LSFM, a well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and reduced cost. The approach provides compelling evidence of its strengths, including the use of custom-machined baseplates, detailed assembly instructions, and demonstrated live-cell imaging capabilities. This manuscript will be of interest to microscopists and potentially biologists seeking accessible LSFM tools.

    2. Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      The article includes extensive supplementary material that complements the information in the main article.

      Live imaging has been incorporated, as requested, increasing the value of the paper.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source light-sheet microscope that may be relatively easy to align and construct due to a custom-designed mounting plate. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution, or achieve high resolution but are difficult to construct and are unstable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems and address this need with Altair.

      Strengths:

      The authors succeed in their goals of implementing a relatively low cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-um field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy that what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field on to the detection, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      Weaknesses:

      There is still a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is now discussed in the manuscript but remains a limitation in the currently implemented design.

      (2) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. In the revised manuscript the authors now implement temperature control, but ideal live cell imaging conditions that would include gas and humidity control are not implemented. While, as the authors note, other microscopes that lack full environmental control have achieved widespread adoption, in my view this still limits the use cases of this microscope. There is no discussion on how this limitation of environmental control may be overcome in future iterations.

      (3) While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. Claims on how easy it may be to align the system for a "Novice" in supplementary table 5 appear to be unsubstantiated and should be removed unless a novice was indeed able to assemble and validate the system in 2 weeks. It seems that these numbers were just arbitrarily proposed in the current version without any testing. In our experience it's hard to predict how long an alignment will take for a novice.

      (4) There is no quantification on field uniformity and the tunability of the light sheet parameters (FOV, thickness, PSF, uniformity). There is no quantification on how much improvement is offered by the resonant and how its operation may alter the light-sheet power, uniformity and the measured PSF.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance.

      The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the regular diffraction-limited tradeoffs of light-sheet fluorescence microscopy.
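
      The trade-off referred to here can be sketched numerically for an ideal, static Gaussian beam. The snippet below is only an illustration under stated assumptions: it uses the paraxial Gaussian-beam relations (which become inaccurate for very tightly focused beams), an assumed 488 nm vacuum wavelength, and an assumed refractive index of 1.33 for water. The thickness values are hypothetical and do not describe the Altair-LSFM light sheet.

```python
import math


def waist_from_fwhm(fwhm_um: float) -> float:
    """Convert an intensity FWHM to the 1/e^2 waist radius w0 of a Gaussian beam."""
    return fwhm_um / math.sqrt(2.0 * math.log(2.0))


def confocal_parameter_um(
    fwhm_um: float, wavelength_um: float = 0.488, n: float = 1.33
) -> float:
    """Twice the Rayleigh range, 2 * pi * w0^2 * n / lambda (paraxial approximation).

    wavelength_um and n are assumed values (488 nm in vacuum, water immersion).
    """
    w0 = waist_from_fwhm(fwhm_um)
    rayleigh_range = math.pi * w0**2 * n / wavelength_um
    return 2.0 * rayleigh_range


# Hypothetical light-sheet thicknesses (FWHM, in micrometres).
for fwhm in (0.5, 1.0, 2.0, 4.0):
    print(
        f"FWHM {fwhm:.1f} um -> usable propagation length "
        f"~{confocal_parameter_um(fwhm):.1f} um"
    )
```

      Because the propagation length scales with the square of the waist, halving the sheet thickness in this idealised model cuts the usable length along the propagation axis by a factor of four.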

      Compelling validation using fluorescent beads, multicolor cellular imaging, and dual-color live-cell imaging highlights the system's performance. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system, along with estimated costs and a detailed description of the expertise needed.

      Strengths:

      - Strong and accessible technical innovation.

      With an elegant combination of beam shaping and optical modelling, the authors provide a high resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of thin light-sheet and small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      - Impeccable optical performance and ease of mounting of samples

      The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see below in the "weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      - Transparency and comprehensiveness of documentation and resources.

      A very detailed protocol provides detailed documentation about the setup, the optical modeling and the total cost.

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. At this stage, I believe the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Comments on revisions:

      I appreciate the detail and care shown by the authors in answering all my concerns, both the bigger ones (lack of live-cell imaging demonstration) and the smaller ones (about data storage, costs, expertise needed, and so on). The manuscript has been greatly improved, and I have no other comments to make.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the editors and reviewers for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to promote accessibility and reproducibility. Below, we provide point-by-point responses to each referee comment. In the process, we have significantly revised the manuscript to include live-cell imaging data and a quantitative evaluation of imaging speed. We now more explicitly describe the different variants of lattice light-sheet microscopy, highlighting differences in their illumination flexibility and image acquisition modes, and clarify how Altair-LSFM compares to each. We further discuss challenges associated with the 5 mm coverslip and propose practical strategies to overcome them. Additionally, we outline cost-reduction opportunities, explain the rationale behind key equipment selections, and provide guidance for implementing environmental control. Altogether, we believe these additions have strengthened the manuscript and clarified both the capabilities and limitations of Altair-LSFM.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths: 

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      We thank the reviewer for their thoughtful assessment and for recognizing the strengths of our manuscript, including the extensive supplementary material. Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. We would therefore greatly appreciate the reviewer’s guidance on which sections were perceived as superficial so that we can expand them to better support readers and builders of the system.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to substantially reduce the barrier to entry for non-specialist laboratories. Many open-source implementations, such as OpenSPIM, OpenSPIN, and Benchtop mesoSPIM, similarly focused on accessibility and reproducibility rather than introducing new optical modalities, yet have had a measurable impact on the field by enabling broader community participation. Altair-LSFM follows this tradition, providing sub-cellular resolution performance comparable to advanced systems like LLSM, while emphasizing reproducibility, ease of construction through a precision-machined baseplate, and comprehensive documentation to facilitate dissemination and adoption.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We appreciate the reviewer’s comment and agree that there are practical challenges associated with handling 5 mm diameter coverslips in this configuration. In the revised manuscript, we now explicitly describe these challenges and provide practical solutions. Specifically, we highlight the use of a custom-machined coverslip holder designed to simplify mounting and handling, and we direct readers to an alternative configuration using the Zeiss W Plan-Apochromat 20×/1.0 objective, which eliminates the need for small coverslips altogether.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We appreciate the reviewer’s perspective and understand the concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. Our decision to use these components was intentional: relying on a unified, professionally supported and maintained platform minimizes complexity associated with sourcing, configuring, and integrating hardware from multiple vendors, thereby reducing non-financial barriers to entry for non-specialist users.

      Importantly, these components are not the primary cost driver of Altair-LSFM (they represent roughly 18% of the total system cost). Nonetheless, for individuals where the price is prohibitive, we also outline several viable cost-reduction options in the revised manuscript (e.g., substituting manual stages, omitting the filter wheel, or using industrial CMOS cameras), while discussing the trade-offs these substitutions introduce in performance and usability. These considerations are now summarized in Supplementary Note 1, which provides a transparent rationale for our design and cost decisions.

      Finally, we note that even with these professional-grade components, Altair-LSFM remains substantially less expensive than commercial systems offering comparable optical performance, such as LLSM implementations from Zeiss or 3i.

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our data. As noted, the current manuscript focuses on validating the optical performance and resolution of the system using fixed specimens to ensure reproducibility and stability.

      We fully agree on the importance of environmental control for live-cell imaging. In the revised manuscript, we now describe in detail how temperature regulation can be achieved using a custom-designed heated sample chamber, accompanied by detailed assembly instructions on our GitHub repository and summarized in Supplementary Note 2. For pH stabilization in systems lacking a 5% CO₂ atmosphere, we recommend supplementing the imaging medium with 10–25 mM HEPES buffer. Additionally, we include new live-cell imaging data demonstrating that Altair-LSFM supports in vitro time-lapse imaging of dynamic cellular processes under controlled temperature conditions.

      Reviewer #2 (Public review): 

      Summary: 

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems. 

      We thank the reviewer for their thoughtful summary. We agree that existing open-source systems primarily emphasize imaging of large specimens, whereas commercial systems that achieve sub-cellular resolution remain costly and complex. Our aim with Altair-LSFM was to bridge this gap—providing LLSM-level performance in a substantially more accessible and reproducible format. By combining high-NA optics with a precision-machined baseplate and open-source documentation, Altair offers a practical, high-resolution solution that can be readily adopted by non-specialist laboratories.

      Strengths: 

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-um field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy that what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field on to the detection, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful and generous assessment of our work. We are pleased that the manuscript’s emphasis on fundamental optical principles, design rationale, and practical implementation was clearly conveyed. We agree that Altair’s modular and accessible architecture provides a strong foundation for future variants tailored to specific experimental needs. To facilitate this, we have made all Zemax simulations, CAD files, and build documentation openly available on our GitHub repository, enabling users to adapt and extend the system for diverse imaging applications.

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive", I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. Though in general, statements on ease of use and stability might be considered anecdotal and may not belong in a scientific article, unreferenced or without data.

      We thank the reviewer for this thoughtful and constructive comment. We have revised the manuscript to more clearly distinguish between the original open-source implementation of LLSM and subsequent commercial versions by 3i and ZEISS. The revised Introduction and Discussion now explicitly note that while open-source and early implementations of LLSM can require expert alignment and maintenance, commercial systems—particularly the ZEISS Lattice Lightsheet 7—are designed for automated operation and stable, turn-key use, albeit at higher cost and with limited modifiability. We have also moderated earlier language regarding usability and stability to avoid anecdotal phrasing.

      We also now provide a more objective proxy for system complexity: the number of optical elements that require precise alignment during assembly and maintenance thereafter. The original open-source LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair-LSFM system contains only nine such elements. By this metric, Altair-LSFM is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories.

      (2) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations are not discussed.

      We thank the reviewer for this helpful comment. We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may pose a practical limitation for some users. We now discuss this more explicitly in the revised manuscript. Specifically, we note that replacing the detection objective provides a straightforward solution to this constraint. For example, as demonstrated by Moore et al. (Lab Chip, 2021), pairing the Zeiss W Plan-Apochromat 20×/1.0 detection objective with the Thorlabs TL20X-MPL illumination objective allows imaging beyond the physical surfaces of both objectives, eliminating the need for small-format coverslips. In the revised text, we propose this modification as an accessible path toward greater compatibility with conventional sample mounting formats. We also note in the Discussion that Oblique Plane Microscopy (OPM) inherently avoids such nonstandard mounting requirements and, owing to its single-objective architecture, is fully compatible with standard environmental chambers.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We thank the reviewer for this important observation and agree that environmental control is critical for live-cell imaging applications. It is worth noting that the original open-source LLSM design, as well as the commercial version developed by 3i, provided temperature regulation but did not include integrated control of CO₂ or humidity. Despite this limitation, these systems have been widely adopted and have generated significant biological insights. We also acknowledge that both OPM and the ZEISS implementation of LLSM offer clear advantages in this respect, providing compatibility with standard commercial environmental chambers that support full regulation of temperature, CO₂, and humidity.

      In the revised manuscript, we expand our discussion of environmental control in Supplementary Note 2, where we describe the Altair-LSFM chamber design in more detail and discuss its current implementation of temperature regulation and HEPES-based pH stabilization. Additionally, the Discussion now explicitly notes that OPM avoids the challenges associated with non-standard sample mounting and is inherently compatible with conventional environmental enclosures.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We agree that the original LLSM design offers substantially greater flexibility than what is reflected in our initial comparison, including the ability to generate multiple lattice geometries (e.g., square and hexagonal), operate in structured illumination mode, and acquire volumes using both sample- and light-sheet-scanning strategies. To address this, we now include Supplementary Note 3, which provides a detailed overview of the illumination modes and imaging flexibility afforded by the original LLSM implementation, and how these capabilities compare to both the commercial ZEISS Lattice Lightsheet 7 and our Altair-LSFM system. In addition, we have revised the discussion to explicitly acknowledge that the original LLSM could operate in alternative scan strategies beyond sample scanning, providing greater context for readers and ensuring a more balanced comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we now include a demonstration of live-cell imaging to directly validate Altair-LSFM’s suitability for dynamic biological applications. We also explicitly discuss the temporal resolution of the system in the main text (see Optoelectronic Design of Altair-LSFM), where we detail both software- and hardware-related limitations. Specifically, we evaluate the maximum imaging speed achievable with Altair-LSFM in conjunction with our open-source control software, navigate.

      For simplicity and reduced optoelectronic complexity, the current implementation powers the piezo through the ASI Tiger Controller, which modestly reduces its bandwidth. Nonetheless, for a 100 µm stroke typical of light-sheet imaging, we achieved sufficient performance to support volumetric imaging at most biologically relevant timescales. These results, along with additional discussion of the design trade-offs and performance considerations, are now included in the revised manuscript and expanded upon in the supplementary material.

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry, or lower it as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a basic understanding of (or willingness to learn) optics, electronics, and instrumentation. Such a barrier exists for all open-source microscopes, and our goal is not to eliminate this requirement entirely but to substantially reduce the technical and logistical challenges that typically accompany the construction of custom light-sheet systems.

      Importantly, no machining experience or in-house fabrication capabilities are required. Users can simply submit the provided CAD design files and specifications directly to commercial vendors for fabrication. We have made this process as straightforward as possible by supplying detailed build instructions, recommended materials, and vendor-ready files through our GitHub repository. Our dissemination strategy draws inspiration from other successful open-source projects such as mesoSPIM, which has seen widespread adoption—over 30 implementations worldwide—through a similar model of exhaustive documentation, open-source software, and community support via user meetings and workshops.

      We also recognize that documentation alone cannot fully replace hands-on experience. To further lower barriers to adoption, we are actively working with commercial vendors to streamline procurement and assembly, and Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant that provides resources for hosting workshops, offering real-time community support, and developing supplementary training materials.

      In the revised manuscript, we now expand the Discussion to explicitly acknowledge these implementation considerations and to outline our ongoing efforts to support a broad and diverse user base, ensuring that laboratories with varying levels of technical expertise can successfully adopt and maintain the Altair-LSFM platform.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We thank the reviewer for this insightful comment and agree that our original language regarding adaptability may have overstated the degree to which Altair-LSFM can be modified without prior experience. It was not our intention to imply that the system can be easily redesigned by users with limited technical background. Meaningful adaptations of the optical or mechanical design do require expertise in optical layout, optomechanical design, and alignment.

      That said, for laboratories with such expertise, we aim to facilitate modifications by providing comprehensive resources—including detailed Zemax simulations, complete CAD models, and alignment documentation. These materials are intended to reduce the development burden for expert users seeking to tailor the system to specific experimental requirements, without necessitating a complete re-optimization of the optical path from first principles.

      In the revised manuscript, we clarify this point and temper our language regarding adaptability to better reflect the realistic scope of customization. Specifically, we now state in the Discussion: “For expert users who wish to tailor the instrument, we also provide all Zemax illumination-path simulations and CAD files, along with step-by-step optimization protocols, enabling modification and re-optimization of the optical system as needed.” This revision ensures that readers clearly understand that Altair-LSFM is designed for reproducibility and straightforward assembly in its default configuration, while still offering the flexibility for modification by experienced users.

      Reviewer #3 (Public review):

      Summary: 

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging. The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells. The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting and high NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness, over a 266 micron wide field of view, pushing the axial resolution of the system beyond the regular diffraction limited-based tradeoffs of light-sheet fluorescence microscopy. Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      We thank the reviewer for their thoughtful and positive assessment of our work. We appreciate their recognition of Altair-LSFM’s design and performance, including its ability to achieve high-resolution imaging throughout a 266-micron field of view. While Altair-LSFM approaches the practical limits of diffraction-limited performance, it does not exceed the fundamental diffraction limit; rather, it achieves near-theoretical resolution through careful optical optimization, beam shaping, and alignment. We are grateful for the reviewer’s acknowledgment of the accessibility and comprehensive documentation that make this system broadly implementable.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve lateral and axial resolution similar to lattice light-sheet microscopy (although without a direct comparison; see the "Weaknesses" section below). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      We thank the reviewer for their thoughtful and encouraging comments. We are pleased that the technical innovation, optical performance, and accessibility of Altair-LSFM were recognized. Our goal from the outset was to develop a diffraction-limited, high-resolution light-sheet system that balances optical performance with reproducibility and ease of implementation. We are also pleased that the use of precision-machined baseplates was recognized as a practical and effective strategy for achieving performance while maintaining ease of assembly.

      Weaknesses: 

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we have significantly expanded our discussion of different light-sheet systems to provide clearer quantitative and conceptual context for Altair-LSFM. These comparisons are based on values reported in the literature, as we do not have access to many of these instruments (e.g., DaXi, diSPIM, or commercial and open-source variants of LLSM), and a direct experimental comparison is beyond the scope of this work.

      We note that while quantitative parameters such as signal-to-noise ratio are important, they are highly sample-dependent and strongly influenced by imaging conditions, including fluorophore brightness, camera characteristics, and filter bandpass selection. For this reason, we limited our comparison to more general image-quality metrics—such as light-sheet thickness, resolution, and field of view—that can be reliably compared across systems.

      Finally, per the reviewer’s recommendation, we have added additional discussion clarifying the differences between dual-objective and single-objective light-sheet architectures, outlining their respective strengths, limitations, and suitability for different experimental contexts.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We fully agree that environmental control is critical for live-cell imaging applications. In the revised manuscript, we now describe the design and implementation of a temperature-regulated sample chamber in Supplementary Note 2, which maintains stable imaging conditions through the use of integrated heating elements and thermocouples. This approach enables precise temperature control while minimizing thermal gradients and optical drift. For pH stabilization, we recommend the use of 10–25 mM HEPES in place of CO₂ regulation, consistent with established practice for most light-sheet systems, including the initial variant of LLSM. Although full humidity and CO₂ control are not readily implemented in dual-objective configurations, we note that single-objective designs such as OPM are inherently compatible with commercial environmental chambers and avoid these constraints. Together, these additions clarify how environmental control can be achieved within Altair-LSFM and situate its capabilities within the broader LSFM design space.

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system, in case a 30k objective is not feasible. Will this system work with different optics (with the obvious limitations coming with the lower NA objective)? This could be an interesting point of discussion: the adaptability of the system to lower budgets or more cost-effective choices, depending on the needs.

      We agree that cost considerations are critical for adoption in academic environments. We would also like to clarify that the quoted $150k includes the optical table and laser source. In the revised manuscript, Supplementary Note 1 now includes an expanded discussion of cost–performance trade-offs and potential paths for cost reduction.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      In the revised manuscript, we now include Supplementary Note 4, which provides a high-level discussion of data storage needs, approximate costs, and practical strategies for managing large datasets generated by light-sheet microscopy. This section offers general guidance, including file-format recommendations and cost considerations, but we note that actual costs will vary by institution and contractual agreements.
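
      To give a sense of the scale involved, raw data volume can be estimated from the acquisition parameters alone. The sketch below is illustrative and is not taken from Supplementary Note 4: the function names, the 2048 x 2048 frame size, the 16-bit depth, and the example acquisition are all assumptions chosen for the arithmetic, not specifications of Altair-LSFM.

```python
def dataset_size_bytes(nx, ny, nz, channels, timepoints, bytes_per_pixel=2):
    """Raw, uncompressed size of a multichannel time-lapse volume series.

    bytes_per_pixel=2 assumes 16-bit camera output (an assumption).
    """
    return nx * ny * nz * channels * timepoints * bytes_per_pixel


def human_readable(n_bytes):
    """Format a byte count using decimal (SI) units, capped at TB."""
    value = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if value < 1000 or unit == "TB":
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    # Hypothetical acquisition: 2048 x 2048 pixel frames, 200 planes,
    # 2 channels, 100 time points, 16-bit pixels.
    size = dataset_size_bytes(2048, 2048, 200, 2, 100)
    print(human_readable(size))
```

      With these assumed parameters a single time-lapse experiment comes to roughly a third of a terabyte before compression, which is why file format and storage tier are worth deciding before acquisition begins.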

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, the limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) A picture, or full CAD design of the complete instrument, should be included as a main figure.

      A complete CAD rendering of the microscope is now provided in Supplementary Figure 4.

      (2) There is no quantitative comparison of the effects of the tilting resonant galvo; only a cartoon, a figure should be included.

      The cartoon was intended purely as an educational illustration to conceptually explain the role of the tilting resonant galvo in shaping and homogenizing the light sheet. To clarify this intent, we have revised both the figure legend and corresponding text in the main manuscript. For readers seeking quantitative comparisons, we now reference the original study that provides a detailed analysis of this optical approach, as well as a review on the subject.

      (3) Description of L4 is missing in the Figure 1 caption.

      Thank you for catching this omission. We have corrected it.

      (4) Please crop and enlarge the beam profiles in Figures 1c and 3a so that the profile can be appreciated. The PSFs in Figure 3c-e should similarly be enlarged and presented using a dynamic range/LUT such that any aberrations can be appreciated.

      In Figure 1c, our goal was to qualitatively illustrate the uniformity of the light-sheet across the full field of view, while Figure 1d provided the corresponding quantitative cross-section. To improve clarity, we have added an additional figure panel offering a higher-magnification, localized view of the light-sheet profile. For Figure 3c–e, we have enlarged the PSF images and adjusted the display range to better convey the underlying signal and allow subtle aberrations to be appreciated.

      (5) It is unclear why LLSM is being used as the gold standard, since in its current commercial form, available from Zeiss, it is a turn-key system designed for core facilities. The original LLSM is also a versatile instrument that provides much more than the square lattice for illumination, including structured illumination, hexagonal lattices, live-cell imaging, wide-field illumination, different scan modes, etc. These additional features are not even mentioned when compared to the Altair-LSFM. If a comparison is to be provided, it should be fair and balanced. Furthermore, as outlined in the public review, anecdotal statements on "most used", "difficult to align", or "unstable" should not be provided without data.

      In the revised manuscript, we have carefully removed anecdotal statements and, where appropriate, replaced them with quantitative or verifiable information. For instance, we now explicitly report that the square lattice was used in 16 of the 20 figure subpanels in the original LLSM publication, and we include a proxy for optical complexity based on the number of optical elements requiring alignment in each system.

      We also now clearly distinguish between the original LLSM design—which supports multiple illumination and scanning modes—and its subsequent commercial variants, including the ZEISS Lattice Lightsheet 7, which prioritizes stability and ease of use over configurational flexibility (see Supplementary Note 3).

      (6) The authors should recognize that implementing custom optics, no matter how well designed, is a big barrier to cross for most cell biology labs.

      We fully understand and now acknowledge in the main text that implementing custom optics can present a significant barrier, particularly for laboratories without prior experience in optical system assembly. However, similar challenges were encountered during the adoption of other open-source microscopy platforms, such as mesoSPIM and OpenSPIM, both of which have nonetheless achieved widespread implementation. Their success has largely been driven by exhaustive documentation, strong community support, and standardized design principles—approaches we have also prioritized in Altair-LSFM. We have therefore made all CAD files, alignment guides, and detailed build documentation publicly available and continue to develop instructional materials and community resources to further reduce the barrier to adoption.

      (7) Statements on "hands on workshops", though laudable, may not be appropriate to include in a scientific publication without some documentation of the influence they have had on implementing the microscope.

      We understand the concern. Our intention in mentioning hands-on workshops was to convey that the dissemination effort is supported by an NIH Biomedical Technology Development and Dissemination grant, which includes dedicated channels for outreach and community engagement. Nonetheless, we agree that such statements are not appropriate without formal documentation of their impact, and we have therefore removed this text from the revised manuscript.

      (8) It is claimed in the discussion that the microscope is "reliable", but no proof is given; long-term stability should be assessed and included.

      Although our experience with Altair-LSFM has been that it remains well aligned over time, especially in comparison to other light-sheet systems we have worked on over the last 11 years, we acknowledge that this assessment is anecdotal. As such, we have omitted this claim from the revised manuscript.

      (9) Due to the reliance on anecdotal statements and comparisons without proof to other systems, this paper at times reads like a brochure rather than a scientific publication. The authors should consider editing their manuscript accordingly to focus on the technical and quantifiable aspects of their work.

      We agree with the reviewer’s assessment and have revised the manuscript to remove anecdotal comparisons and subjective language. Where possible, we now provide quantitative metrics or verifiable data to support our statements.

      Reviewer #3 (Recommendations for the authors):

      Other minor points that could improve the manuscript (although some of these points are explained in the huge supplementary manual): 

      (1) The authors explain their design thoroughly, and they chose a sample-scanning method. I think that a brief discussion of the advantages and disadvantages of such a method over, for example, a laser-scanning system (with fixed sample) in the main text will be highly beneficial for the users.

      In the revised manuscript, we now include a brief discussion in the main text outlining the advantages and limitations of a sample-scanning approach relative to a light-sheet–scanning system. Specifically, we note that for thin, adherent specimens, sample scanning minimizes the optical path length through the sample, allowing the use of more tightly focused illumination beams that improve axial resolution. We also include a new supplementary figure illustrating how this configuration reduces the propagation length of the illumination light sheet, thereby enhancing axial resolution.

      (2) The authors justify selecting a 0.6 NA illumination objective over alternatives (e.g., Special Optics), but the manuscript would benefit from a more quantitative trade-off analysis (beam waist, working distance, sample compatibility) with other possibilities. Within the objective context, a comparison of the performance of this system with the new and upcoming single-objective light-sheet methods (and the ones based also on optical refocusing, e.g., DaXi) would strengthen the manuscript.

      In the revised manuscript, we now provide a quantitative trade-off analysis of the illumination objectives in Supplementary Note 1, including comparisons of beam waist, working distance, and sample compatibility. This section also presents calculated point spread functions for both the 0.6 NA and 0.67 NA objectives, outlining the performance trade-offs that informed our design choice. In addition, Supplementary Note 3 now includes a broader comparison of Altair-LSFM with other light-sheet modalities, including diSPIM, ASLM, and OPM, to further contextualize the system’s capabilities within the evolving light-sheet microscopy landscape.

      (3) The modularity of the system is implied in the context of the manuscript, but not fully explained. The authors should specify more clearly, for example, if cameras could be easily changed, objectives could be easily swapped, light-sheet thickness could be tuned by changing the cylindrical lens, and how users might adapt the system for different samples (e.g., embryos, cleared tissue, live imaging), and discuss eventual constraints or compatibility issues with these implementations.

      Altair-LSFM was explicitly designed and optimized for imaging live adherent cells, where sample scanning and short light-sheet propagation lengths provide optimal axial resolution (Supplementary Note 3). While the same platform could be used for superficial imaging in embryos, systems implementing multiview illumination and detection schemes are better suited for such specimens. Similarly, cleared tissue imaging typically requires specialized solvent-compatible objectives and approaches such as ASLM that maximize the field of view. We have now added text to the Design Principles section that explicitly states this.

      Altair-LSFM offers varying levels of modularity depending on the user’s level of expertise. For entry-level users, the illumination numerical aperture—and therefore the light-sheet thickness and propagation length—can be readily adjusted by tuning the rectangular aperture conjugate to the back pupil of the illumination objective, as described in the Design Principles section. For mid-level users, alternative configurations of Altair-LSFM, including different detection objectives, stages, filter wheels, or cameras, can be readily implemented (Supplementary Note 1). Importantly, navigate natively supports a broad range of hardware devices, and new components can be easily integrated through its modular interface. For expert users, all Zemax simulations, CAD models, and step-by-step optimization protocols are openly provided, enabling complete re-optimization of the optical design to meet specific experimental requirements.
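
      The link between illumination numerical aperture, light-sheet thickness, and propagation length mentioned above can be illustrated with textbook Gaussian-beam relations. This is a minimal sketch under stated assumptions, not the Zemax model used for Altair-LSFM: it assumes an ideal paraxial Gaussian beam (which becomes inaccurate at high NA and ignores the rectangular aperture), a 488 nm vacuum wavelength, and water immersion (n = 1.33). The function name and the NA values are hypothetical.

```python
import math


def gaussian_sheet(na, wavelength_um=0.488, n=1.33):
    """Return (thickness FWHM, confocal parameter) in micrometres.

    Paraxial Gaussian-beam estimate (assumption):
      waist radius     w0 = lambda0 / (pi * NA)
      Rayleigh range   zR = pi * w0**2 * n / lambda0
    The confocal parameter 2 * zR is used here as the usable
    propagation length of the sheet.
    """
    w0 = wavelength_um / (math.pi * na)
    # Intensity ~ exp(-2 x^2 / w0^2), so FWHM = w0 * sqrt(2 ln 2).
    fwhm = w0 * math.sqrt(2.0 * math.log(2.0))
    rayleigh = math.pi * w0 ** 2 * n / wavelength_um
    return fwhm, 2.0 * rayleigh


if __name__ == "__main__":
    # Illustrative NA values only; not the Altair-LSFM settings.
    for na in (0.1, 0.2, 0.4):
        fwhm, length = gaussian_sheet(na)
        print(f"NA {na:.2f}: thickness {fwhm:.2f} um, length {length:.2f} um")
```

      In this approximation the thickness scales as 1/NA while the propagation length scales as 1/NA², so halving the thickness costs a factor of four in usable length. That is the trade-off that closing or opening the aperture moves along.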

      (4) Resolution measurements before and after deconvolution are central to the performance claim, but the deconvolution method (PetaKit5D) is only briefly mentioned in the main text and is not referenced; it should be described in more detail, consistent with the precision of the supplementary information. More specifically, PetaKit5D should be referenced in the main text, the details of the deconvolution parameters should be discussed in the Methods section, and the computational requirements should also be mentioned.

      In the revised manuscript, we now provide a dedicated description of the deconvolution process in the Methods section, including the specific parameters and algorithms used. We have also explicitly referenced PetaKit5D in the main text to ensure proper attribution and clarity. Additionally, we note the computational requirements associated with this analysis in the same section for completeness.

      (5)  Image post-processing is not fully explained in the main text. Since the system is sample-scanning based, no word in the main text is spent on deskewing, which is an integral part of the post-processing to obtain a "straight" 3D stack. Since other systems implement such a post-processing algorithm (for example, single-objective architectures), it would be beneficial to have some discussion about this, and also a brief comparison to other systems in the main text in the methods section. 

      In the revised manuscript, we now explicitly describe both deskewing (shearing) and deconvolution procedures in the Alignment and Characterization section of the main text and direct readers to the Methods section. We also briefly explain why the data must be sheared to correct for the angled sample-scanning geometry for LLSM and Altair-LSFM, as well as for both sample-scanning and laser-scanning variants of OPMs.
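
      For readers unfamiliar with deskewing, the shearing step can be illustrated with a minimal sketch. This is not the authors' pipeline or the PetaKit5D implementation. It assumes a stack ordered (z, y, x), a shear along x, integer-pixel shifts without interpolation, and a per-plane lateral offset of step * cos(angle) / pixel size; the function name and all parameter values are hypothetical.

```python
import math

import numpy as np


def deskew(stack, step_um, pixel_um, angle_deg):
    """Shear a (z, y, x) stack along x by a whole number of pixels per plane.

    Assumed geometry: consecutive planes are offset laterally by
    step_um * cos(angle_deg) / pixel_um pixels.
    """
    shift_px = step_um * math.cos(math.radians(angle_deg)) / pixel_um
    nz, ny, nx = stack.shape
    # Extra width needed so the last (most shifted) plane still fits.
    extra = int(round(shift_px * (nz - 1)))
    out = np.zeros((nz, ny, nx + extra), dtype=stack.dtype)
    for z in range(nz):
        offset = int(round(shift_px * z))
        out[z, :, offset:offset + nx] = stack[z]
    return out


# Hypothetical acquisition: 1.0 um stage step, 0.5 um pixels, 60 degree angle.
raw = np.ones((3, 2, 4), dtype=np.float32)
sheared = deskew(raw, step_um=1.0, pixel_um=0.5, angle_deg=60.0)
```

      A production implementation would normally interpolate sub-pixel shifts rather than round them; rounding is used here only to keep the example short.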

      (6) A brief discussion on comparative costs with other systems (LLSM, diSPIM, etc.) could be helpful for non-imaging expert researchers who could try to implement such an optical architecture in their lab.

      Unfortunately, the exact costs of commercial systems such as LLSM or diSPIM are typically not publicly available, as they depend on institutional agreements and vendor-specific quotations. Nonetheless, we now provide approximate cost estimates in Supplementary Note 1 to help readers and prospective users gauge the expected scale of investment relative to other advanced light-sheet microscopy systems.

      (7) The "navigate" control software is provided, but a brief discussion on its advantages compared to an already open-access system, such as Micromanager, could be useful for the users.

      In the revised manuscript, we now include Supplementary Note 5 that discusses the advantages and disadvantages of different open-source microscope control platforms, including navigate and Micro-Manager. In brief, navigate was designed to provide turnkey support for multiple light-sheet architectures, with pre-configured acquisition routines optimized for Altair-LSFM, integrated data management with support for multiple file formats (TIFF, HDF5, N5, and Zarr), and full interoperability with OME-compliant workflows. By contrast, while Micro-Manager offers a broader library of hardware drivers, it typically requires manual configuration and custom scripting for advanced light-sheet imaging workflows.

      (8) The cost and parts are well documented, but the time and expertise required are not crystal clear. Adding a simple time estimate (perhaps in the Supplement Section) of assembly/alignment/installation/validation and first imaging will be very beneficial for users. Also, what level of expertise is assumed (prior optics experience, for example) to be needed to install a system like this? This can help non-optics-expert users to better understand what kind of adventure they are putting themselves through.

      We thank the reviewer for this helpful suggestion. To address this, we have added Supplementary Table S5, which provides approximate time estimates for assembly, alignment, validation, and first imaging based on the user’s prior experience with optical systems. The table distinguishes between novice (no prior experience), moderate (some experience using but not assembling optical systems), and expert (experienced in building and aligning optical systems) users. This addition is intended to give prospective builders a realistic sense of the time commitment and level of expertise required to assemble and validate Altair-LSFM.

      Minor things in the main text:

      (1) Line 109: The cost is considered "excluding the laser source". But then in the table of costs, you mention L4cc as a "multicolor laser source", for 25 K. Can you explain this better? Are the costs correct with or without the laser source? 

      We acknowledge that the statement in line 109 was incorrect—the quoted ~$150k system cost does include the laser source (L4cc, listed at $25k in the cost table). We have corrected this in the revised manuscript.

      (2) Line 113: You say "lateral resolution", but then you state a 3D resolution (230 nm x 230 nm x 370 nm). This needs to be fixed.

      Thank you, we have corrected this.

      (3) Line 138: Is the light-sheet uniformity proven also with a fluorescent dye? This could be beneficial for the main text, showing the performance of the instrument in a fluorescent environment.

      The light-sheet profiles shown in the manuscript were acquired using fluorescein to visualize the beam. We have revised the main text and figure legends to clearly state this.

      (4) Line 149: This is one of the most important features of the system, defying the usual tradeoff between light-sheet thickness and field of view, with a regular Gaussian beam. I would clarify more specifically how you achieve this because this really is the most powerful takeaway of the paper.

      We thank the reviewer for this key observation. The ability of Altair-LSFM to maintain a thin light sheet across a large field of view arises from diffraction effects inherent to high NA illumination. Specifically, diffraction elongates the PSF along the beam’s propagation direction, effectively extending the region over which the light sheet remains sufficiently thin for high-resolution imaging. This phenomenon, which has been the subject of active discussion within the light-sheet microscopy community, allows Altair-LSFM to partially overcome the conventional trade-off between light-sheet thickness and propagation length. We now clarify this point in the main text and provide a more detailed discussion in Supplementary Note 3, which is explicitly referenced in the discussion of the revised manuscript.

      (5) Line 171: You talk about repeatable assembly...have you tried many different baseplates? Otherwise, this is a complicated statement, since this is a proof-of-concept paper. 

      We thank the reviewer for this comment. We have not yet validated the design across multiple independently assembled baseplates and therefore agree that our previous statement regarding repeatable assembly was premature. To avoid overstating the current level of validation, we have removed this statement from the revised manuscript.

      (6) Line 187: same as above. You mention "long-term stability". For how long did you try this? This should be specified in numbers (days, weeks, months, years?) Otherwise, it is a complicated statement to make, since this is a proof-of-concept paper.

      We also agree that referencing long-term stability without quantitative backing is inappropriate, and have removed this statement from the revised manuscript.

      (7) Line 198: "rapid z-stack acquisition". How rapid? Also, what is the limitation of the galvo-scanning in terms of the imaging speed of the system? This should be noted in the methods section.

      In the revised manuscript, we now clarify these points in the Optoelectronic Design section. Specifically, we explicitly note that the resonant galvo used for shadow reduction operates at 4 kHz, ensuring that it is not rate-limiting for any imaging mode. In the same section, we also evaluate the maximum acquisition speeds achievable using navigate and report the theoretical bandwidth of the sample-scanning piezo, which together define the practical limits of volumetric acquisition speed for Altair-LSFM.

      (8) Line 234: PetaKit5D is discussed in the additional documentation, but should be referenced here, as well.

      We now reference and cite PetaKit5D.

      (9) Line 256: "values are on par with LLSM", but no values are provided. Some details should also be provided in the main text.

      In the revised manuscript, we now provide the lateral and axial resolution values originally reported for LLSM in the main text to facilitate direct comparison with Altair-LSFM. Additionally, Supplementary Note 3 now includes an expanded discussion on the nuances of resolution measurement and reporting in light-sheet microscopy.

      Figures:

      (1) Figure 1 could be implemented with Figure 3. They're both discussing the validation of the system (theoretically and with simulations), and they could be together in different panels of the same figure. The experimental light-sheet seems to be shown in a transmission mode. Showing a pattern in a fluorescent dye could also be beneficial for the paper.

      In Figure 1, our goal was to guide readers through the design process—illustrating how the detection objective’s NA sets the system’s resolution, which defines the required pixel size for Nyquist sampling and, in turn, the field of view. We then use Figure 1b–c to show how the illumination beam was designed and simulated to achieve that field of view. In contrast, Figure 3 presents the experimental validation of the illumination system. To avoid confusion, we now clarify in the text that the light sheet shown in Figure 3 was visualized in a fluorescein solution and imaged in transmission mode. While we agree that Figures 1 and 3 both serve to validate the system, we prefer to keep them as separate figures to maintain focus within each panel. We believe this organization better supports the narrative structure and allows readers to digest the theoretical and experimental validations independently.
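
      The design chain described above (detection NA sets resolution, resolution sets the Nyquist pixel size, pixel size sets the field of view) can be sketched numerically. This is an illustrative calculation only: it assumes the Abbe limit, lambda / (2 NA), as the resolution estimate and exactly two pixels per resolvable distance, and the wavelength, NA, and sensor width below are hypothetical round numbers rather than the Altair-LSFM specifications.

```python
def nyquist_design(wavelength_um, detection_na, camera_pixels):
    """Return (resolution, pixel size, field of view), all in micrometres."""
    # Abbe lateral resolution limit (an assumed choice of criterion).
    resolution_um = wavelength_um / (2.0 * detection_na)
    # Nyquist sampling: at least two pixels per resolvable distance.
    pixel_um = resolution_um / 2.0
    # Field of view along one sensor axis, in sample space.
    fov_um = pixel_um * camera_pixels
    return resolution_um, pixel_um, fov_um


# Hypothetical inputs: 0.5 um emission, NA 1.0, 2048-pixel-wide sensor.
resolution_um, pixel_um, fov_um = nyquist_design(0.5, 1.0, 2048)
```

      With these placeholder inputs the sketch gives a 0.25 um resolution estimate, a 0.125 um sample-space pixel, and a 256 um field of view, which shows how a higher detection NA shrinks the field of view for a fixed sensor.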

      (2) Figure 3: Panels d and e show the same thing. Why would you expect that xz and yz profiles should be different? Is this due to the orientation of the objectives towards the sample?

      In Figure 3, we present the PSF from all three orthogonal views, as this provides the most transparent assessment of PSF quality—certain aberration modes can be obscured when only select perspectives are shown. In principle, the XZ and YZ projections should be equivalent in a well-aligned system. However, as seen in the XZ projection, a small degree of coma is present that is not evident in the YZ view. We now explicitly note this observation in the revised figure caption to clarify the difference between these panels.

      (3) Figure 4's single boxes lack a scale bar, and some of the Supplementary Figures (e.g. Figure 5) lack detailed axis labels or scale bars. Also, in the detailed documentation, some figures are referred to as Figure 5. Figure 7 or, for example, figure 6. Figure 8, and this makes the cross-references very complicated to follow

      In the revised manuscript, we have corrected these issues. All figures and supplementary figures now include appropriate scale bars, axis labels, and consistent formatting. We have also carefully reviewed and standardized all cross-references throughout the main text and supplementary documentation to ensure that figure numbering is accurate and easy to follow.

    1. eLife Assessment

      In this study, the authors investigate the role of ZMAT3, a p53 target gene, in tumor suppression and RNA splicing regulation. Using quantitative proteomics, the authors uncover that ZMAT3 knockout leads to upregulation of HKDC1, a gene linked to mitochondrial respiration, and that ZMAT3 suppresses HKDC1 expression by inhibiting c-JUN-mediated transcription. This set of convincing evidence reveals a fundamental mechanism by which ZMAT3 contributes to p53-driven tumor suppression by regulating mitochondrial respiration.

    2. Reviewer #1 (Public review):

      Summary:

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration. The data are novel, compelling and very interesting.

      Comments on revisions:

      The authors have done a thorough job addressing my comments. This manuscript is quite strong and will be highly cited for its novelty and rigor.

    3. Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.

      Comments on revisions:

      The authors have mostly addressed the concerns raised previously by this reviewer. The lack of functional assays made the reported findings mostly mechanistic with no clear biological context.

      The present manuscript is certainly improved compared to the previous version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:  

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.  The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.  

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.  

      We thank the reviewer for the kind words and the thoughtful suggestion. As recommended, to identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.

      Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.  

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.  

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.  

      We thank the reviewer for appreciating our work and for this valuable suggestion. The reviewer rightly pointed out that supporting the regulatory interactions between p53, ZMAT3, JUN and HKDC1 with functional assays such as proliferation, apoptosis and mitochondrial respiration analyses would strengthen our mechanistic data. During the revision of our manuscript, we attempted to address this point by performing simultaneous knockdown of these proteins; however, we observed substantial toxicity under these conditions, making the functional assays technically unfeasible. This outcome was not unexpected, as knockdown of JUN or HKDC1 individually results in growth defects. We therefore focused our efforts on addressing the recommendations for the authors.

      Reviewer #3 (Public review):

      Summary:  

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.  

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.  

      We thank the reviewer for the kind words and the invaluable suggestion. The reviewer has an excellent point regarding how ZMAT3 inhibits JUN binding to the HKDC1 locus. Our new data included in the revised manuscript show that the ZMAT3-JUN interaction is lost in the presence of DNase or RNase, indicating that the interaction requires both DNA and RNA. This result suggests that ZMAT3 and JUN form an RNA-dependent, chromatin-associated complex. Although not directly investigated in our study, this finding is consistent with emerging evidence that RBPs can function as chromatin-associated cofactors in transcription. For example, functional interplay between transcription factor YY1 and the RNA binding protein RBM25 co-regulates a broad set of genes, where RBM25 appears to engage promoters first and then recruit YY1, with RNA proposed to guide target recognition. We have discussed this possibility in the discussion section of the revised manuscript (page 13). We agree that future work using ZMAT3 mutants and targeted ChIP or RIP assays will be valuable to delineate the precise mechanism by which ZMAT3 inhibits JUN binding to its target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. HKDC1 is emerging as an important player in human cancer. Importantly, the authors show both acute (gene silencing) and chronic (CRISPR KO) approaches to silence ZMAT3, and they do this in several cell lines. Notably, they show that ZMAT3 silencing leads to impaired mitochondrial respiration, in a manner that is rescued by silencing of HKDC1. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter (intron 1), and altered mitochondrial respiration. The findings are compelling, and the authors use multiple orthogonal approaches to test most findings. And the authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration. As such, enthusiasm is high for this manuscript.

      Addressing the following question would improve the manuscript. 

      It is not clear how many (other) c-JUN target genes might be impacted by ZMAT3; other important c-JUN targets in cancer include GLS1, WEE1, SREBP1, GLUT1, and CD36, so there could be a global impact on metabolism in ZMAT3 KO cells. Can the authors perform qPCR on these targets in ZMAT3 WT and KO cells and see if these target genes are differentially expressed? 

      We thank the reviewer for this thoughtful suggestion. As recommended, we examined the expression of key c-JUN target genes GLS1 (also known as GLS), WEE1, SREBP1, GLUT1, and CD36 in ZMAT3-WT and ZMAT3-KO cells. We first analyzed publicly available JUN ChIP-Seq data from three ENCODE cell lines, which revealed JUN binding peaks near or upstream of exon 1 for GLS1/GLS, SREBP1, and SLC2A1/GLUT1, but not for WEE1 or CD36 (Appendix 1, panels A-E). Based on these results, we performed RT-qPCR for GLS1/GLS, SREBP1 and SLC2A1 in ZMAT3-WT and ZMAT3-KO cells, with or without JUN knockdown. GLS mRNA was significantly reduced upon JUN knockdown in both ZMAT3-WT cells and ZMAT3-KO cells, but it was not upregulated upon loss of ZMAT3, indicating that GLS is a JUN target gene, but it is not regulated by ZMAT3. In contrast, expression of SREBP1 (gene symbol SREBF1) and SLC2A1 remained unchanged upon ZMAT3 loss or JUN knockdown (Appendix 1, panels F-H). These data suggest that the ZMAT3/JUN axis does not regulate the expression of these genes.

      To identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected for further analysis the top 4 candidate genes - LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to the ZMAT3-WT (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.
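
      The intersection strategy described above can be sketched with plain set operations. This is an illustration, not the authors' analysis code: the function name, the placeholder gene names, and the requirement that a candidate has a peak in all three ENCODE cell lines are assumptions, since the response does not state how the cross-referencing was scored.

```python
def candidate_targets(upregulated, chip_atlas_targets, encode_peaks_by_line):
    """Genes that are upregulated and supported by every binding dataset."""
    candidates = set(upregulated) & set(chip_atlas_targets)
    # Assumption: require a JUN peak in each ENCODE cell line.
    for genes_with_peak in encode_peaks_by_line.values():
        candidates &= set(genes_with_peak)
    return sorted(candidates)


# Placeholder gene names and cell-line labels, not real data.
upregulated = {"GENE_A", "GENE_B", "GENE_C"}
chip_atlas = {"GENE_A", "GENE_B", "GENE_X"}
encode = {
    "line1": {"GENE_A", "GENE_B"},
    "line2": {"GENE_A"},
    "line3": {"GENE_A", "GENE_C"},
}
hits = candidate_targets(upregulated, chip_atlas, encode)
```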

      Minor concerns: 

      (1) Line 150: observed a modest. 

      (2) Line 159: Figure 2G appears to be inaccurately cited. 

      (3) Line 191: assays to measure. 

      We thank the reviewer for pointing these out. These minor concerns have been addressed in the text.  

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 1E: Can the authors clarify what the numbers on the left side of the chart represent? Do they refer to the scale?

      The numbers on the Y-axis represent the -log10(p-value), where higher values correspond to more significant changes. For visualization purposes, the significant changes are shown in red.
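
      As a brief illustration of this axis transform, the sketch below converts p-values to -log10 values and flags those below a cutoff. The 0.05 cutoff and the function names are assumptions for illustration; the response does not state the threshold used for the red points in Figure 1E.

```python
import math


def neg_log10_p(p_value):
    """Y-axis value for a volcano-style plot."""
    return -math.log10(p_value)


def is_significant(p_value, alpha=0.05):
    """Assumed significance rule; alpha is a placeholder cutoff."""
    return p_value < alpha


# A p-value of 0.001 maps to about 3 on the axis; 0.05 maps to about 1.3.
y_small_p = neg_log10_p(0.001)
y_cutoff = neg_log10_p(0.05)
```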

      (2) Page 5, line 123: The sentence "As expected, ZMAT3 mRNA levels were decreased in the ZMAT3-KO cells" is redundant, as this information was already mentioned on page 4, line 103.  

      We thank the reviewer for noticing this redundancy. The repeated sentence has been removed in the revised manuscript.  

      (3) Page 5: The authors state: "Transcriptome-wide, upon loss of ZMAT3, 606 genes were significantly up-regulated (adj. p < 0.05 and 1.5-fold change) and 552 were down-regulated, with a median fold change of 1.76 and 0.55 for the up- and down-regulated genes, respectively." Later, on page 6, they write: "Comparison of the RNA-seq data from ZMAT3-WT vs. ZMAT3-KO and CTRL siRNA vs. ZMAT3 siRNA-transfected HCT116 cells indicated that 1023 genes were commonly up-regulated, and 1042 were commonly down-regulated upon ZMAT3 loss (Figure S2C and D)." Why is the number of deregulated transcripts higher in the ZMAT3-WT vs. ZMAT3-KO comparison than in the CTRL siRNA vs. ZMAT3 siRNA comparison? Are the authors using less stringent criteria in the second analysis? This point should be clarified.

      We thank the reviewer for highlighting this point. The reviewer is correct that less stringent criteria were used in the second analysis. On page 5, we applied stringent thresholds (adjusted p-value < 0.05 and 1.5-fold change) to identify high-confidence transcriptome-wide changes upon ZMAT3 loss. In contrast, for the comparison of both RNA-seq datasets (ZMAT3-WT vs. KO and siCTRL vs. siZMAT3), we included genes that were consistently up- or downregulated, without applying a fold change threshold, focusing instead on significantly altered genes (adjusted p < 0.05) in both datasets. This allowed us to capture broader and more reproducible transcriptomic changes that occur upon ZMAT3 depletion, including modest but significant changes upon transient ZMAT3 knockdown with siRNAs. We have now clarified this distinction on page 6 of the revised manuscript.
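
      The difference between the two criteria can be made concrete with a small sketch. It is illustrative only and not the authors' code: the data structure, function names, and gene names are hypothetical, and treating "1.5-fold" down-regulation as a fold change of at most 1/1.5 is an assumption.

```python
def stringent_hits(genes, alpha=0.05, min_fold=1.5):
    """genes maps name -> (fold_change, adjusted_p) for one dataset."""
    up = {g for g, (fc, p) in genes.items() if p < alpha and fc >= min_fold}
    down = {g for g, (fc, p) in genes.items()
            if p < alpha and fc <= 1.0 / min_fold}
    return up, down


def shared_hits(first, second, alpha=0.05):
    """Significant in both datasets, same direction, no fold-change cutoff."""
    common = first.keys() & second.keys()
    sig = {g for g in common if first[g][1] < alpha and second[g][1] < alpha}
    up = {g for g in sig if first[g][0] > 1.0 and second[g][0] > 1.0}
    down = {g for g in sig if first[g][0] < 1.0 and second[g][0] < 1.0}
    return up, down


# Placeholder values: (fold change, adjusted p-value).
knockout = {"GENE_A": (2.0, 0.01), "GENE_B": (1.2, 0.01),
            "GENE_C": (0.5, 0.01), "GENE_D": (3.0, 0.2)}
knockdown = {"GENE_A": (1.6, 0.02), "GENE_B": (1.1, 0.04),
             "GENE_C": (0.8, 0.03), "GENE_D": (2.0, 0.01)}

strict_up, strict_down = stringent_hits(knockout)
shared_up, shared_down = shared_hits(knockout, knockdown)
```

      In this toy example GENE_B fails the 1.5-fold cutoff but is counted as commonly up-regulated, which is how the second analysis can report more genes than the first.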

      (4) Figures 2B and 2E: The authors should provide quantification of HKDC1 protein levels normalized to a loading control. In addition, they should assess HKDC1 protein abundance upon ZMAT3 interference in SW1222 and HCEC-1CT cells, not just in HepG2 and HCT116 cells.

      We thank the reviewer for this suggestion. We have now quantified all immunoblots presented throughout the manuscript, including those shown in Figures 2B and 2E, and all other figures containing protein analyses. Band intensities were quantified using ImageJ densitometry and normalized to GAPDH as the loading control. In addition, as suggested, we examined HKDC1 protein levels following ZMAT3 knockdown in two additional cell lines, SW1222 and HCEC-1CT. Consistent with our observations in HepG2 and HCT116 cells, ZMAT3 depletion led to increased HKDC1 protein levels in both SW1222 and HCEC-1CT cells. These new data are now included in Figure 2-figure supplement 1F and G. We have updated the Results section, figure legends, and figures to reflect these additions.
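
      The normalization described above amounts to a ratio of ratios, sketched here for clarity. The intensities are invented placeholders, the function name is hypothetical, and expressing each lane relative to a control lane is an assumption about how the normalized values were reported.

```python
def normalized_levels(target, loading, control_lane=0):
    """Divide each band by its loading control, then scale to the control lane."""
    ratios = [t / l for t, l in zip(target, loading)]
    reference = ratios[control_lane]
    return [r / reference for r in ratios]


# Placeholder densitometry values for two lanes (control, knockdown).
hkdc1_bands = [100.0, 300.0]
gapdh_bands = [50.0, 75.0]
relative = normalized_levels(hkdc1_bands, gapdh_bands)
```

      Here the second lane has three times the raw HKDC1 signal but only twice the normalized level, because its loading control is also stronger.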

      (5) Figure 3A: It is unclear which gene was knocked out in the "KO cells." The authors should clearly specify this.

      We thank the reviewer for pointing this out. We have now updated Figure 3A.

      (6) Figure 3D: The result appears counterintuitive in comparison to Figure 3E. Why does HKDC1 knockdown reduce cell confluency more in ZMAT3 KO cells than in control (ZMAT3 wild-type) cells? The authors should explain this discrepancy more clearly.

      We thank the reviewer for this insightful comment. As shown in Figure 3D and 3E, knockdown of HKDC1 resulted in a greater decrease in proliferation in ZMAT3-KO cells than in ZMAT3-WT cells. This observation was indeed unexpected, given that HKDC1 acts downstream of ZMAT3. One possible explanation is that elevated HKDC1 expression in ZMAT3-KO cells increases their reliance on HKDC1 for sustaining proliferation, and that HKDC1 may also participate in additional pathways in ZMAT3-KO cells. Consequently, transient knockdown of HKDC1 in ZMAT3-KO cells would have a more pronounced effect on proliferation due to their increased dependency on HKDC1 activity. In contrast, ZMAT3-WT cells, which express lower levels of HKDC1, are less dependent on its function and therefore less sensitive to its depletion. We have now clarified this point on page 8 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):  

      (1) Why do the authors start their analysis by knocking out the p53 response element in Zmat3? That should be clarified. In addition, since clones were picked after CRISPR KO of Zmat3, were experiments done to confirm that p53 signaling was not disrupted?

      We thank the reviewer for this thoughtful question. We began our study by targeting the p53 response element (p53RE) in the ZMAT3 locus because the basal expression of ZMAT3 is regulated by p53 (Muys, Bruna R. et al., Genes & Development, 2021). Deleting the p53RE therefore allowed us to markedly reduce ZMAT3 expression without disrupting the entire ZMAT3 locus. We have clarified this rationale on page 4 of the revised manuscript. To ensure that p53 signaling was not affected by this modification, we verified that canonical p53 targets such as p21 were equivalently induced in both ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment and that p53 induction was unchanged (Figure 4F and Figure 1-figure supplement 1A).

      (2) Throughout the text, many immunoblots are used to validate the knockouts and knockdowns used, but some clarification is needed. In Figure S1A, the Zmat3-WT sample seems to have significantly more p53 than the Zmat3 KO sample. Does Zmat3 KO compromise p53 levels in other experiments? It would be good to understand if Zmat3 affects p53 function by affecting its levels. Also, the p21 blot is overloaded.

      We thank the reviewer for this helpful observation. To determine whether ZMAT3 knockout affects p53 function by affecting its levels, we repeated the experiment three independent times. Western blots from these biological replicates, together with protein quantification, are now included in Appendix-2 and Figure 1-figure supplement 1A. These data show no significant differences in p53 or p21 induction between ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment. In the revised manuscript, we have replaced the blot in Figure 1-figure supplement 1A with a more representative image from one of these replicate experiments.

      In Figure 2E, HKDC1 protein levels are not shown for the SW1222 and HCEC-1CT cell lines, 

      We thank the reviewer for this suggestion. HKDC1 protein levels in SW1222 and HCEC-1CT cells following ZMAT3 knockdown are now included as Figure 2-figure supplement 1F and 1G, together with the corresponding quantification.

      and Zmat3 does not appear as its characteristic two bands on the blot. What does this signify?

      We thank the reviewer for this observation. Endogenous ZMAT3 typically appears as two closely migrating bands on immunoblots. As shown in Figure 4D and Appendix 2A and 2B, these two bands are observed at the expected molecular weight following Nutlin treatment and are specific to ZMAT3, as they are markedly reduced in ZMAT3-KO cells. In contrast, only a single ZMAT3 band is visible in Figure 2E. This likely reflects limited resolution of the two bands in some blots rather than a biological difference.   

      (3) Why does HKDC1 knockdown only have an effect on metabolic phenotypes when ZMAT3 is gone? In Figure 3A, there does not seem to be a decrease in hexokinase activity in the siCTRL + siHKDC1 condition compared to siCTRL alone. Also, in Figure 3A, does phosphorylation activity of HKDC1 necessarily reflect glucose uptake, as stated? Additionally, in Figure 3C, there is no effect on mitochondrial respiration with siHKDC1, even though recent studies have shown a significant effect of HKDC1 on this.

      We thank the reviewer for raising these important questions. As noted, HKDC1 knockdown alone in wild-type cells (siCTRL + siHKDC1) does not significantly reduce hexokinase activity (Figure 3A). This likely reflects the low basal expression of HKDC1 in these cells. Thus, the metabolic phenotype may only become apparent when HKDC1 expression exceeds a functional threshold, as observed in ZMAT3-KO cells where HKDC1 is upregulated.

      Regarding the glucose uptake assay, HKDC1 itself is not phosphorylated; rather, it phosphorylates a non-catabolizable glucose analog, 2-deoxyglucose (2-DG) upon cellular uptake. According to the manufacturer’s protocol, intracellular 2-DG is phosphorylated by hexokinases to 2-deoxyglucose-6-phosphate (2-DG6P), which cannot be further metabolized and therefore accumulates. The accumulated 2-DG6P is quantified using a luminescence-based readout. This assay is widely used as a surrogate for glucose uptake because it reflects both glucose import and phosphorylation — the first step of glycolytic flux. As for the lack of change in mitochondrial respiration (Figure 3C), we acknowledge that some studies have reported mitochondrial roles for HKDC1 under basal conditions; however, such effects may be cell type-specific.

      (4) The emphasis on glycolysis signatures is confusing, as in the end, glycolysis does not seem to be affected by ZMAT3 status, but mitochondrial respiration is affected. Can the text be clarified to address this? It is also difficult to understand the role of oxygen consumption rate (OCR) in ZMAT3 phenotypes, as it does not fully track with proliferation. For example, ZMAT3 KD has the highest OCR, and the other conditions have similar OCRs but different proliferative rates in Figure 3D. Also, the colors used in Figure 3 to denote different genotypes change between B/C and D, which is confusing.

      We thank the reviewer for pointing out the inconsistency in the colors of the graph in Figure 2, which we have now corrected. Our data indicate that ZMAT3 regulates mitochondrial respiration without significantly affecting glycolysis. It is possible that mitochondria in ZMAT3-KO cells are oxidizing more substrates that are not produced by glycolysis. Additional work will be required to fully determine these mechanisms. We have clarified this on page 8 of the revised manuscript.

      (5) The lack of ZMAT3 binding to RNAs in PAR-CLIP is not proof that it does not do so. A more targeted approach should be used, using individual RIP assays. The authors should also analyze the splicing of HKDC1, which could be affected by ZMAT3.

      As suggested, we performed ZMAT3 RNA IP experiments (RIP) using doxycycline-inducible HCT116-ZMAT3-FLAG cells. However, we did not observe significant enrichment of HKDC1 mRNA in the ZMAT3 IPs (Figure 5-figure supplement 1A), consistent with previously published ZMAT3 RIP-seq data (Bersani et al, Oncotarget, 2016). These findings further support the notion that ZMAT3 does not directly bind to HKDC1 mRNA in these cells. Accordingly, we have modified the text on page 10 of the revised manuscript.

      In addition, as suggested by the reviewer, we analyzed changes in splicing of HKDC1 pre-mRNA using rMATS in HCT116 cells by comparing our previously published RNA-seq data from siCTRL and siZMAT3-transfected HCT116 cells (Muys et al, Genes Dev, 2021). We focused on splicing events with an FDR < 0.05 and |delta PSI| > 0.1 (representing at least a 10% change in splicing). The splicing analysis (data not shown) did not reveal any significant alterations in HKDC1 pre-mRNA splicing upon ZMAT3 knockdown. Corresponding text has been updated on page 10 of the revised manuscript.
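
      The filtering criterion described above can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes each splicing event is a record with the fields `geneSymbol`, `FDR`, and `IncLevelDifference` (the delta PSI), and the helper name `significant_events` and the toy records are hypothetical.

```python
def significant_events(events, gene, fdr_cutoff=0.05, dpsi_cutoff=0.1):
    """Return the events for `gene` with FDR < fdr_cutoff and |delta PSI| > dpsi_cutoff.

    `events` is an iterable of dicts. The keys used here (geneSymbol, FDR,
    IncLevelDifference) are assumed field names for this sketch.
    """
    hits = []
    for event in events:
        if event["geneSymbol"] != gene:
            continue
        fdr = float(event["FDR"])
        delta_psi = float(event["IncLevelDifference"])
        # Both thresholds must be met; the sign of delta PSI is ignored.
        if fdr < fdr_cutoff and abs(delta_psi) > dpsi_cutoff:
            hits.append(event)
    return hits


# Made-up records for illustration only (not data from the study).
toy_events = [
    {"geneSymbol": "HKDC1", "FDR": "0.20", "IncLevelDifference": "0.30"},
    {"geneSymbol": "HKDC1", "FDR": "0.01", "IncLevelDifference": "-0.05"},
    {"geneSymbol": "GENE_X", "FDR": "0.001", "IncLevelDifference": "-0.25"},
]
print(len(significant_events(toy_events, "HKDC1")))
```

      Taking the absolute value of delta PSI means that exon inclusion and exon skipping changes of the same magnitude are treated alike.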

      (6) The authors say that they examine JUN binding at the HKDC1 promoter several times, but they focus on intron 1 in Figure 5. They should revise the text accordingly, and they should also show JUN ChIP data traces for the whole HKDC1 locus in Figure 5C.

      We thank the reviewer for this helpful suggestion. As recommended, we have revised the text throughout the manuscript and replaced HKDC1 promoter with HKDC1 intron 1 DNA to accurately reflect our analysis, and Figure 5 now shows the JUN ChIP-seq signal across the entire HKDC1 locus.

      (7) In the ZMAT3 and JUN interaction assays, were these tested in the presence of DNAse or RNAse to determine if nucleic acids mediate the interaction?

      We thank the reviewer for this valuable suggestion. To test whether nucleic acids mediate the ZMAT3-JUN interaction, we performed ZMAT3 immunoprecipitations (IPs) in the presence or absence of DNase and RNase from doxycycline-inducible ZMAT3-FLAG expressing HCT116 cells. The ZMAT3-JUN interaction was lost upon treatment with either DNase or RNase, indicating that the interaction is mediated by nucleic acids. These data have been added to the revised manuscript (Figure 5-figure supplement 1D and on page 11).

    1. eLife Assessment

      This important study provides the first putative evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient to induce ectopic development of forelimb buds at neck level. The authors use both gain-of-function (GOF) and loss-of-function (LOF) approaches in chick embryos to test the roles of Hox paralogy group (PG) 4-7 genes in limb development. The GOF data provide strong evidence that overexpression of Hox PG6/7 genes is sufficient to induce forelimb buds at neck level. However, the experiments using dominant negative constructs are lacking some key controls that are needed to demonstrate the specificity of the LOF effect, rendering the work as a whole incomplete.

    2. Reviewer #2 (Public review):

      In the original review of this manuscript, I noted that this study provides the first evidence that alteration of the Hox code in neck lateral plate mesoderm is sufficient for ectopic forelimb budding. Their finding that ectopic expression of Hoxa6 or Hoxa7 induces wing budding at neck level, a demonstration of sufficiency, is of major significance. The experiments used to test the necessity of specific Hox genes for limb budding involved overexpression of dominant negative constructs, and there were questions about whether the controls were well designed. The reviewers made several suggestions for additional experiments that would address their concerns. In their responses to those comments, the authors indicated that they would conduct those experiments, and they acknowledged the requests for further discussion of a few points.

      In the revised version of the manuscript, the authors have provided additional RNA-seq data in Table 3, which lists 221 genes that are shared between the Hoxa6-induced limb bud and normal wing bud but not the neck. This shows that the ectopic limb bud has a limb-like character. The authors also expanded the discussion of their results in the context of previous work on the mouse. These changes have improved the paper.

      The authors elected not to conduct the co-transfection experiments that were suggested to test the ability of Hoxa4/a5 to block the limb-inducing ability of Hoxa6/a7. They also chose not to conduct the additional control experiments that were suggested for the dominant negative studies. The authors' justification for not conducting these experiments is provided in the responses to reviewers.

      The paper is improved over the previous version, but the conclusions, particularly regarding the dominant negative experiments, would have been strengthened by the additional experiments that were recommended by the reviewers. Under the current publishing model for eLife, it is the authors' prerogative to decide whether to revise in accordance with the reviewers' suggestions. Therefore, it seems to me that this version of the manuscript is the definitive version that the authors want to publish, and that eLife should publish it together with the reviewers' comments and the authors' responses.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      We thank the reviewer for emphasizing the importance of appropriate controls for the dominant-negative experiments. Dominant-negative Hox constructs have been successfully and widely used in previous studies, supporting the reliability of this approach. In our experiments, electroporation of the dominant-negative constructs into the limb field produced clear and reproducible effects when compared with both unoperated embryos and embryos electroporated with a GFP control construct. The GFP construct serves as an appropriate control, as it accounts for any effects of electroporation or exogenous protein expression without altering Hox gene function. We therefore conclude that the observed phenotypes specifically reflect dominant-negative Hox activity rather than procedural artifacts.

      The absence of overt limb phenotypes in PG4–PG7 mouse mutants likely reflects both functional redundancy among Hox paralogs and the difficulty of detecting subtle limb-specific effects in bilateral, systemically affected embryos. In contrast, the chick embryo system allows unilateral gene manipulation, providing an internal control and greater sensitivity for detecting weak or localized effects that may be masked in whole-animal mouse mutants.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      We thank the reviewer for this thoughtful suggestion. We fully agree that functional redundancy among Hox paralogs is an important consideration. However, Hox gene interactions are highly context-dependent and not strictly additive. Simultaneous interference with multiple Hox groups often leads to complex or compensatory effects that are difficult to interpret mechanistically, particularly when using dominant-negative constructs that may affect overlapping transcriptional networks.

      Our current experimental design, which targets individual paralog groups, allows us to attribute observed phenotypes to specific Hox activities and to interpret the results more precisely. Moreover, as shown in previous studies, simultaneous knockdown of multiple Hox genes does not necessarily produce stronger phenotypes. For these reasons, we believe that the present single–dominant-negative experiments are the most informative and sufficient for addressing the specific questions in this study.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We thank the reviewer for this insightful suggestion. However, because of the extensive functional redundancy and regulatory interdependence within the Hox network, simultaneous inhibition of Hox4 and Hox5 is unlikely to produce a simple or interpretable outcome. Previous studies have shown that combinatorial Hox manipulations can trigger compensatory changes in other Hox genes, often obscuring rather than clarifying specific relationships.

      In our study, the proposed permissive role of Hox4/5 is supported by the spatial and temporal patterns of Hox expression and by the phenotypic effects observed upon individual dominant-negative perturbations. These data together suggest that Hox4/5 establish a forelimb-competent domain, on which Hox6/7 subsequently act to promote limb outgrowth. We therefore believe that the current evidence sufficiently supports this model without necessitating the additional combined experiment, which may not provide clear mechanistic insight due to redundancy effects.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      We thank the reviewer for this important comment. We agree that Tbx5 expression alone may not be sufficient to define forelimb identity. However, in our experiments, the induced bulge displays several additional characteristics consistent with early limb identity (at the pre-AER stage). First, the Tbx5 expression we observe corresponds to the stage when the limb field is already specified, not the earlier broad mesodermal phase described in other systems. Second, the induced domain also expresses Lmx1, a marker of dorsal limb mesenchyme, further supporting its limb-specific nature. Third, our RNA sequencing analysis reveals upregulation of multiple genes associated with early limb development pathways, providing molecular evidence for limb-type identity rather than non-specific mesodermal expansion. Taken together, these results strongly indicate that the induced bulge represents a forelimb-like structure rather than a generic mesodermal thickening.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We thank the reviewer for this helpful suggestion. We have analyzed the cartilage structures of the operated embryos. No skeletal elements were detected within the ectopic wing bud in the neck region. Furthermore, we did not observe any significant structural changes in the wing skeleton following loss-of-function (dnHox) experiments. These observations indicate that the ectopic bulges do not progress to form skeletal elements, consistent with their identity as early limb-like outgrowths rather than fully developed limbs.

      Reviewer #2 (Public review):

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We thank the reviewer for raising this important point regarding the specificity of the dominant-negative constructs. The dnHox constructs used in this study were generated by truncating the C-terminal region of each Hox protein, a strategy that removes the homeodomain and has been demonstrated to act as a specific dominant-negative by interfering with the corresponding Hox function without broadly affecting unrelated Hox genes. This approach has been successfully validated and used in previous work (Moreau et al., Curr. Biol. 2019), where similar constructs effectively and specifically inhibited Hox activity in the chick embryo.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here)

      We thank the reviewer for this insightful suggestion. We agree that, in principle, co-electroporation of dnHox4/5 with Hox6/7 could test the hierarchical relationship between these genes. However, due to the extensive redundancy and regulatory interdependence among Hox genes, simultaneous manipulation of multiple genes often leads to compensatory effects or complex outcomes that are difficult to interpret mechanistically. As discussed in our response to Point 3 of Reviewer 1, inhibition of only one or two Hox4/5 paralogs is unlikely to completely abolish the permissive function of this group.

      Our current data — showing that Hox6/7 gain-of-function can induce ectopic limb-like outgrowths, while dnHox4/5 and dnHox6/7 lead to reduced limb formation — already provide strong evidence for both the necessity and sufficiency of these Hox activities in forelimb positioning. We therefore believe that the existing experiments adequately support our proposed model without the need for additional combinatorial manipulations.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We thank the reviewer for this constructive suggestion. In response, we have added a table (Table 3) listing the genes expressed in both the native limb/wing bud and the Hoxa6-induced wing bud, as identified from our RNA-Seq dataset. This table provides the underlying data for the Venn diagram, heatmap, and GO analysis presented in Figure 3. We agree that including these data improves transparency and helps readers better appreciate the molecular similarity between the induced and native limb buds.
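
      As a rough illustration of how such a shared-gene list can be derived from per-tissue expression calls, the sketch below takes genes detected in both the induced bud and the native wing bud and removes those also detected in neck tissue. The function name `shared_limb_genes` and the placeholder gene names are hypothetical, and the way genes are called "expressed" in the actual analysis is not specified here.

```python
def shared_limb_genes(induced_bud, wing_bud, neck):
    """Genes present in both the induced bud and the wing bud, but absent from neck.

    Each argument is an iterable of gene identifiers already called as
    expressed in that tissue (how that call is made is outside this sketch).
    """
    return (set(induced_bud) & set(wing_bud)) - set(neck)


# Placeholder identifiers for illustration only (not genes from Table 3).
induced = ["geneA", "geneB", "geneC"]
wing = ["geneA", "geneB", "geneD"]
neck = ["geneB", "geneE"]
print(sorted(shared_limb_genes(induced, wing, neck)))
```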

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      We thank the reviewer for this important point. We have addressed this issue in our response to Reviewer 1, Point 1, and have expanded the relevant discussion in the manuscript. Briefly, we believe that the apparent discrepancy between chick and mouse results arises from both the high degree of functional redundancy among Hox paralogs and the limitations of detecting subtle limb-specific effects in systemic mouse mutants, where both sides of the embryo are equally affected. In contrast, the chick system allows unilateral gene manipulation, providing an internal control and greatly enhancing sensitivity to detect weak or localized effects. Thus, the chick embryo model can reveal subtle Hox-dependent limb-induction activities that are masked in conventional mouse knockout approaches.

    1. eLife Assessment

      This study reports useful information on the mechanisms by which a high-fat diet induces arrhythmias in the model organism Drosophila. Specifically, the authors propose that adipokinetic hormone (Akh) secretion is increased with this diet, and through binding of Akh to its receptor on cardiac neurons, arrhythmia is induced. The authors have revised their manuscript, but in some areas the evidence remains incomplete, and the authors state that future studies will be directed at closing these gaps. Nonetheless, the data presented will be helpful to those who wish to extend the research to a more complex model system, such as the mouse.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on high fat diet. Elimination of one of two AkhR expressing cardiac neurons results in arrhythmia similar to high fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

    3. Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave the rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on high fat diet. Elimination of one of two AkhR expressing cardiac neurons results in arrhythmia similar to high fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Comments on revisions:

      The authors have addressed my other concerns. The only outstanding issue is in regard to the following comment:

      The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high fat diet compared to normal fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      The authors state that 8 sec time windows were selected at the discretion of the imager for analysis. I don't know how to avoid bias unless the person acquiring the imaging is blinded to the condition and the analysis is also done blind. Can you comment whether data acquisition and analysis was done in a blinded fashion? If not, this should be stated as a limitation of the study.

      Drosophila heart rate is highly variable. During the recording, we were biased to choose a time window when heartbeat was fairly stable. This is a limitation of the study, which we mentioned in the revised version. We chose to use intact over “semi-intact” flies with an intention to avoid damaging the cardiac neurons. 

      Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      We thank the reviewer for the positive comments. We believe that more signaling pathways are active in the AkhR neurons and regulate rhythmic heartbeat. We are currently searching for the molecules and pathways that act on the AkhR cardiac neurons to regulate the heartbeat. Thus, AkhR neuron ablation is expected to have a more profound effect than loss of AkhR alone. Loss of AkhR is not equivalent to AkhR neuron ablation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave the rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      We thank the reviewer for suggesting the detailed experiments, and we believe that addressing these points would consolidate the results. As AkhR-Gal4 is also expressed in the fat body, we set out to build a more specific driver. We planned to use the split-Gal4 system (Luan et al. 2006. PMID: 17088209). The combination of pan-neuronal Elav-Gal4.DBD and AkhR-p65.AD should yield an AkhR neuron-specific driver. We selected 2580 bp of AkhR upstream DNA and cloned it into the pBPp65ADZpUw plasmid (Addgene plasmid: #26234). After two rounds of injection, however, we were not able to recover a transgenic line.

      We used GCaMP to record the calcium signal in the AkhR neurons. AkhR-Gal4>GCaMP has extremely high levels of fluorescence in the cardiac neurons under normal conditions.

      We are screening Gal4 drivers, trying to find one line that is specific to the cardiac neurons and has a lower level of driver activity.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We quantified the AkhR neuron ablation and found that about 69% (n=28) showed a single ACN in AkhR-Gal4>rpr flies. It is more challenging to quantify other AkhR-expressing cells, as they are widely distributed. We tried to add more copies of UAS-rpr or AkhR-Gal4, which caused developmental defects (pupal lethality). Thus, as mentioned above, we are trying to find a more specific driver for targeting the cardiac neurons.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors refer to the 'crop' as the functional equivalent of the human stomach. Considering the difference in their primary functions, this cannot be justified.

      In Drosophila, the crop functions analogously to the stomach in vertebrates. It is a foregut storage and preliminary processing organ that regulates food passage into the midgut. It is more than a simple reservoir: the crop engages in enzymatic mixing, neural control, and active motility.

      Lines 163 and 166: APCs are not neurons.

      Akh-producing cells (APCs) in Drosophila are neuroendocrine cells, residing in the corpora cardiaca (CC). While they produce and secrete the hormone AKH (akin to glucagon), they are not brain interneurons per se. APCs share many neuronal features (vesicular release, axon-like projections) and receive neural inputs, effectively functioning as a peripheral endocrine center.

    1. eLife Assessment

      This fundamental study is part of an impressive, large-scale effort to assess the reproducibility of published findings in the field of Drosophila immunity. In a companion article, the authors analyze 400 papers published between 1959 and 2011, and assess how many of the claims in these papers have been tested in subsequent publications. In this article, the authors report the results of validation experiments to assess a subset of the claims that, according to the literature, have not been corroborated. While the evidence reported for some of these validation studies is convincing, it remains incomplete for others.

    2. Reviewer #1 (Public review):

      Summary:

      This work revisits a substantial part of the published literature in the field of Drosophila innate immunity from 1959 to 2011. The strategy has been to restrict the analysis to some 400 articles and then to extract a main claim, two to four major claims, and up to four minor claims, totaling some 2000 claims overall. The consistency of these claims with the current state-of-the-art has been evaluated and reported on a dedicated Web site known as ReproSci and also in the text as well as in the 28 Supplements that report experimental verification, direct or indirect, e.g., using novel null mutants unavailable at the time, of a selected set of claims made in several articles. Of note, this review is mostly limited to the manuscript and its associated supplements and does not integrally cover the ReproSci website.

      Strengths:

      One major strength of this article is that it tackles the issue of reproducibility/consistency on a large scale. Indeed, while many investigators have some serious doubts about some results found in the literature, few have the courage, or the means and time, to seriously challenge studies, especially if published by leaders in the field. The Discussion adequately states the major limitations of the ReproSci approach, which should be kept in mind by the reader to form their own opinion.

      This study also allows investigators not familiar with the field to have a clearer understanding of the questions at stake and to derive a more coherent global picture that allows them to better frame their own scientific questions. Besides a thorough and up-to-date knowledge of the literature used to assess the consistency of the claims with our current knowledge, a merit of this study is the undertaking of independent experiments to address some puzzling findings and the evidence presented is often convincing, albeit one should keep in mind the inherent limitations as several parameters are difficult to control, especially in the field of infections, as underlined by the authors themselves. Importantly, some work of the lead author has also been re-evaluated (Supplements S2-S4). Thus, while utmost caution should be exerted, and often is, in challenging claims, even if the challenge eventually proves to be not grounded, it is valuable to point out potential controversial issues to the scientific community.

      While this is not a point of this review, it should be acknowledged that the possibility to post comments on the ReproSci website will allow further readjustments by the community in the appreciation of the literature and also of the ReproSci assessments themselves and of its complementary additional experiments.

      Weaknesses:

      Challenging the results from articles is, by its very nature, a highly sensitive issue, and utmost care should be taken when challenging claims. While the authors generally acknowledge the limitations of their approach in the main text and Supplements, there are a few instances where their challenges remain questionable and should be reassessed. This is certainly the case for Supplement S18, for which the ReproSci authors make a claim for a point that was not made in the publication under scrutiny. The authors of that study (Ramet et al., Immunity, 2001) never claimed that scavenger receptor SR-CI is a phagocytosis receptor, but that it is required for optimal binding of S2 cells to bacteria. Westlake et al. here have tested for a role of this scavenger receptor in phagocytosis, which had not been tested by Ramet et al. Thus, even though the ReproSci study brings additional knowledge to our understanding of the function of SR-CI by directly testing its involvement in phagocytosis by larval hemocytes, it did not address the major point of the Ramet et al. study, SR-CI binding to bacteria, and thus inappropriately concludes in Supplement S18 that "Contrary to (Ramet et al., 2001, Saleh et al., 2006), we find that SR-CI is unlikely to be a major Drosophila phagocytic receptor for bacteria in vivo." It follows that the results of Ramet et al. cannot be challenged by ReproSci as it did not address this point. Of note, Saleh et al. (2006) also mistakenly stated that SR-CI impaired phagocytosis in S2 cells and could be used as a positive control to monitor phagocytosis in S2 cells. Their assay appears to have actually not monitored phagocytosis but the association of FITC-labeled bacteria to S2 cells by FACS, as they did not mention quenching the fluorescence of bacteria associated with the surface with Trypan blue.

      The inference method to assess the consistency of results with current knowledge also has limitations that should be better acknowledged. At times, the argument is made that the gene under scrutiny may not be expressed at the right time according to large-scale data or that the gene product was not detected in the hemolymph by a mass-spectrometry approach. While being in theory strong arguments, some genes, for instance, those encoding proteases at the apex of proteolytic activation cascades, need not necessarily be strongly expressed and might be released by a few cells. In addition, we are often lacking relevant information on the expression of genes of interest upon specific immune challenges such as infections with such and such pathogens.

      As regards mass spectrometry, there is always the issue of sensitivity that limits the force of the argument. Our understanding of melanization remains currently limited, and methods are lacking to accurately measure the killing activity associated with the triggering of the proPO activation cascade. In this study, the authors monitor only the blackening reaction of the wound site based on a semi-quantitative measurement. They are not attempting to use other assays, such as monitoring the cleavage of proPOs into active POs or measuring PO enzymatic activity. These techniques are sometimes difficult to implement, and they suffer at times from variability. Thus, caution should be exerted when drawing conclusions from just monitoring the melanization of wounds.

      Likewise, the study of phagocytosis is limited by several factors. As most studies in the field focus on adults, the potential role of phagocytosis in controlling Gram-negative bacterial infections is often masked by the efficiency of the strong IMD-mediated systemic immune response mediated by AMPs (Hanson et al, eLife, 2019). This problem can be bypassed in rare instances of intestinal infections by Gram-negative bacteria such as Serratia marcescens (Nehme et al., PLoS Pathogens, 2007) or Pseudomonas aeruginosa (Limmer et al. PNAS, 2011), which escape from the digestive tract into the hemocoel without triggering, at least initially, the systemic immune response. It is technically feasible to monitor bacterial uptake in adults by injecting fluorescently labeled bacteria and subsequently quenching the signal from non-ingested bacteria. Nonetheless, many investigators prefer to resort to ex vivo assays starting from hemocytes collected from third-instar wandering larvae as they are easier to collect and then to analyze, e.g., by FACS. However, it should be pointed out that these hemocytes have been strongly exposed to a peak of ecdysone, which may alter their properties. Like for S2 cells, it is thus not clear whether third-instar larval hemocytes faithfully reproduce the situation in adults. The phagocytic assays are often performed with killed bacteria. Evidence with live microorganisms is better, especially with pathogens. Assays with live bacteria, however, require an antibody used in a differential permeabilization protocol. Furthermore, the killing method alters the surface of the microorganisms, a key property for phagocytic uptake. Bacterial surface changes are minimal when microorganisms are killed by X-ray or UV light. These limitations should be kept in mind when proceeding to inference analysis of the consistency of claims. Eater illustrates this point well. Westlake et al. state that: "[...] subsequent studies showed that a null mutation of eater does not impact phagocytosis". The authors refer here to Bretscher et al., Biology Open, 2015, in which binding to heat-killed E. coli was assessed in an ex vivo assay in third instar larvae. In contrast, Chung and Kocks (JBC, 2011) tested whether the recombinant extracellular N-terminal ligand-binding domain was able to bind to bacteria. They found that this domain binds to live Gram-positive bacteria but not to live Gram-negative bacteria. For the latter, killing bacteria with ethanol or heating, but not by formaldehyde treatment, allowed binding. More importantly, Chung and Kocks documented a complex picture in which AMPs may be needed to permeabilize the Gram-negative bacterial cell wall that would then allow access of at least the recombinant secreted Eater extracellular domain to peptidoglycan or peptidoglycan-associated molecules. Thus, the systemic Imd-dependent immune response would be required in vivo to allow Eater-dependent uptake of Gram-negative bacteria by adult hemocytes. In ex vivo assays, any AMPs may be diluted too much to effectively attack the bacterial membrane. A prediction is then that there should be an altered phagocytosis of Gram-negative bacteria in IMD-pathway mutants, e.g., an imd null mutant but not the hypomorphic imd[1] allele. This could easily be tested by ReproSci using the adult phagocytosis assay used by Kocks et al, Cell, 2005. At the very least, the part on the role of Eater in phagocytosis should take the Chung & Kocks study into account, and the conclusions modulated.

      Another point is that some mutant phenotypes may be highly sensitive to the genetic background, for instance, even after isogenization in two different backgrounds. In the framework of a Reproducibility project, there might be no other option for such cases than direct reproduction of the experiment as relying solely on inference may not be reliable enough.

      With respect to the experimental part, some minor weaknesses have been noted. The authors rely on survival to infection experiments, but often do not show any control experiments with mock-challenged or noninfected mutant fly lines. In some cases, monitoring the microbial burden would have strengthened the evidence. For long survival experiments, a check on the health status of the lines (viral microbiota, Wolbachia) would have been welcome. Also, the experimental validation of reagents, RNAi lines, or KO lines is not documented in all cases.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an ambitious and large-scale reproducibility analysis of 400 articles on Drosophila immunity published before 2011. They extract major and minor claims from each article, assess their verifiability through literature comparison and, when possible, through targeted experimental re-testing, and synthesize their findings in an openly accessible online database. The goal is to provide clarity to the community regarding claims that have been contradicted, incompletely supported, or insufficiently followed up in the literature, and to foster broader community participation in evaluating historical findings. The manuscript summarizes the major insights emerging from this systematic effort.

      Strengths:

      (1) Novelty and community value: This work represents a rare example of a systematic, transparent, and community-facing reproducibility project in a specific research domain. The creation of a dedicated public platform for disseminating and discussing these assessments is particularly innovative.

      (2) Breadth and depth: The authors analyze an impressive number of publications spanning multiple decades, and they couple literature-based assessments with new experimental data where follow-up is missing.

      (3) Clarity of purpose: The manuscript carefully distinguishes between assessing evidential support for claims and judging the scientific merit of historical work. This helps frame the project as constructive rather than punitive.

      (4) Metascientific relevance: The analysis identifies methodological and contextual factors that commonly underlie irreproducible claims, providing a useful guide for future study design and interpretation.

      (5) Transparency: Supplementary datasets and the public website provide an exceptional degree of openness, which should facilitate community engagement and further refinement.

      Weaknesses:

      (1) Subjectivity in selection: Despite the authors' efforts, the choice of which papers and claims to highlight cannot be entirely objective. This is an inherent limitation of any retrospective curation effort, but it remains important to acknowledge explicitly.

      (2) Emphasis on irreproducible claims: The manuscript focuses primarily on claims that are challenged or found to be weakly supported. While understandable from the perspective of novelty, this emphasis may risk overshadowing the value of claims that are well supported and reproducible.

      (3) Framing and language: Certain passages could benefit from more neutral phrasing and avoidance of binary terms such as "correct" or "incorrect," in keeping with the open-ended and iterative nature of scientific progress.

      (4) Community interaction with the dataset: While the website is an excellent resource, the manuscript could further clarify how the community is expected to contribute, challenge, or refine the annotations, especially given the large volume of supplementary data.

      (5) Minor inconsistency: The manuscript states that papers from 1959-2011 were included, but the Methods section mentions a range beginning in 1940. This should be aligned for clarity.

      Impact and significance:

      This contribution is likely to have a meaningful impact on both the Drosophila immunity community and the broader scientific ecosystem. It highlights methodological pitfalls, encourages transparent post-publication evaluation, and offers a reusable framework that other fields could adopt. The work also has pedagogical value for early-career researchers entering the field, who often struggle to navigate contradictory or outdated claims. By centralizing and contextualizing these discussions, the manuscript should help accelerate more robust and reproducible research.

    4. Reviewer #3 (Public review):

      Summary:

      In this ambitious study, the authors set out to analyse the validity of a number of claims, both minor and major, from 400 published articles within the field of Drosophila immunity that were published before 2011. The authors were able to determine initially if claims were supported by comparing them to other published literature in the field and, if required, by experimentally testing 'unchallenged' claims that had not been followed up in subsequent published literature. Using this approach, the authors identified a number of claims that had contradictory evidence using new methods or taking into account developments within the field post-initial publication. They put their findings on a publicly available website designed to enable the research community to assess published work within the field with greater clarity.

      Strengths:

      The work presented is rigorous and methodical, the data presentation is high quality, and importantly, the data presented support the conclusions. The discussion is balanced, and the study is written considerately and respectfully, highlighting that the aim of the study is not to assign merit to individual scientists or publications but rather to improve clarity for scientists across the field. The approach carried out by the researchers focuses on testing the validity of the claims made in the original papers rather than testing whether the original experimental methods produced reproducible results. This is an important point since there are many reasons why the original interpretation of data may have understandably led to the claims made. These potential explanations for irreproducible data or conclusions are discussed in detail by the authors for each claim investigated.

      The authors have generated an accompanying website, which provides a valuable tool for the Drosophila Immunity research community that can be used to fact-check key claims and encourages community engagement. This will achieve one important goal of this study - to prevent time loss for scientists who base their research on claims that are irreproducible. The authors rightly point out that it is impossible (and indeed undesirable) to avoid publication of irreproducible results within a field since science is 'an exploratory process where progress is made by constant course correction'. This study is, however, an important piece of work that will make that course correction more efficient.

      Weaknesses:

      I have little to recommend for the improvement of this manuscript. As outlined in my comments above, I am very supportive of this manuscript and think it is a bold and ambitious body of work that is important for the Drosophila immunity field and beyond.

    5. Reviewer #4 (Public review):

      This is an important paper that can do much to set an example for thoughtful and rigorous evaluation of a discipline-wide body of literature. The compiled website of publications in Drosophila immunity is by itself a valuable contribution to the field. There is much to praise in this work, especially including the extensive and careful evaluation of the published literature. However, there are also cautions.

      One notable concern is that the validation experiments are generally done at low sample sizes and low replication rates, and often lack statistical analysis. This is slippery ground for declaring a published study to be untrue. Since the conclusions reported here are nearly all negative, it is essential that the experiments be performed with adequate power to detect the originally described effects. At a minimum, they should be performed with the same sample size and replication structure as the originally reported studies.
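
      As a rough illustration of the power consideration raised above, the sketch below estimates how many animals per group a validation experiment would need to detect an originally reported difference between two proportions (for example, survival at a fixed time point). It uses the standard normal-approximation formula for a two-sided two-proportion z-test. The function name and the example proportions are hypothetical and are not taken from the manuscript or the original studies; survival curves would normally call for a log-rank-based calculation instead.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    p1, p2 : expected proportions in the two groups (hypothetical inputs)
    alpha  : two-sided type I error rate
    power  : desired probability of detecting the difference p1 - p2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical example: 50% vs 25% survival, 80% power, alpha = 0.05.
    print(n_per_group(0.50, 0.25))
```

      Under these assumptions, a smaller originally reported effect requires a markedly larger group size, which is why matching or exceeding the sample size and replication structure of the original study matters when the new result is negative.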

      The first section of Results should be an overview of the general accuracy of the literature. Of all claims made in the 400 evaluated papers, what proportion fell into each category of "verified", "unchallenged", "challenged", "mixed", or "partially verified"? This summary overview would provide a valuable assessment of the field as a whole. A detailed dispute of individual highlighted claims could follow the summary overview.

      Section headings are phrased as declarative statements, "Gene X is not involved in process Y", which is more definitive phrasing than we typically use in scientific research. It implies proving a negative, which is difficult and rare, and the evidence provided in the present manuscript generally does not reach that threshold. A more common phrasing would be "We find no evidence that gene X contributes to process Y". A good model for this more qualified phrasing is the "We conclude that while Caspar might affect the Imd pathway in certain tissue-specific contexts, it is unlikely to act as a generic negative regulator of the Imd pathway," concluding the section on the role of Caspar. I am sure the authors feel that the softer, more qualified phrasing would undermine their article's goal of cleansing the literature of inaccuracies, but the hard declarative 'never' statements are difficult to justify unless every validation experiment is done with a high degree of rigor under a variety of experimental conditions. This caveat is acknowledged in the 3rd paragraph of the Discussion, but it is not reflected in the writing of the Results. The caveat should also appear in the Introduction.

      The article is clear that "Claims were assessed as verified, unchallenged, challenged, mixed, or partially verified," but the project is called "reproducibility project" in the 7th line of the abstract, and the website is "ReproSci". The fourth line of the abstract and the introduction call some published research "irreproducible". Most of the present manuscript does not describe reproduction or replication. It describes validation, or independent experimental tests for consistency. Published work is considered validated if subsequent studies using distinct approaches yielded consistent results. For work that the authors consider suspicious, or that has not been subsequently tested, the new experiments provided here do not necessarily recreate the published experiment. Instead, the published result is evaluated with experiments that use different tools or methods, again testing for consistency of results. This is an important form of validation, but it is not reproduction, and it should not be referred to as such. I strongly suggest that variations of the words "reproducible" or "replication" be removed from the manuscript and replaced with "validation". This will be more scientifically accurate and will have the additional benefit of reducing the emotional charge that can be associated with declaring published research to be irreproducible.

      The manuscript includes an explanatory passage in the Results section, "Our project focuses on assessing the strength of the claims themselves (inferential/indirect reproducibility) rather than testing whether the original methods produce repeatable results (results/direct reproducibility). Thus, our conclusions do not directly challenge the initial results leading to a claim, but rather the general applicability of the claim itself." Rather than first appearing in Results, this statement should appear prominently in the abstract and introduction because it is a core element of the premise of the study. This can be combined with the content of the present Disclaimer section into a single paragraph in the Introduction instead of appearing in two redundant passages. I would again encourage the authors to substitute the word validation for reproduction, which would eliminate the need for the invented distinction between indirect versus direct reproduction. It is notable that the authors have chosen to title the relevant Methods section "Experimental Validation" and not "Replication".

      Experimental data "from various laboratories" in the last paragraph of the Introduction and the first paragraph of the Results are ambiguous. Since these new experiments are part of the central core of the manuscript, the specific laboratories contributing them should be named in the two paragraphs. If experiments are being contributed by all authors on the manuscript, it would suffice to say "the authors' laboratories". The attribution to "various labs" appears to be contradicted by the Discussion paragraph 2, which states "the host laboratory has expertise in" antibacterial and antifungal defense, implying a single lab. The claim of expertise by the lead author's laboratory is unnecessary and can be deleted if the Lemaitre lab is the ultimate source of all validation experiments.

      The passage on the controversial role of Duox in the gut is balanced and scholarly, and stands out for its discussion of multiple alternative lines of evidence in the published literature and supplement. This passage may benefit from follow-up research by multiple groups on the original claims, which is not available for other claims, but the tone of the Duox section can be a model for the other sections.

      Comments on other sections and supplements:

      I understand the desire to explain how original results may have been obtained when they are not substantiated by subsequent experiments. However, statements such as "The initial results may have been obtained due to residual impurities in preparations of recombinant GNBP1" and "Non-replicable results on the roles of Spirit, Sphinx and Spheroide in Toll pathway activation may be due to off-target effects common to first-generation RNAi tools" are speculation. No experimental data are presented to support these assertions, so these statements and others like them (currently at the end of most "insights" sections) should not appear in Results. I recognize that the authors are trying to soften their criticism of prior studies by providing explanations for how errors may have occurred innocently. If they wish to do so, the speculative hypotheses should appear in the Discussion.

      The statement in Results that "The initial claim concerning wntD may be explained by a genetic background effect independent of wntD" similarly appears to be a speculation based on the reading of the main text Results. However, the Discussion clarifies that "Here, we obtained the same results as the authors of the claim when using the same mutant lines, but the result does not stand when using an independent mutant of the same gene, indicating the result was likely due to genetic background." That additional explanation in the Discussion greatly increases reader confidence in the Result and should be explained with reference to S5 in the Results. Such complete explanations should be provided everywhere possible without requiring the reader to check the Supplement in each instance.

      In some cases, such as "The results of the initial papers are likely due to the use of ubiquitous overexpression of PGRP-LE, resulting in melanization due to overactivation of the Imd pathway and resulting tissue damage", the claim to explain the original finding would be easy to test. The authors should perform those tests where they can, if they wish to retain the statements in the manuscript. Similarly, the claim "The published data are most consistent with a scenario in which RNAi generated off-target knockdown of a protein related to retinophilin/undertaker, while Undertaker itself is unlikely to have a role in phagocytosis" would be stronger if the authors searched the Drosophila genome for a plausible homolog that might have been impacted by the RNAi construct, and then put forth an argument as to why the off-target gene is more likely to have generated the original phenotype than the nominally targeted gene. There is a brief mention in S19 that junctophilin is the authors' preferred off-target candidate, but no evidence or rationale is presented to support that assertion. If the original RNAi line is still available, it would be easy enough to test whether junctophilin is knocked down as an off-target, and ideally then to use an independent knockdown of junctophilin to recapitulate the original phenotype. Otherwise, the off-target knockdown hypothesis is idle speculation.

      A good model is the passage on extracellular DNA, which states, "experiments performed for ReproSci using the original DNAse IIlo hypomorph show that elevated Diptericin expression in the hypomorph is eliminated by outcrossing of chromosome II, and does not occur in an independent DNAse II null mutant, indicating that this effect is due to genetic background (Supplementary S11)." In this case, the authors have performed a clear experiment that explains the original finding, and inclusion of that explanation is warranted. Similar background replacement experiments in other validations are equally compelling.

      The statement "Analysis of several fly stocks expected to carry the PGRP-SDdS3 mutation used in the initial study revealed the presence of a wild-type copy PGRP-SD, suggesting that either the stock used in this study did not carry the expected mutation, or that the mutation was lost by contamination prior to sharing the stock with other labs" provides a documentable explanation of a potential error in the original two manuscripts, but the subsequent "analysis of several fly stocks" needs citations to published literature or explanation in the supplement. It is unclear from this passage how the wildtype allele in the purportedly mutant stocks could have led to the misattribution of function to PGRP-SD, so that should be explained more clearly in the manuscript.

      The originally claimed anorexia of the Gr28b mutation is explained as having been "likely obtained due to comparison to a wild-type line with unusually high feeding rates". This claim would be stronger if the wildtype line in question were named and data showing a high rate of feeding were presented in the supplement or cited from published literature. Otherwise, this appears to be speculation.

      In the section "The Toll immune pathway is not negatively regulated by wntD", FlyAtlas is cited as evidence that wntD is not expressed in adult flies. However, the FlyAtlas data is not adequately sensitive to make this claim conclusively. If the present authors wish to state that wntD is not expressed in adults, they should do a thorough test themselves and report it in the Supplement.

      Alternatively, the statement "data from FlyAtlas show that wntD is only expressed at the embryonic stage and not at the adult stage at which the experiments were performed by (Gordon et al., 2005a)" could be rephrased to something like "data from FlyAtlas show strong expression of wntD in the embryo but not the adult" and it should be followed by a direct statement that adult expression was also found to be near-undetectable by qPCR in supplement S5. That data is currently "not shown" in the supplement, but it should be shown because this is a central result that is being used to refute the original claim. This manuscript passage should also describe the expression data described in Gordon et al. (2005), for contrast, which was an experimental demonstration of expression in the embryo and a claim "RT-PCR was used to confirm expression of endogenous wntD RNA in adults (data not shown)."

      Inclusion of the section on croquemort is curious because it seems to be focused exclusively on clearance of apoptotic cells in the embryo, not on anything related to immunity. The subsection is titled "Croquemort is not a phagocytic engulfment receptor for apoptotic cells or bacteria", but the text passage contains no mention of phagocytosis of bacteria, and phagocytosis of bacteria is not tested in the S17 supplement. I would suggest deleting this passage entirely if there is not going to be any discussion of the immune-related phenotypes.

      The claim "Toll is not activated by overexpression of GNBP3 or Grass: Experiments performed for ReproSci find that contrary to previous reports, overexpression of GNBP3 (Gottar et al., 2006) or Grass (El Chamy et al., 2008) in the absence of immune challenge does not effectively activate Toll signaling (Supplementaries S6, S7)" is overly strongly stated unless the authors can directly repeat the original published studies with identical experimental conditions. In the absence of that, the claim in the present manuscript needs to be softened to "we find no evidence that..." or something similar. The definitive claim "does not" presumes that the current experiments are more accurate or correct than the published ones, but no explanation is provided as to why that should be the case. In the absence of a clear and compelling argument as to why the current experiment is more accurate, it appears that there is one study (the original) that obtained a certain result and a second study (the present one) that did not. This can be reported as an inconsistency, but the second experiment does not prove that the first was an error. The same comment applies to the refutation of the roles for Edin and IRC. Even though the current experiments are done in the context of a broader validation study, this does not automatically make them more correct. The present work should adhere to the same standards of reporting that we expect in any other piece of science.

      The statement "Furthermore, evidence from multiple papers suggests that this result, and other instances where mutations have been found to specifically eliminate Defensin expression, is likely due to segregating polymorphisms within Defensin that disrupt primer binding in some genetic backgrounds and lead to a false negative result (Supplementary S20)" should include citations to the multiple papers being referenced. This passage would benefit from a brief summary of the logic presented in S20 regarding the various means of quantifying Defensin expression.

      In S22 Results, the statement "For general characterization of the IrcMB11278 mutant, including developmental and motor defects and survival to septic injury, see additional information on the ReproSci website" is not acceptable. All necessary information associated with the paper needs to be included in the Supplement. There cannot be supporting data relegated to an independent website with no guaranteed stability or version control. The same comment applies to "Our results show that eiger flies do not have reduced feeding compared to appropriate controls (See ReproSci website)" in S25.

      Supplement S21 appears to show a difference between the wildtype and hemese mutants in parasitoid encapsulation, which would support the original finding. However, the validation experiment is performed at a small sample size and is not replicated, so there can be no statistical analysis. There is no reported quantification of lamellocytes or total hemocytes. The validation experiment does not support the conclusion that the original study should be refuted. The S21 evaluation of hemese must either be performed rigorously or removed from the Supplement and the main text.

      In S22, the second sentence of the passage "Due to the fact that IrcMB11278 flies always survived at least 24h prior to death after becoming stuck to the substrate by their wings, we do not attribute the increased mortality in Ecc15-fed IrcMB11278 flies primarily to pathogen ingestion, but rather to locomotor defects. The difference in survival between sucrose-fed and Ecc15-fed IrcMB11278 flies may be explained by the increased viscosity of the Ecc15-containing substrate compared to the sucrose-containing substrate" is quite strange. The first sentence is plausible and a reasonable interpretation of the observations. But to then conclude that the difference between the bacterial treatment versus the control is more plausibly due to substrate viscosity than direct action of the bacteria on the fly is surprising. If the authors wish to put forward that interpretation, they need to test substrate viscosity and demonstrate that fly mortality correlates with viscosity. Otherwise, they must conclude that the validation experiment is consistent with the original study.

      In S27, the visualization of eiger expression using a GFP reporter is very non-standard as a quantitative assay. The correct assay is qPCR, as is performed in other validation experiments, and which can easily be done on dissected fat body for a tissue-specific analysis. S27 Figure 1 should be replaced with a proper experiment and quantitative analysis. In S27 Figure 2, the authors should add a panel showing that eiger is successfully knocked down with each driver>construct combination. This is important because the data being reported show no effect of knockdown; it is therefore imperative to show that the knockdown is actually occurring. The same comment applies everywhere there is an RNAi to demonstrate a lack of effect.

      The Drosomycin expression data in S3 Figure 2A look extremely noisy and are presented without error bars or statistical analysis. The S4 claim that sphinx and spheroid are not regulators of the Toll pathway because quantitative expression levels of these genes do not correlate with Toll target expression levels is an extremely weak inference. The RNAi did not work in S4, so no conclusion should be inferred from those experiments. Although the original claims in dispute may be errors in both cases, the validation data used to refute the original claims must be rigorous and of an acceptable scientific standard.

      In S6 Figure 1, it is inappropriate to plot n=2 data points as a histogram with mean and standard errors. If there are fewer than four independent points, all points should be plotted as a dot plot. This comment applies to many qPCR figures throughout the supplement. In S7 Figure 1, "one representative experiment" out of two performed is shown. This strongly suggests that the two replicates are noisy, and a cynical reader might suspect that the authors are trying to hide the variance. This also applies to S5 Fig 3. Particularly in the context of a validation study, it is imperative to present all data clearly and objectively, especially when these are the specific data that are being used to refute the claim.

      Other comments:

      In S26, the authors suggest that much of the observed melanization arises from excessive tissue damage associated with abdominal injection, in contrast to the lesser damage associated with thoracic injection. I believe there may be a methodological difference here. The Methods of S27 are not entirely clear, but it appears that the validation experiment was done with a pinprick, whereas the original Mabery and Schneider study was done with injection via a pulled capillary. My lab group (and I personally) have extensive experience with both techniques. In our hands, pinpricks to the abdomen do indeed cause substantial injury, and the physically less pliable thorax is more robust to pinpricks. However, capillary injections to the abdomen do virtually no tissue damage - very probably less than thoracic injections - and result in substantially higher survival of infection even than thoracic injections. Thus, the present authors may infer substantial tissue damage in the original study because they are employing a different technique.

    1. eLife Assessment

      This important study builds on previous work from the same authors to present a conceptually distinct workflow for cryo-EM reconstruction that uses 2D template matching to enable high-resolution structure determination of small (sub-50 kDa) protein targets. The paper describes how density for small-molecule ligands bound to such targets can be reconstructed without these ligands being present in the template. However, the evidence described for the claim that this technique "significantly" improves the alignment of the reconstruction of small complexes is incomplete. The authors could better evaluate the effects of model bias on the reconstructed densities.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes an application of the high-resolution cryo-EM 2D template matching technique to sub-50 kDa complexes. The paper describes how density for ligands can be reconstructed without having to process cryo-EM data through the conventional single particle analysis pipelines.

      Strengths:

      This paper contributes additional data (alongside other papers by the same authors) to convey the message that high-resolution 2D template matching is a powerful alternative for cryo-EM structure determination. The described application to ligand density reconstruction, without the need for extensive refinements, will be of interest to the pharmaceutical industry, where often multiple structures of the same protein in complex with different ligands are solved as part of their drug development pipelines. Improved insights into which particles contribute to the best ligand density are also highly valuable and transferable to other applications of the same technique.

      Weaknesses:

      Although the convenient visualisation of small molecules bound to protein targets of a known structure would be relevant for the pharmaceutical industry, the evidence described for the claim that this technique "significantly" improves alignment of reconstruction of small complexes is incomplete. The authors are encouraged to better evaluate the effects of model bias on the reconstructed densities in a revised paper.

    3. Reviewer #2 (Public review):

      In this manuscript, Zhang et al. describe a method for cryo-EM reconstruction of small (sub-50 kDa) complexes using 2D template matching. This presents an alternative, complementary path for high-resolution structure determination when there is a prior atomic model for alignment. Importantly, regions of the atomic model can be deleted to avoid bias in reconstructing the structure of these regions, serving as an important mechanism of validation.

      The manuscript focuses its analysis on a recently published dataset of the 40 kDa kinase complex deposited to EMPIAR. The original processing workflow produced a medium resolution structure of the kinase (GSFSC ~4.3 Å, though features of the map indicate ~6-7 Å resolution); at this resolution, the binding pocket and ligand were not resolved in the original published map. With 2DTM, the authors produce a much higher resolution structure, showing clear density for the ATP binding pocket and the bound ATP molecule. With careful curation of the particle images using statistically derived 2DTM p-values, a high-resolution 2DTM structure was reconstructed from just 8k particles (2.6 Å non-gold standard FSC; ligand Q-score of 0.6), in contrast to the 74k particles from the original publication. This aligns with recent trends that fewer, higher-quality particles can produce a higher-quality structure. The authors perform a detailed analysis of some of the design choices of the method (e.g., p-value cutoff for particle filtering; how large a region of the template to delete).

      Overall, the workflow is a conceptually elegant alternative to the traditional bottom-up reconstruction pipeline. The authors demonstrate that the p-values from 2DTM correlations provide a principled way to filter/curate which particle images to extract, and the results are impressive. There are only a few minor recommendations that I could make for improvement.